Uniqueness theorems on words and sequence assembly

Arturo Carpi, Aldo de Luca, Stefano Varricchio

To appear at Formal Power Series and Algebraic Combinatorics (FPSAC01), Tempe, Arizona (USA), May 20-26, 2001


Abstract

A factor u of a word w is (right) univalent if there exists a unique letter a such that ua is still a factor of w. A univalent factor is minimal if none of its proper suffixes is univalent. The starting block of a non-empty word w is the shortest univalent prefix of w such that all longer proper prefixes of w are univalent. We study univalent factors of a word and their relationship with the well-known notions of boxes, superboxes, and minimal forbidden factors. Moreover, we prove some new uniqueness conditions for words based on univalent factors. In particular, we show that a word is uniquely determined by its starting block, the set of the extensions of its minimal univalent factors, and either its length or its terminal box. Finally, we show how the results and techniques presented can be used to solve the problem of sequence assembly for DNA molecules, under reasonable assumptions on the repetitive structure of the molecule considered and on the set of known fragments.

Key words: Univalent factors, boxes, sequence assembly.
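The definitions of univalent factor, minimal univalent factor, and starting block can be illustrated by a brute-force sketch. It is not the authors' algorithm: the function names are hypothetical, words are modelled as Python strings, and the empty word is assumed to count as a factor and as a proper suffix.

```python
def extensions(w, u):
    """Letters a such that u + a is a factor of w."""
    # Only occurrences of u that are followed by a letter contribute.
    return {w[i + len(u)] for i in range(len(w) - len(u)) if w.startswith(u, i)}


def is_univalent(w, u):
    """u is (right) univalent in w if exactly one letter extends it."""
    return len(extensions(w, u)) == 1


def minimal_univalent_factors(w):
    """Univalent factors of w none of whose proper suffixes is univalent."""
    # Assumption: the empty word is included among the factors and suffixes.
    factors = {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}
    return {
        u
        for u in factors
        if is_univalent(w, u)
        and not any(is_univalent(w, u[k:]) for k in range(1, len(u) + 1))
    }


def starting_block(w):
    """Shortest univalent prefix whose longer proper prefixes are all univalent."""
    if not w:
        raise ValueError("the starting block is defined for non-empty words only")
    # The longest proper prefix is always univalent: it is followed by a
    # letter only at position 0. Shrink it while univalence is preserved.
    k = len(w) - 1
    while k > 0 and is_univalent(w, w[: k - 1]):
        k -= 1
    return w[:k]
```

For the word abaab, this sketch gives the starting block ab and the minimal univalent factors b, aa, and ba. The enumeration is polynomial but far from optimal, and is meant only to make the definitions concrete.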

