RoSA '02 (Rouen, June 6-7, 2002)

Combinatoire des ensembles de périodes de mots
Combinatorics of sets of periods in strings

LIRMM, 161 rue Ada, 34392 Montpellier Cedex 5 FRANCE
Joint work with S. Rahmann.

The (auto)correlation of a finite string U of length n is a binary representation of its set of periods. It was introduced and characterized by Guibas and Odlyzko. i is a period of U if the prefix and suffix of length n-i of U are equal. An autocorrelation is an n-bit vector indexed from 0 to n-1 whose ith-bit is 1 iff i is a period of U. Among the 2n bit vectors, only a small subset are valid autocorrelations and many strings share the same one. In this work, we define a smaller set of periods that determines the set of all periods of a string. We investigate the structure of GAMMA(n), the set of all autocorrelations of length n, prove that it is a lattice under inclusion and that this lattice does not satisfy the Jordan-Dedekind condition. We design the first enumeration algorithm for GAMMA(n), which allows its computation until n=450. We exhibit a relation between kappa(n), the cardinality of GAMMA(n), and the number of binary partitions of an integer. From this, we derive precise and asymptotic lower bounds for kappa(n) which improve that of Guibas and Odlyzko. Finally, we demonstrate a recurrence to compute the population size of one, or of all autocorrelations, i.e., the number of strings having that autocorrelation. These problems arise when studying the number of missing words in a text or the number of common words between two random texts.

Retour à la page d'accueil
Back to home page