RoSA '02 (Rouen, June 6-7, 2002)
Combinatoire des ensembles de périodes de mots
Combinatorics of sets of periods in strings
Éric RIVALS
rivals@lirmm.fr
http://www.lirmm.fr/~rivals
LIRMM, 161 rue Ada, 34392 Montpellier Cedex 5 FRANCE
Joint work with S. Rahmann.
The (auto)correlation of a finite string U of length n is
a binary representation of its set of periods.
It was introduced and characterized by Guibas and Odlyzko.
i is a period of U if the prefix and suffix of length
n-i of U are equal.
An autocorrelation is an n-bit vector indexed from 0 to
n-1 whose ith-bit is 1 iff i is a period of
U.
Among the 2n bit vectors, only a small subset are
valid autocorrelations and many strings share the same one.
In this work, we define a smaller set of periods that determines the
set of all periods of a string.
We investigate the structure of GAMMA(n), the set of all
autocorrelations of length n, prove that it is a lattice under
inclusion and that this lattice does not satisfy the Jordan-Dedekind
condition.
We design the first enumeration algorithm for GAMMA(n),
which allows its computation until n=450.
We exhibit a relation between kappa(n), the cardinality
of GAMMA(n), and the number of binary partitions of an
integer.
From this, we derive precise and asymptotic lower bounds for
kappa(n) which improve that of Guibas and Odlyzko.
Finally, we demonstrate a recurrence to compute the population size of
one, or of all autocorrelations, i.e., the number of strings having
that autocorrelation.
These problems arise when studying the number of missing words in a
text or the number of common words between two random texts.
Retour à la page d'accueil
Back to home page