#
RoSA '02 (Rouen, June 6-7, 2002)

#
Combinatoire des ensembles de périodes de mots

Combinatorics of sets of periods in strings

##
Éric RIVALS

rivals@lirmm.fr

http://www.lirmm.fr/~rivals

LIRMM, 161 rue Ada, 34392 Montpellier Cedex 5 FRANCE

Joint work with S. Rahmann.

The (auto)correlation of a finite string *U* of length *n* is
a binary representation of its set of periods.
It was introduced and characterized by Guibas and Odlyzko.
*i* is a period of *U* if the prefix and suffix of length
*n-i* of *U* are equal.
An autocorrelation is an *n*-bit vector indexed from 0 to
*n*-1 whose *i*th-bit is 1 iff *i* is a period of
*U*.
Among the 2^{n} bit vectors, only a small subset are
valid autocorrelations and many strings share the same one.
In this work, we define a smaller set of periods that determines the
set of all periods of a string.
We investigate the structure of *GAMMA*(*n*), the set of all
autocorrelations of length *n*, prove that it is a lattice under
inclusion and that this lattice does not satisfy the Jordan-Dedekind
condition.
We design the first enumeration algorithm for *GAMMA*(*n*),
which allows its computation until *n*=450.
We exhibit a relation between *kappa*(*n*), the cardinality
of *GAMMA*(*n*), and the number of binary partitions of an
integer.
From this, we derive precise and asymptotic lower bounds for
*kappa*(*n*) which improve that of Guibas and Odlyzko.
Finally, we demonstrate a recurrence to compute the population size of
one, or of all autocorrelations, i.e., the number of strings having
that autocorrelation.
These problems arise when studying the number of missing words in a
text or the number of common words between two random texts.

Retour à la page d'accueil

Back to home page