#
RoSA '02 (Rouen, June 6-7, 2002)

#
Generalized Pattern Matching Statistics: Dynamical sources context

In pattern matching algorithms, a characteristic parameter is the
number of occurrences of a given pattern in a random text of length
*n* generated by a source.
We consider here a generalization of the pattern matching problem in
two ways.
First, we deal with a generalized notion of pattern that
encompasses classical patterns as well as "hidden patterns".
Second, we consider a quite general probabilistic model of sources that
may possess a high degree of correlations.
Such sources are built with dynamical systems and are called
dynamical sources.
We determine the mean and the variance of the number of occurrences in
this generalized pattern matching problem, and establish a property of
concentration of distribution.
These results are obtained via combinatorics, formal language
techniques, and methods of analytic combinatorics based on generating
operators and generating functions.
The generating operators come from the dynamical system framework and
generate themselves generating functions.
The motivation to study this problem comes from an attempt at finding a
reliable threshold for intrusion detections, from textual data
processing applications, and from molecular biology.

Retour à la page d'accueil

Back to home page