Direct construction of DAWG

Combinatorial Pattern Matching
Maxime.Crochemore@univ-mlv.fr
Université de Marne-la-Vallée, 1997
M. Crochemore and R. Vérin, Direct construction of Compact Directed Acyclic Word Graphs, in (CPM97, A. Apostolico and J. Hein, eds., LNCS 1264, Springer-Verlag, 1997) pp 116--129.
In dvi, or ps format, 27 pages.

Abstract

The Directed Acyclic Word Graph (DAWG) is an efficient data structure used to treat and analyze repetitions in texts, especially in DNA genomic sequences. Here, we consider the Compact Directed Acyclic Word Graph of a word. We give the first direct algorithm to construct it. It runs in time linear in the length of the string on a fixed alphabet. Our implementation requires half the memory space used by ordinary DAWGs.
Keywords: pattern matching algorithm, suffix automaton, DAWG, compact DAWG, suffix tree, index on text.

Contents

  • Definitions
  • Compact Directed Acyclic Word Graphs (size bounds)
  • Constructing CDAWG from DAWG
  • Direct Construction of CDAWG (linear algorithm)
 

Related reference

M. Crochemore and R. Vérin, On Compact Directed Acyclic Word Graphs, in (Structures in Logic and Computer Science, a selection of essays in honor of A. Ehrenfeucht, J. Mycielski, G. Rozenberg and A. Salomaa, eds., LNCS 1261, Springer-Verlag, 1997) pp 192--211.
Institut Gaspard-Monge, Laboratoire d'informatique, le 6 octobre 1999, Maxime Crochemore