Direct construction of CDAWG

Combinatorial Pattern Matching
Université de Marne-la-Vallée, 1997
M. Crochemore and R. Vérin, Direct construction of Compact Directed Acyclic Word Graphs, in Combinatorial Pattern Matching (CPM 97), A. Apostolico and J. Hein, eds., LNCS 1264, Springer-Verlag, 1997, pp. 116-129.
The Directed Acyclic Word Graph (DAWG) is an efficient data structure for processing and analyzing repetitions in texts, especially in genomic DNA sequences. Here, we consider the Compact Directed Acyclic Word Graph (CDAWG) of a word and give the first direct algorithm to construct it. The algorithm runs in time linear in the length of the string over a fixed alphabet. Our implementation requires half the memory space used by ordinary DAWGs.
Keywords: pattern matching algorithm, suffix automaton, DAWG, compact DAWG, suffix tree, text index.
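To make the baseline structure concrete, here is a minimal Python sketch of an ordinary DAWG (also called the suffix automaton), built with the well-known online algorithm that adds one letter at a time and maintains suffix links. This is the classical DAWG, not the compact DAWG and not the direct CDAWG algorithm of the paper. The class and method names (DAWG, is_factor, terminal_states) are illustrative assumptions, and transitions are stored in Python dictionaries rather than in a memory-tuned layout.

```python
class DAWG:
    """Ordinary DAWG (suffix automaton) of a word, built online one letter at a time."""

    def __init__(self, y):
        self.trans = [{}]   # trans[s]: letter -> target state
        self.link = [-1]    # suffix link of each state (-1 for the initial state)
        self.length = [0]   # length of the longest word reaching each state
        self.last = 0       # state reached by the whole prefix read so far
        for c in y:
            self._extend(c)

    def _new_state(self, length, trans, link):
        self.trans.append(dict(trans))
        self.length.append(length)
        self.link.append(link)
        return len(self.trans) - 1

    def _extend(self, c):
        cur = self._new_state(self.length[self.last] + 1, {}, 0)
        p = self.last
        # Add a c-transition to cur from every suffix state that lacks one.
        while p != -1 and c not in self.trans[p]:
            self.trans[p][c] = cur
            p = self.link[p]
        if p != -1:
            q = self.trans[p][c]
            if self.length[p] + 1 == self.length[q]:
                self.link[cur] = q  # q can be reused as the suffix-link target
            else:
                # Split q: the clone takes the shorter words of q's class.
                clone = self._new_state(self.length[p] + 1, self.trans[q], self.link[q])
                while p != -1 and self.trans[p].get(c) == q:
                    self.trans[p][c] = clone
                    p = self.link[p]
                self.link[q] = clone
                self.link[cur] = clone
        self.last = cur

    def is_factor(self, w):
        """True iff w is a factor (substring) of y; one transition per letter of w."""
        s = 0
        for c in w:
            if c not in self.trans[s]:
                return False
            s = self.trans[s][c]
        return True

    def terminal_states(self):
        """States reached by the suffixes of y: follow suffix links from the last state."""
        t, p = set(), self.last
        while p != -1:
            t.add(p)
            p = self.link[p]
        return t
```

For y = abab, the sketch builds 5 states and 5 transitions, and terminal_states() returns {0, 2, 4}: the initial state (empty suffix), the state for b and ab, and the state for bab and abab.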


  • Definitions
  • Compact Directed Acyclic Word Graphs (size bounds)
  • Constructing CDAWG from DAWG
  • Direct Construction of CDAWG (linear algorithm)
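The definitions and the step from DAWG to CDAWG can be illustrated naively. The sketch below assumes the usual right-equivalence definition (DAWG states are classes of factors with the same set of end positions) and the characterization that the CDAWG is obtained from the DAWG by bypassing non-terminal states of out-degree one, concatenating the letters along each bypassed chain into one edge label. It computes every class directly from the definition, so it is far from linear and meant only for short words; it is not the direct linear algorithm of the paper. The function names endpos and cdawg are hypothetical, and edge labels are kept as Python strings, whereas practical implementations typically store each label as a pair of positions in y.

```python
def endpos(y, u):
    """End positions of the occurrences of u in y; the empty word ends at 0..n."""
    m = len(u)
    return frozenset(i + m for i in range(len(y) - m + 1) if y[i:i + m] == u)


def cdawg(y):
    """Naive CDAWG of y: keep the initial state, terminal states (end-position
    set contains n, i.e. the class holds suffixes of y) and states of out-degree >= 2."""
    n = len(y)
    factors = {y[i:j] for i in range(n + 1) for j in range(i, n + 1)}
    cls = {u: endpos(y, u) for u in factors}
    rep = {}
    for u in sorted(factors, key=len):
        rep.setdefault(cls[u], u)  # shortest factor of each class as representative
    root = cls[""]

    def out(s):
        # DAWG transitions; well defined since a class fixes its right extensions.
        u = rep[s]
        return {a: cls[u + a] for a in sorted(set(y)) if u + a in factors}

    def kept(s):
        return s == root or n in s or len(out(s)) != 1

    nodes = {s for s in rep if kept(s)}
    edges = {}  # (source node, label) -> target node
    for s in nodes:
        for a, t in out(s).items():
            label = a
            while not kept(t):  # bypass a non-terminal state of out-degree one
                (b, t), = out(t).items()
                label += b
            edges[(s, label)] = t
    return root, nodes, edges
```

For y = abab, the DAWG has 5 states; the CDAWG keeps 3 of them (the root, the class {b, ab} and the class {bab, abab}) with edges labelled ab and b from the root and ab from the middle node, since the classes {a} and {ba, aba} are non-terminal with a single outgoing edge.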

