DCA Compression

Other site
Maxime.Crochemore@univ-mlv.fr
University of Marne-la-Vallée, June 1998
M. Crochemore, F. Mignosi, A. Restivo, S. Salemi, Data Compression using Antidictionaries, in (Proceedings of the I.E.E.E., Lossless Data Compression, J. Storer, ed., 88-11, 2000) 1756--1768.
In ps format, 13 pages.
Technical Report, IGM-2000-03, Institut Gaspard-Monge, 2000.
In ps format, 24pp.

Abstract

We give a new text compression scheme based on Forbidden Words ("antidictionary"). We prove that our algorithms attain the entropy for equilibrated binary sources. One of the main advantage of this approach is that it produces very fast decompressors. A second advantage is a synchronization property that is helpful to search compressed data and to parallelize the compressor. Our algorithms can also be presented as ``compilers'' that create compressors dedicated to any previously fixed source. The techniques used in this paper are from Information Theory and Finite Automata; as a consequence, this paper shows that Formal Language Theory (in particular Finite Automata Theory) can be useful in Data Compression.
Keywords: Data Compression, Lossless Compression, Information Theory, Finite Automaton, Forbidden Word, Pattern Matching.

Contents

  1. Introduction
  2. Basic algorithms
  3. Implementation of finite antidictionaries
    Example, Transducers, A synchronisation property
  4. Efficiency
  5. How to build antidictionaries
  6. Pruning antidictionaries
    Pruned antidictionary, Self-compressing the antidictionary
  7. Conclusion
 

Related references

M. Crochemore, F. Mignosi and A. Restivo, Automata and forbidden words, Information Processing Letters, 67,3 (1998) 111-117.
In dvi or ps format, 11pp.
M. Crochemore, F. Mignosi and A. Restivo. Minimal Forbidden Words and Factor Automata. in ( MFCS'98, L. Brim, J. Gruska, J. Zlatuska, eds., LNCS 1450, Springer, 1998) 665-673.
In dvi or ps format, 9pp.
M.-P. Béal, F. Mignosi and A. Restivo. Minimal Forbidden Words and Symbolic Dynamics. in ( STACS'96, C. Puech and R. Reischuk, eds., LNCS 1046, Springer, 1996) 555--566.
In ps format, 12pp.
M. Crochemore, F. Mignosi, A. Restivo and S. Salemi. Data encoding/decoding process. United States patent pending.
M. Crochemore, F. Mignosi, A. Restivo and S. Salemi, Text compression using antidictonaries, in (ICALP'99, J. Wiedermann, P. van Emde Boas and M. Nielsen, eds., LNCS 1644, Springer, 1999) 261-270.
Preliminary version as Rapport I.G.M. 98-10, Institut Gaspard-Monge, 1998, 20 pages.
In ps format, 12pp.
Y. Shibata, M. Takeda, A. Shinohara and S. Arikawa. Pattern matching in text compressed by using antidictionaries. in (CPM'99, M. Crochemore and M. Paterson, eds., LNCS 1645, Springer, 1999) 37-49.
Technical Report. In ps format, 13 pages.
Institut Gaspard-Monge, Laboratoire d'informatique, le 25 juin 2001, Maxime Crochemore