Pattern matching and text compression algorithms

The Computer Science and Engineering Hanbook
Maxime.Crochemore@univ-mlv.fr
Université de Marne-la-Vallée, 2003
M. Crochemore and T. Lecroq, Pattern Matching and Text Compression Algorithms, in (The Computer Science and Engineering Handbook, A.B. Tucker, Jr, ed., CRC Press, Boca Raton, 2003) Chapter 8. To appear.
In ps format, 50 pp.

Abstract

Pattern matching is the problem of locating a specific pattern inside raw data. The pattern is usually a collection of strings described in some formal language. Applications require two kinds of solution depending on which string, the pattern or the text, is given first. Solutions based on the use of automata or combinatorial properties of strings are commonly implemented to preprocess the pattern. The notion of indexes realized by trees or automata is used in the second kind of solutions.

The aim of data compression is to provide representation of data in a reduced form in order to both save storage place and save transmission time. There is no loss of information, the compression processes are reversible.

Since algorithms apply to data that tend to double every eighteen months, algorithms should be efficient in time and space even if the speed and the capacity of storage of computers increase regularly.

Contents

  • String-matching algorithms
  • Two-dimensional pattern matching algorithms
  • Suffix trees
  • Longest common subsequence of two strings
  • Approximate string matching
  • Text compression
 

Note

This is Chapter 8 of The Computer Science and Engineering Hanbook edited by Allen B. Tucker and published by CRC Press, Inc (2003). The entire handbook is roughly 2600 typeset pages, and is organized around the major subject areas of the discipline.

Institut Gaspard-Monge, Laboratoire d'informatique, le 8 janvier 2003, Maxime Crochemore