Text Algorithms

Oxford University Press
Maxime.Crochemore@univ-mlv.fr
Univ. de Marne-la-Vallée, 1994
M. Crochemore and W. Rytter, Text Algorithms, Oxford University Press, New York, 1994, 412 pages. ISBN 0-19-508609-0.
cover 01234567890123456789012345678901234567890123456789

Presentation

This much-needed book on the design of algorithms and data structures for text processing emphasizes both theoretical foundations and practical applications. It is intended to serve both as a textbook for courses on algorithm design, especially those related to text processing, and as a reference for computer science professionals. The work takes a unique approach, to other more general books one that goes more deeply into its topic than other more general books. It contains both classical algorithms and recent results of research on the subject. The book is the first text to contain a collection of a wide range of text algorithms, many of them quite new and appearing here for the first time. Other algorithms, while known by reputation, have never been published in the journal literature.

Table of contents

Preface by Zvi Galil
  1. Introduction, pp 1-11
  2. Foundations, pp 13-31
  3. Basic string-matching algorithms, pp 33-53
  4. The Boyer-Moore algorithm and its variations, pp 55-72
  5. Suffix trees, pp 73-104
  6. Subword graphs, pp 105-130
  7. Automata theoretic approach, pp 131-159
  8. Regularities in texts: symmetries and repetitions, pp 161-179
  9. Almost optimal parallel algorithms, pp 181-211
  10. Text compression techniques, pp 213-236
  11. Approximate pattern matching, pp 237-251
  12. Two-dimensional pattern matching, pp 253-288
  13. Time-space optimal string matching, pp 289-325
  14. Time-processors optimal string matching, pp 327-351
  15. Miscellanies, pp 353-375
Exercises, pp 377-385
Bibliography, pp 387-404
Index, pp 405-412

On-line documents

  • Complete table of contents
    In dvi or ps format, 4pp.
  • Preface
    In dvi or ps format, 2pp.
  • Introduction (Chapter 1)
    In dvi or ps format, 14 pp.
  • Exercises
    In dvi or ps format, 10 pp.
  • Bibliography
    In LaTeX,
    dvi or ps format, 22 pp.
  • Full text
    In pdf format, 396 pp.

Related references

A. Apostolico and Z. Galil, editors, Pattern Matching Algorithms, Oxford University Press, New York, 1997, 377 pages.
M. Crochemore, C. Hancart et T. Lecroq, Algorithmique du texte, Vuibert, 2001, 347 pages.
D. Gusfield, Algorithms on Strings, Trees, and Sequences, Cambridge University Press, New York, 1997, 534 pages.
G.A. Stephen, String Searching Algorithms, World Scientific Publishing Co., 1994, 243 pages.

Preface

The design of algorithms that process strings and texts goes back at least twenty five years. In particular, the last ten of those years have produced an explosion of new results. This progress is due in part to the human genome effort, to which string algorithms can make an important contribution.

While text algorithms can be viewed as part of the general field of algorithmic research, it has developed into a respectable subfield on its own. This subfield nicely combines theory and practice. The theory deals with symmetries and periodicities in strings, which in practice may lead to the development of fast new algorithms.

One measure of the vibrance of this new subfield is the ongoing success of a conference devoted to its study. The Conference on Combinatorial Pattern Matching will be holding its fifth meeting in the summer of 1994 to present theoretical results, new algorithms, and new applications of these algorithms.

Following the remarkable progress in this new field, Maxime Crochemore and Wojciech Rytter embarked on the right project at the right time---writing a textbook on text algorithms. Both authors have made important contributions to the field and therefore are excellent choices for the job.

About ten years ago, in a workshop that preceded the conference (called Combinatorial Algorithms on Words) I gave a lecture entitled ``Open Problems in Stringology'' [Ga85a]. (Stringology has become the nickname of this new subfield.) In that lecture I described one dozen open problems. It gives me great pleasure to report that about half of these problems have either been solved or significant progress has been made towards their complete solution. Some of these new developments appear in Crochemore and Rytter's new book. The recent Handbook of Theoretical Computer Science includes an excellent chapter entitled ``Algorithms for finding patterns in strings'' by A.V. Aho, one of the earliest contributors to text algorithms. Each time I have met Aho in the last fifteen years he has encouraged me to try and find a better algorithm for the membership problem for regular expressions, one of my twelve open problems. An improved solution to this problem would have significant practical consequences; in particular, it would improve the performance of the popular EGREP program used by Unix. See chapter 7 of Crochemore and Rytter's book for more details.

Since the research on text algorithms continues, it is not possible to have a book that completely covers the area. Looking at the table of contents of this book, I am convinced that its fifteen chapters cover nicely many of the major developments in the field. Crochemore and Rytter have succeeded in producing a textbook that is as thorough as it is timely.

Zvi Galil

Acknowledgements

We have worked on this book during visits to each other at LIPN (University of Paris-Nord), Instytut Informatyk (University of Warsaw), LITP (University Paris 7), and Institut Gaspard-Monge (University of Marne-la-Vallée). We warmly thank our colleagues of all these laboratories for the kind and stimulating atmosphere that has favoured the creation of the book. We especially thank the persons that help us produce that text by reading preliminary versions of some chapters, or by their comments: Véronique Bruyère, Richard Cole, Artur Czumaj, Zvi Galil, Leszek Gasieniec, Christophe Hancart, Thierry Lecroq, Larry Larmore, Dominique Perrin, Giuseppina Rindone, Marc Zipstein.

Maxime Crochemore
Wojciech Rytter

 
Institut Gaspard-Monge, Laboratoire d'informatique, le 23 novembre 2001, Maxime Crochemore