International PhD School in Formal Languages and Applications

Text Retrieval

Tarragona, September 2004


Maxime Crochemore,   mac@univ-mlv


The course is devoted to algorithms processing strings and texts efficiently. These types of algorithms are used for software design in the domains of operating systems utilities, search engines on the Internet, data retrieval systems, analysis of genetic sequences, and natural language processing, for example. The underlying methodology borrows elements from automata theory and combinatorics on words.


Lectures include the following topics:

Related textbooks

A. Apostolico and Z. Galil, editors, Pattern Matching Algorithms, Oxford University Press, New York, 1997, 377 pages.
M. Crochemore, C. Hancart et T. Lecroq, Algorithmique du texte, Vuibert, Paris, 2001, 347 pages.
M. Crochemore and T. Lecroq, Pattern matching and text compression algorithms, in (The Computer Science and Engineering Handbook, A.B. Tucker Jr, ed., CRC Press, 2003) Chapter 8, to appear. Locally available in ps format, 50 pages.
M. Crochemore and W. Rytter, Jewels of Stringology, World Scientific, 2002, 310 pages.
D. Gusfield, Algorithms on Strings, Trees, and Sequences, Cambridge University Press, New York, 1997, 534 pages.
G. Navarro and M. Raffinot, Flexible Pattern Matching in Strings, Cambridge University Press, Cambridge, 2002, 221 pages.
B. Smyth Computing Patterns in Strings, Pearson Addison-Wesley, 2003.
G.A. Stephen, String Searching Algorithms, World Scientific Publishing Co., 1994, 243 pages.



Visualization of algorithms

Text Searching

Searching a list of strings --- Suffix Arrays

Structures for indexing texts

December 2003, Maxime Crochemore