International PhD School in Formal Languages and Applications

Text Retrieval

Tarragona, September 2004

Lecturer

Maxime Crochemore,   mac@univ-mlv

Aims

The course is devoted to algorithms processing strings and texts efficiently. These types of algorithms are used for software design in the domains of operating systems utilities, search engines on the Internet, data retrieval systems, analysis of genetic sequences, and natural language processing, for example. The underlying methodology borrows elements from automata theory and combinatorics on words.

Syllabus

Lectures include the following topics:

Related textbooks

A. Apostolico and Z. Galil, editors, Pattern Matching Algorithms, Oxford University Press, New York, 1997, 377 pages.
M. Crochemore, C. Hancart et T. Lecroq, Algorithmique du texte, Vuibert, Paris, 2001, 347 pages.
M. Crochemore and T. Lecroq, Pattern matching and text compression algorithms, in (The Computer Science and Engineering Handbook, A.B. Tucker Jr, ed., CRC Press, 2003) Chapter 8, to appear. Locally available in ps format, 50 pages.
M. Crochemore and W. Rytter, Jewels of Stringology, World Scientific, 2002, 310 pages.
D. Gusfield, Algorithms on Strings, Trees, and Sequences, Cambridge University Press, New York, 1997, 534 pages.
G. Navarro and M. Raffinot, Flexible Pattern Matching in Strings, Cambridge University Press, Cambridge, 2002, 221 pages.
B. Smyth Computing Patterns in Strings, Pearson Addison-Wesley, 2003.
G.A. Stephen, String Searching Algorithms, World Scientific Publishing Co., 1994, 243 pages.

Pointers

Bibliographies

Visualization of algorithms


Text Searching

Searching a list of strings --- Suffix Arrays

Structures for indexing texts


December 2003, Maxime Crochemore