Algorithms for Computational Molecular Biology

Politecnico di Milano, May 2007


Maxime Crochemore


The aims of the course are: (i) to understand the major concepts and problems of computational molecular biology; (ii) to appreciate the importance of these concepts in a wide diversity of practical applications; (iii) to learn which of the computational molecular biology problems have efficient algorithmic solutions and which are intractable; (iv) for some intractable problems, to understand how heuristic approaches to problem solution may yield fast but only approximate solutions.


Lectures include the following topics:

  1. Introduction to Molecular Biology and Genomics
  2. Pairwise Sequence Alignment, [Subquadratic Alignment]
  3. Substitution Matrices and Alignment Statistics
  4. Sequence Searching
  5. Multiple Sequence Alignment
  6. Whole Genome Alignment, Maximal repeats
  7. Genome rearrangements
  8. Phylogenies
  9. Fragment assembly of DNA
  10. RNA Structure Prediction

Additional elements

Related textbooks

[Gu] D. Gusfield, Algorithms on Strings, Trees, and Sequences, Cambridge University Press, New York, 1997, 534 pages.
[SM] J. Setubal and J. Meidanis, Introduction to Computational Molecular Biology, PWS Publishing Co., 1997, 296 pages.

Other books

M. Crochemore, C. Hancart et T. Lecroq, Algorithms on Strings, Cambridge University Press, New York, 2007, 383 pages.
R. Durbin, S. Eddy, A. Krogh, G. Mitchison, Biological sequence analysis, Cambridge University Press, 1998, 356 pages.
P. A. Pevzner, Computational Molecular Biology, The MIT Press, 2000, 314 pages.
M.S. Waterman, Introduction to Computational Biology, Chapman & Hall, 1995, 431 pages.


Presentation will be held on the 28th of May, 10am to 1pm.

  1. How many genes are there in plants (... and why are they there)?
  2. Codon usage bias from tRNA's point of view: Redundancy, specialization, and efficient decoding for translation optimization
  3. Sequence similarity is more relevant than species specificity in probabilistic backtranslation
  4. Simone Campanoni: Plagiarism and Collusion Detection using the Smith-Waterman Algorithm
  5. Ilario Filippini: Sparse LCS Common Substring Alignment
  6. Semi-local longest common subsequences in subquadratic time
  7. Jonatha Anselmi: Lower Bounds on Multiple Sequence Alignment using Exact 3-way Alignment
  8. LAGAN and Multi-LAGAN: Efficient Tools for Large-Scale Multiple Alignment of Genomic DNA
  9. Claudia Nuccio: Multiple Alignment, Index Structures, Suffix Trees
  10. Davide Eynard: MUMmer
  11. Faster Algorithms for Computing Longest Common Increasing Subsequences
  12. Alignment of metabolic pathways
  13. How to compare arc-annotated sequences: the Align hierarchy
  14. Detecting microsatellites within genomes: significant variation among algorithms
  15. Inference of miRNA targets using evolutionary conservation and pathway analysis
  16. A High-Throughput Approach for Associating MicroRNAs with their Activity Conditions
  17. Minimus: a fast, lightweight genome assembler
  18. A comparative study of bases for motifs inference
  19. Motif Combinator: a web-based tool to search for combinations of cis-regulatory motifs
  20. Sorting by reversals in subquadratic time
  21. Efficient pairwise RNA structure prediction using probabilistic alignment constraints in Dynalign
  22. Simone Tognetti: Multiple Sequence Alignment with genetic algorithm
  23. Rossella Blatt: Frequency-domain analysis of biomolecular sequences and Spectrogram analysis of Genomes


Visualization of algorithms

Pairwise Sequence Alignment

Sequence Searching

April 2007, Maxime Crochemore