Algorithms for Computational Molecular Biology
Politecnico di Milano, May 2007
The aims of the course are:
(i) to understand the major concepts and problems of computational molecular
(ii) to appreciate the importance of these concepts in a wide diversity of
(iii) to learn which of the computational molecular biology problems have
efficient algorithmic solutions and which are intractable;
(iv) for some intractable problems, to understand how heuristic approaches
to problem solution may yield fast but only approximate solutions.
Lectures include the following topics:
Introduction to Molecular Biology and Genomics
Pairwise Sequence Alignment,
Substitution Matrices and Alignment Statistics
Multiple Sequence Alignment
Whole Genome Alignment,
Fragment assembly of DNA
RNA Structure Prediction
Exact Text Searching
[Gu] D. Gusfield,
Algorithms on Strings, Trees, and Sequences,
Cambridge University Press, New York, 1997, 534 pages.
[SM] J. Setubal and J. Meidanis,
Introduction to Computational Molecular Biology,
PWS Publishing Co., 1997, 296 pages.
M. Crochemore, C. Hancart et T. Lecroq,
Algorithms on Strings,
Cambridge University Press, New York, 2007, 383 pages.
R. Durbin, S. Eddy, A. Krogh, G. Mitchison,
Biological sequence analysis,
Cambridge University Press, 1998, 356 pages.
P. A. Pevzner,
Computational Molecular Biology,
The MIT Press, 2000, 314 pages.
Introduction to Computational Biology,
Chapman & Hall, 1995, 431 pages.
Presentation will be held on the 28th of May, 10am to 1pm.
How many genes are there in plants (... and
why are they there)?
Codon usage bias from tRNA's point of view:
Redundancy, specialization, and efficient decoding for translation optimization
Sequence similarity is more relevant
than species specificity in probabilistic backtranslation
Simone Campanoni: Plagiarism and Collusion
Detection using the Smith-Waterman Algorithm
Ilario Filippini: Sparse LCS Common Substring Alignment
Semi-local longest common subsequences in
Jonatha Anselmi: Lower Bounds on Multiple Sequence Alignment using
Exact 3-way Alignment
LAGAN and Multi-LAGAN: Efficient Tools for
Large-Scale Multiple Alignment of Genomic DNA
Claudia Nuccio: Multiple Alignment, Index Structures,
Davide Eynard: MUMmer
Faster Algorithms for Computing Longest Common Increasing
Alignment of metabolic pathways
How to compare arc-annotated sequences: the
Detecting microsatellites within genomes:
significant variation among algorithms
Inference of miRNA targets using evolutionary
conservation and pathway analysis
A High-Throughput Approach for
Associating MicroRNAs with their Activity Conditions
Minimus: a fast, lightweight genome
A comparative study of bases for motifs
Motif Combinator: a web-based tool to search
for combinations of cis-regulatory motifs
Sorting by reversals in subquadratic time
Efficient pairwise RNA structure prediction
using probabilistic alignment constraints in Dynalign
Multiple Sequence Alignment with genetic algorithm
Rossella Blatt: Frequency-domain analysis of biomolecular sequences
Spectrogram analysis of Genomes
Visualization of algorithms
Pairwise Sequence Alignment
Alignments serve to compare strings. They are used for example for file comparison
and management, for music studies, and for biological molecular sequences analysis.
In the latter question the notion captures mutations occurring during the evolution
of biological molecules.
Different types of alignments are presented. Their generic solution is based on
Comparison of strings, Levenshtein distance, edit graph, dynamic programming,
all alignments, space reduction, local alignment, BLAST.
A problem close to alignment is pattern matching with differences, approximate pattern
matching. Differences are substitutions, insertions and deletion.
Basic solutions are mere variations of alignment algorihms.
Several improvements on it are discussed.
String searching with differences, pruned computation, diagonal computation,
search with $k$ mismatches, short motifs and shift-or, FASTA.