Algorithms for Computational Molecular Biology
Politecnico di Milano, May 2007
Lecturer
Maxime Crochemore
@univ-mlv.fr
Aims
The aims of the course are:
(i) to understand the major concepts and problems of computational molecular
biology;
(ii) to appreciate the importance of these concepts in a wide diversity of
practical applications;
(iii) to learn which of the computational molecular biology problems have
efficient algorithmic solutions and which are intractable;
(iv) for some intractable problems, to understand how heuristic approaches
to problem solution may yield fast but only approximate solutions.
Syllabus
Lectures include the following topics:
-
Introduction to Molecular Biology and Genomics
-
Pairwise Sequence Alignment,
[Subquadratic Alignment]
-
Substitution Matrices and Alignment Statistics
-
Sequence Searching
-
Multiple Sequence Alignment
-
Whole Genome Alignment,
Maximal repeats
-
Genome rearrangements
-
Phylogenies
-
Fragment assembly of DNA
-
RNA Structure Prediction
Additional elements
-
Exact Text Searching
-
Suffix Arrays
-
Suffix Trees
Related textbooks
[Gu] D. Gusfield,
Algorithms on Strings, Trees, and Sequences,
Cambridge University Press, New York, 1997, 534 pages.
[SM] J. Setubal and J. Meidanis,
Introduction to Computational Molecular Biology,
PWS Publishing Co., 1997, 296 pages.
Other books
M. Crochemore, C. Hancart et T. Lecroq,
Algorithms on Strings,
Cambridge University Press, New York, 2007, 383 pages.
R. Durbin, S. Eddy, A. Krogh, G. Mitchison,
Biological sequence analysis,
Cambridge University Press, 1998, 356 pages.
P. A. Pevzner,
Computational Molecular Biology,
The MIT Press, 2000, 314 pages.
M.S. Waterman,
Introduction to Computational Biology,
Chapman & Hall, 1995, 431 pages.
Articles
Presentation will be held on the 28th of May, 10am to 1pm.
-
How many genes are there in plants (... and
why are they there)?
-
Codon usage bias from tRNA's point of view:
Redundancy, specialization, and efficient decoding for translation optimization
-
Sequence similarity is more relevant
than species specificity in probabilistic backtranslation
-
Simone Campanoni: Plagiarism and Collusion
Detection using the Smith-Waterman Algorithm
-
Ilario Filippini: Sparse LCS Common Substring Alignment
-
Semi-local longest common subsequences in
subquadratic time
-
Jonatha Anselmi: Lower Bounds on Multiple Sequence Alignment using
Exact 3-way Alignment
-
LAGAN and Multi-LAGAN: Efficient Tools for
Large-Scale Multiple Alignment of Genomic DNA
-
Claudia Nuccio: Multiple Alignment, Index Structures,
Suffix Trees
-
Davide Eynard: MUMmer
-
Faster Algorithms for Computing Longest Common Increasing
Subsequences
-
Alignment of metabolic pathways
-
How to compare arc-annotated sequences: the
Align hierarchy
-
Detecting microsatellites within genomes:
significant variation among algorithms
-
Inference of miRNA targets using evolutionary
conservation and pathway analysis
-
A High-Throughput Approach for
Associating MicroRNAs with their Activity Conditions
-
Minimus: a fast, lightweight genome
assembler
-
A comparative study of bases for motifs
inference
-
Motif Combinator: a web-based tool to search
for combinations of cis-regulatory motifs
-
Sorting by reversals in subquadratic time
-
Efficient pairwise RNA structure prediction
using probabilistic alignment constraints in Dynalign
-
Simone Tognetti:
Multiple Sequence Alignment with genetic algorithm
-
Rossella Blatt: Frequency-domain analysis of biomolecular sequences
and
Spectrogram analysis of Genomes
Pointers
Visualization of algorithms
Pairwise Sequence Alignment
-
Abstract.
Alignments serve to compare strings. They are used for example for file comparison
and management, for music studies, and for biological molecular sequences analysis.
In the latter question the notion captures mutations occurring during the evolution
of biological molecules.
Different types of alignments are presented. Their generic solution is based on
dynamic programming.
-
Content.
Comparison of strings, Levenshtein distance, edit graph, dynamic programming,
all alignments, space reduction, local alignment, BLAST.
Sequence Searching
-
Abstract.
A problem close to alignment is pattern matching with differences, approximate pattern
matching. Differences are substitutions, insertions and deletion.
Basic solutions are mere variations of alignment algorihms.
Several improvements on it are discussed.
-
Content.
String searching with differences, pruned computation, diagonal computation,
search with $k$ mismatches, short motifs and shift-or, FASTA.
April 2007,
Maxime Crochemore