Next: Local similarity Up: Sequence comparison Previous: Longest common subsequences

Cost depending on the length of the gaps

It is possible to penalize the length of the gaps instead of penalizing the deletion or the insertion of individual letters. Let us use a function . indicates the cost of a gap of length k. In this case the computation of an optimal alignment is done using the following recurrence formula:

D(i,j) indicates the score of an optimal alignment between x₀x₁···x_i and y₀y₁···y_j ending with deletions of letters of x.
I(i,j) indicates the score of an optimal alignment between x₀x₁···x_i and y₀y₁···y_j ending with insertions of letters of y.
T[i,j] indicates the score of an optimal alignment between x₀x₁···x_i and y₀y₁···y_j.

Without restriction on the function the problem can be solve in O(mn(m+n)) time.

If the function is affine: which corresponds to penalize the opening of a gap with g and widening a gap is penalized with h then the problem can be solve in O(mn) time ([7]):

algo 5.1

It is easy to see that the tables D, I and T can be reduce to a linear size if only the value of T[m-1,n-1] is needed.

To exhibit an alignment a forth quadratic table has to be used to store in each square which neighbour has served to compute its value (same row and previous column, previous row same column or previous row previous column). Only three different values are possible thus only two bits are enough per square. If all the optimal alignments are to be computed then three bits are required.

Example :
x = YHCQPGK, y = LAHYQQKPGKA, Sub(a,a) = 0, Sub(a,b) = 3, g = 3 and h = 1.

We get the three following optimal alignments:

Next: Local similarity Up: Sequence comparison Previous: Longest common subsequences

e-mails: {Christian.Charras, Thierry.Lecroq}@laposte.net

Thu Feb 19 10:23:29 WET 1998