- requires an ordered alphabet;
- preprocessing phase in
(**O***m*) time and constant space complexity; - constant space complexity for the preprocessing phase;
- searching phase in
(**O***n*) time; - performs 2
*n*-*m*text character comparisons in the worst case.

The pattern *x* is factorized in two parts *x*_{} and *x*_{r} such that *x*=*x*_{}x_{r}. Then the search phase of the Two Way algorithm consists in comparing the characters of *x*_{r} from left to right and then, if no mismatch occurs during that first stage, in comparing the characters of *x*_{} from right to left in a second stage.

The preprocessing phase of the algorithm consists then in choosing a good **factorization***x*_{}x_{r}.

Let (*u*, *v*) be a factorization of *x*. A * repetition* in (

*w*is a suffix of*u*or*u*is a suffix of*w*;*w*is a prefix of*v*of*v*is a prefix of*w*.

In other words *w* occurs at both sides of the cut between *u* and *v* with a possible overflow on either side. The length of a repetition in (*u*,*v*) is called a * local period* and the length of the smallest repetition in (

Each factorization (*u*,*v*) of has at least one repetition. It can be easily seen that 1 *r*(*u*,*v*) |*x*|

A factorization (*u*,*v*) of *x* such that *r*(*u*,*v*)=*per*(*x*) is called a * critical factorization* of

If (*u*,*v*) is a critical factorization of *x* then at the position |*u*| in *x* the global and the local periods are the same. The Two Way algorithm chooses the critical factorization (*x*_{},*x*_{r}) such that |*x*_{}| < *per*(*x*) and |*x*_{}| is minimal.

To compute the critical factorization *x*_{},*x*_{r} of *x* we first compute the maximal suffix *z* of *x* for the order and the maximal suffix for the reverse order . Then (*x*_{},*x*_{r}) is chosen such that. |*x*_{}|=*max*{|*z*|,||}

The preprocessing phase can be done in * O*(

The searching phase of the Two Way algorithm consists in first comparing the character of *x*_{r} from left to right, then the character of *x*_{} from right to left.

When a mismatch occurs when scanning the *k*-th character of *x*_{r}, then a shift of length *k* is performed.

When a mismatch occurs when scanning *x*_{}, or when an occurrence of the pattern is found, then a shift of length *per*(*x*) is performed.

Such a scheme leads to a quadratic worst case algorithm, this can be avoided by a prefix memorization: when a shift of length *per*(*x*) is performed the length of the matching prefix of the pattern at the beginning of the window (namely *m*-*per*(*x*)) after the shift is memorized to avoid to scan it again during the next attempt.

The searching phase of the Two Way algorithm can be done in * O*(

The Two Way algorithm performs 2*n*-*m* text character comparisons in the worst case. Breslauer designed a variation on the Two Way algorithm which performs less than 2*n*-*m* comparisons using constant space.

/* Computing of the maximal suffix for <= */ int maxSuf(char *x, int m, int *p) { int ms, j, k; char a, b; ms = -1; j = 0; k = *p = 1; while (j + k < m) { a = x[j + k]; b = x[ms + k]; if (a < b) { j += k; k = 1; *p = j - ms; } else if (a == b) if (k != *p) ++k; else { j += *p; k = 1; } else { /* a > b */ ms = j; j = ms + 1; k = *p = 1; } } return(ms); } /* Computing of the maximal suffix for >= */ int maxSufTilde(char *x, int m, int *p) { int ms, j, k; char a, b; ms = -1; j = 0; k = *p = 1; while (j + k < m) { a = x[j + k]; b = x[ms + k]; if (a > b) { j += k; k = 1; *p = j - ms; } else if (a == b) if (k != *p) ++k; else { j += *p; k = 1; } else { /* a < b */ ms = j; j = ms + 1; k = *p = 1; } } return(ms); } /* Two Way string matching algorithm. */ void TW(char *x, int m, char *y, int n) { int i, j, ell, memory, p, per, q; /* Preprocessing */ i = maxSuf(x, m, &p); j = maxSufTilde(x, m, &q); if (i > j) { ell = i; per = p; } else { ell = j; per = q; } /* Searching */ if (memcmp(x, x + per, ell + 1) == 0) { j = 0; memory = -1; while (j <= n - m) { i = MAX(ell, memory) + 1; while (i < m && x[i] == y[i + j]) ++i; if (i >= m) { i = ell; while (i > memory && x[i] == y[i + j]) --i; if (i <= memory) OUTPUT(j); j += per; memory = m - per - 1; } else { j += (i - ell); memory = -1; } } } else { per = MAX(ell + 1, m - ell - 1) + 1; j = 0; while (j <= n - m) { i = ell + 1; while (i < m && x[i] == y[i + j]) ++i; if (i >= m) { i = ell; while (i >= 0 && x[i] == y[i + j]) --i; if (i < 0) OUTPUT(j); j += per; } else j += (i - ell); } } }

Preprocessing phase

Factorisation used by Two Way algorithm.

Searching phase

- BRESLAUER, D., 1996, Saving comparisons in the Crochemore-Perrin string matching algorithm,
*Theoretical Computer Science*158(1-2):177-192. - CROCHEMORE, M., 1997. Off-line serial exact string searching, in Pattern Matching Algorithms, ed. A. Apostolico and Z. Galil, Chapter 1, pp 1-53, Oxford University Press.
**CROCHEMORE M.**,**PERRIN D.**, 1991, Two-way string-matching,*Journal of the ACM*38(3):651-675.- CROCHEMORE, M., RYTTER, W., 1994, Text Algorithms,
*Oxford University Press*.

Tue Jan 14 15:03:31 MET 1997