Not So Naive algorithmESMAJGalil-Giancarlo algorithmContents
Next: Not So Naive algorithm Up: ESMAJ Prev: Galil-Giancarlo algorithm

Apostolico-Crochemore algorithm


Main features
Description

The Apostolico-Crochemore uses the kmpNext shift table (see chapter Knuth, Morris and Pratt algorithm) to compute the shifts.

Let ell=0 if x is a power of a single character (x=cm with c in Sigma) and ell be equal to the position of the first character of x different from x[0] otherwise (x=aellbu for a, b in Sigma, u in Sigma* and a neq b). During each attempt the comparisons are made with pattern positions in the following order: ell, ell+1, ... , m-2, m-1, 0, 1, ... , ell-1.

During the searching phase we consider triple of the form (i, j, k) where:
  the window is positioned on the text factor y[j .. j+m-1];
  leq k leq ell and x[0 .. k-1]=y[j .. j+k-1];
  ell leq i < m and x[ell .. i-1]=y[j+ell .. i+j-1].

The initial triple is (ell,0,0).

  figure 11.1
Figure 11.1: At each attempt of the Apostolico-Crochemore algorithm we consider a triple (i,j,k).

We now explain how to compute the next triple after (i,j,k) has been computed.

Three cases arise depending on the value of i:
  i = ell
If x[i] = y[i+j] then the next triple is (i+1, j, k).
If x[ineq y[i+j] then the next triple is (ell, j+1, max{0, k-1}).
  ell < i < m
If x[i] = y[i+j] then the next triple is (i+1, j, k).
If x[ineq y[i+j] then two cases arise depending on the value of kmpNext[i]:
  • kmpNext[ileq ell: then the next triple is (ell, i+j-kmpNext[i], max{0, kmpNext[i]})
  • kmpNext[i] > ell: then the next triple is (kmpNext[i], i+j-kmpNext[i], ell)
  i =m
If k < ell and x[k]=y[j+k] then the next triple is (i, j, k+1).
Otherwise either k < ell and x[kneq y[j+k], or k=ell. If k=ell an occurrence of x is reported. In both cases the next triple is computed in the same manner as in the case where ell < i < m.

The preprocessing phase consists in computing the table kmpNext and the integer ell. It can be done in O(m) space and time. The searching phase is in O(n) time complexity and furthermore the Apostolico-Crochemore algorithm performs at most 3/2n text character comparisons in the worst case.

The C code

The function preKmp is given chapter Knuth, Morris and Pratt algorithm.

void AXAMAC(char *x, int m, char *y, int n) {
   int i, j, k, ell, kmpNext[XSIZE];

   /* Preprocessing */
   preKmp(x, m, kmpNext);
   for (ell = 1; x[ell - 1] == x[ell]; ell++);
   if (ell == m)
      ell = 0;

   /* Searching */
   i = ell;
   j = k = 0;
   while (j <= n - m) {
      while (i < m && x[i] == y[i + j])
         ++i;
      if (i >= m) {
         while (k < ell && x[k] == y[j + k])
            ++k;
         if (k >= ell)
            OUTPUT(j);
      }
      j += (i - kmpNext[i]);
      if (i == ell)
         k = MAX(0, k - 1);
      else
         if (kmpNext[i] <= ell) {
            k = MAX(0, kmpNext[i]);
            i = ell;
         }
         else {
            k = ell;
            i = kmpNext[i];
         }
   }
}

The example

Preprocessing phase

Apostolico-Crochemore kmpNext table
kmpNext table used by Apostolico-Crochemore algorithm

Searching phase


References


Not So Naive algorithmESMAJGalil-Giancarlo algorithmContents
Next: Not So Naive algorithm Up: ESMAJ Prev: Galil-Giancarlo algorithm

e-mails: {Christian.Charras, Thierry.Lecroq }@laposte.net
Tue Jan 14 15:03:31 MET 1997