Alpha Skip Search algorithmESMAJSkip Search algorithmContents
Next: Alpha Skip Search algorithm Up: ESMAJ Prev: Skip Search algorithm

KMP Skip Search algorithm


Main features
Description

It is possible to make the Skip Search algorithm (see chapter Skip Search algorithm) linear using the two shift tables of Morris-Pratt (see chapter Morris and Pratt algorithm) and Knuth-Morris-Pratt (see chapter Knuth, Morris and Pratt algorithm).

For 1 leq i leq m, mpNext[i] is equal to the length of the longest border of x[0 .. i-1] and mpNext[0]=-1.

For 1 leq i < m, kmpNext[i] is equal to length of the longest border of x[0 .. i-1] followed by a character different from x[i], kmpNext[0]=-1 and kmpNext[m]=m-per(x).

The lists in the buckets are explicitly stored in a table list.

The preprocessing phase of the KmpSkip Search algorithm is in O(m+sigma) time and space complexity.

A general situation for an attempt during the searching phase is the following ((see figure 30.1):
  j is the current text position;
  x[i] = y[j];
  start = j-i is the possible starting position of an occurrence of x in y;
  wall is the rightmost scanned text position;
  x[0 .. wall-start-1] = y[start .. wall-1];

  figure 30.1
Figure 30.1: General situation during the searching phase of the linear algorithm.

The comparisons are performed from left to right between x[wall-start .. m-1] and y[wall .. start+m-1] until a mismatch or a whole match occurs. Let k geq wall-start be the smallest integer such that x[kneq y[start+k] or k = m if an occurrence of x starts at position start in y.

Then wall takes the value of start+k.

After that the algorithm KmpSkip computes two shifts (two new starting positions): the first one according to the skip algorithm (see algorithm AdvanceSkip for details), this gives us a starting position skipStart, the second one according to the shift table of Knuth-Morris-Pratt, which gives us another starting position kmpStart.

Several cases can arise:
  skipStart < kmpStart then a shift according to the skip algorithm is applied which gives a new value for skipStart, and we have to compare again skipStart and kmpStart;
  kmpStart < skipStart < wall then a shift according to the shift table of Morris-Pratt is applied. This gives a new value for kmpStart. We have to compare again skipStart and kmpStart;
  skipStart = kmpStart then another attempt can be performed with start = skipStart;
  kmpStart < wall < skipStart then another attempt can be performed with start = skipStart.

The searching phase of the KmpSkip Search algorithm is in O(n) time.

The C code

The function preMp is given chapter Morris and Pratt algorithm and the function preKmp is given chapter Knuth, Morris and Pratt algorithm.

int attempt(char *y, char *x, int m, int start, int wall) {
   int k;

   k = wall - start;
   while (k < m && x[k] == y[k + start])
      ++k;
   return(k);
}


void KMPSKIP(char *x, int m, char *y, int n) {
   int i, j, k, kmpStart, per, start, wall;
   int kmpNext[XSIZE], list[XSIZE], mpNext[XSIZE],
       z[ASIZE];

   /* Preprocessing */
   preMp(x, m, mpNext);
   preKmp(x, m, kmpNext);
   memset(z, -1, ASIZE*sizeof(int));
   memset(list, -1, m*sizeof(int));
   z[x[0]] = 0;
   for (i = 1; i < m; ++i) {
      list[i] = z[x[i]];
      z[x[i]] = i;
   }

   /* Searching */
   wall = 0;
   per = m - kmpNext[m];
   i = j = -1;
   do {
      j += m;
   } while (j < n && z[y[j]] < 0);
   if (j >= n)
     return;
   i = z[y[j]];
   start = j - i;
   while (start <= n - m) {
      if (start > wall)
         wall = start;
      k = attempt(y, x, m, start, wall);
      wall = start + k;
      if (k == m) {
         OUTPUT(start);
         i -= per;
      }
      else
         i = list[i];
      if (i < 0) {
         do {
            j += m;
         } while (j < n && z[y[j]] < 0);
         if (j >= n)
            return;
         i = z[y[j]];
      }
      kmpStart = start + k - kmpNext[k];
      k = kmpNext[k];
      start = j - i;
      while (start < kmpStart ||
             (kmpStart < start && start < wall)) {
         if (start < kmpStart) {
            i = list[i];
            if (i < 0) {
               do {
                  j += m;
               } while (j < n && z[y[j]] < 0);
               if (j >= n)
                  return;
               i = z[y[j]];
            }
            start = j - i;
         }
         else {
            kmpStart += (k - mpNext[k]);
            k = mpNext[k];
         }
      }
   }
}

The example

Preprocessing phase

Kmp skip search Z table
Tables used by KMP Skip Search algorithm.

Searching phase


References


Alpha Skip Search algorithmESMAJSkip Search algorithmContents
Next: Alpha Skip Search algorithm Up: ESMAJ Prev: Skip Search algorithm

e-mails: {Christian.Charras, Thierry.Lecroq }@laposte.net
Tue Jan 14 15:03:31 MET 1997