Morris-Pratt algorithmESMAJKarp-Rabin algorithmContents
Next: Morris-Pratt algorithm Up: ESMAJ Prev: Karp-Rabin algorithm

Shift Or algorithm


Main features
Description

The Shift Or algorithm uses bitwise techniques. Let R be a bit array of size m. Vector Rj is the value of the array R after text character y[j] has been processed (see figure 5.1). It contains informations about all matches of prefixes of x that end at position j in the text for 0 < i leq m-1:

  figure 5.1
Figure 5.1: Meaning of vector Rj.

The vector Rj+1 can be computed after Rj as follows. For each Rj[i]=0:

and

If Rj+1[m-1]=0 then a complete match can be reported.

The transition from Rj to Rj+1 can be computed very fast as follows: For each c in Sigma, let Sc be a bit array of size m such that: for 0 leq i < m-1, Sc[i]=0 iff x[i]=c.

The array Sc denotes the positions of the character c in the pattern x. Each Sc can be preprocessed before the search. And the computation of Rj+1 reduces to two operations, shift and or: Rj+1=SHIFT(Rj) OR Sy[j+1]

Assuming that the pattern length is no longer than the memory-word size of the machine, the space and time complexity of the preprocessing phase is O(m+sigma).

The time complexity of the searching phase is O(n), thus independent from the alphabet size and the pattern length.

The C code
  int preSo(char *x, int m, unsigned int S[]) { 
    unsigned int j, lim; 
    int i; 
    for (i = 0; i < ASIZE; ++i) 
      S[i] = ~0; 
    for (lim = i = 0, j = 1; i < m; ++i, j <<= 1) { 
      S[x[i]] &= ~j; 
      lim |= j; 
    } 
    lim = ~(lim>>1); 
    return(lim); 
  } 

  void SO(char *x, int m, char *y, int n) { 
    unsigned int lim, state; 
    unsigned int S[ASIZE]; 
    int j; 
    if (m > WORD) 
      error("SO: Use pattern size <= word size"); 

    /* Preprocessing */ 
    lim = preSo(x, m, S); 

    /* Searching */ 
    for (state = ~0, j = 0; j < n; ++j) { 
      state = (state<<1) | S[y[j]]; 
      if (state < lim) 
        OUTPUT(j - m + 1); 
    } 
  } 

The example

Searching phase

Shift Or search table
As R12[7]=0 it means that an occurrence of x has been found at position 12-8+1=5.

Sorry the new example is not ready... See the Java applet.


References

Please note that this page has been translated into Ukrainian by Vasil Vashenko and into Danish by Nastasya Zemina.
Morris-Pratt algorithmESMAJKarp-Rabin algorithmContents
Next: Morris-Pratt algorithm Up: ESMAJ Prev: Karp-Rabin algorithm

e-mails: {Christian.Charras, Thierry.Lecroq}@laposte.net
Tue Jan 14 15:03:31 MET 1997