Good-Suffix Table

\( \def\dd{\dot{}\dot{}} \def\tbsuff{\mathit{good\text{-}suff}} \def\per{\mathit{per}} \)

The Boyer-Moore algorithm (BM) applies the sliding window strategy on the text to locate occurrences of a pattern. It requires a pattern preprocessing to accelerate the search.

At a given step, the algorithm compares the pattern and a window on the text by computing their longest common suffix $u$. If $u=x$ a match occurs. Otherwise, in the generic situation, pattern $x[0\dd m-1]$ is aligned with the window, factor $y[j-m+1\dd j]$ of the text, $au$ is a suffix of $x$ and $bu$ a suffix of the window, for different letters $a$ and $b$.

Good Suffix Table

To continue the search, Algorithm BM slides the window according to the period of $x$ in case of a match or otherwise to the factor $bu$ of the text to avoid positions of the window where no occurrence of $x$ is possible. To do so it uses the good-suffix table $\tbsuff$ defined for a position $i$ on $x$ and $u=x[i+1\dd m-1]$ by $$\tbsuff[i]=\min\{|v| \mid x \mbox{ suffix of } uv \mbox{ or } cuv \mbox{ suffix of } x, c\neq x[i]\}.$$

The condition "$cuv \mbox{ suffix of } x$" with $c\neq a=x[i]$ ensures that when letter $c$ is aligned with letter $b=y[j-m+1+i]$ after sliding the window, the same mismatch does not re-occur immediately. From the definition note that $\tbsuff[0]=\per(x)$.

The good-suffix table can be computed in $O(|x|)$ time by the following algorithm.

GoodSuffixes$(x \textrm{ non-empty word of length } m, suff)$
    \begin{algorithmic}
        \STATE $p\leftarrow 0$
        \FOR{$k\leftarrow m-2$ \DOWNTO $-1$}
                \IF{$k = -1$ \OR $suff[k] = k+1$}
                        \WHILE{$p \lt m-1-k$}
                                \STATE $good\text{-}suff[p]\leftarrow m-1-k$
                                \STATE $p\leftarrow p+1$
                        \ENDWHILE
                \ENDIF 
        \ENDFOR 
        \FOR{$k\leftarrow 0$ \TO $m-2$}
                \STATE $good\text{-}suff[m-1-suff[k]]\leftarrow m-1-k$
        \ENDFOR
        \RETURN{$good\text{-}suff$}
    \end{algorithmic}

References

  • A. Apostolico and R. Giancarlo. The Boyer-Moore-Galil string searching strategies revisited. SIAM J. Comput., 15(1):98-105, 1986.
  • R. S. Boyer and J. S. Moore. A fast string searching algorithm. Commun. ACM, 20(10):762-772, 1977.
  • M. Crochemore, C. Hancart, and T. Lecroq. Algorithms on Strings. Cambridge University Press, 2007. 392 pages.
  • M. Crochemore and T. Lecroq. Tight bounds on the complexity of the Apostolico-Giancarlo algorithm. Inf. Process. Lett., 63(4):195-203, 1997.
  • M. Crochemore, T. Lecroq, and W. Rytter. 125 Problems in Text Algorithms. Cambridge University Press, 2021.
  • D. E. Knuth, J. H. Morris Jr., and V. R. Pratt. Fast pattern matching in strings. SIAM J. Comput., 6(2):323-350, 1977.
  • W. Rytter. A correct preprocessing algorithm for Boyer-Moore string searching. SIAM J. Comput., 9(3):509-512, 1980.