Boyer-Moore Algorithm

 

The Boyer-Moore algorithm is consider the most efficient string-matching algorithm in usual applications, for example, in text editors and commands substitutions. The reason is that it woks the fastest when the alphabet is moderately sized and the pattern is relatively long.

The algorithm scans the characters of the pattern from right to left beginning with the rightmost character. During the testing of a possible placement of pattern P against text T, a mismatch of text character T[i] = c with the corresponding pattern character P[j] is handled as follows: If c is not contained anywhere in P, then shift the pattern P completely past T[i]. Otherwise, shift P until an occurrence of character c in P gets aligned with T[i].

This technique likely to avoid lots of needless comparisons by significantly shifting pattern relative to text.

 

Last Function

We define a function last(c) that takes a character c from the alphabet and specifies how far may shift the pattern P if a character equal to c is found in the text that does not match the pattern

 

  ì index of the last occurrence if c is in P
last(c) =  í of c in pattern P  
  î -1 otherwise

 

For example consider

 

T: 0 1 2 3 4 5 6 7 8 9
a b a c a a b a c c

P: a b a c a b
0 1 2 3 4 5

 

last(a) is the index of the last (rightmost) occurrence of 'a' in P, which is 4.

last(c) is the index of the last occurrence of c in P, which is 3

'd' does not exist in the pattern there we have last (d) = -1.

 

c a b c d
last(c) 4   3 -1

 

Now, for 'b' notice

 

T: a b a c a a b a c c
P: a b a c a b        

 

Therefore, last(b) is the index of last occurrence of b in P, which is 5

 

The complete last(c) function

c a b c d
last(c) 4 5 3 -1

 

 

Boyer-Moore algorithm

 

BOYER_MOORE_MATCHER (T, P)

Input:    Text with n characters and Pattern with m characters
Output: Index of the first substring of T matching P

  1. Compute function last
  2. i m-1
  3. j m-1
  4. Repeat
  5.     If P[j] = T[i] then
  6.         if j=0 then
  7.             return i        // we have a match
  8.         else
  9.             i i -1
  10.             j j -1
  11.     else
  12.         i i + m - Min(j, 1 + last[T[i]])
  13.         j m -1
  14. until i > n -1
  15. Return "no match"

 

Analysis

The computation of the last function takes O(m+|∑|) time and actual search takes O(mn) time. Therefore the worst case running time of Boyer-Moore algorithm is O(nm + |∑|). Implies that the worst-case running time is quadratic, in case of n = m, the same as the naïve algorithm.