The Boyer-Moore algorithm is consider the most efficient string-matching algorithm in usual applications, for example, in text editors and commands substitutions. The reason is that it woks the fastest when the alphabet is moderately sized and the pattern is relatively long.

The algorithm scans the characters of the pattern from right to left beginning with the rightmost character. During the testing of a possible placement of pattern P against text T, a mismatch of text character T[i] = c with the corresponding pattern character P[j] is handled as follows: If c is not contained anywhere in P, then shift the pattern P completely past T[i]. Otherwise, shift P until an occurrence of character c in P gets aligned with T[i].

This technique likely to avoid lots of needless comparisons by significantly shifting pattern relative to text.

**Last Function**

We define a function last(c) that takes a character c from the alphabet and specifies how far may shift the pattern P if a character equal to c is found in the text that does not match the pattern

ì index of the last occurrence if c is in P last(c) = í of c in pattern P î -1 otherwise

For example consider

T: 0 1 2 3 4 5 6 7 8 9 a b a c a a b a c c

P: a b a c a b 0 1 2 3 4 5

last(a) is the index of the last (rightmost) occurrence of 'a' in P, which is 4.

last(c) is the index of the last occurrence of c in P, which is 3

'd' does not exist in the pattern there we have last (d) = -1.

c a b c d last(c) 4 3 -1

Now, for 'b' notice

T: a b a c a a b a c c P: a b a c a b

Therefore, last(b) is the index of last occurrence of b in P, which is 5

The complete last(c) function

c a b c d last(c) 4 5 3 -1

**Boyer-Moore algorithm
**

BOYER_MOORE_MATCHER (T, P)

Input:Text with n characters and Pattern with m charactersOutput:Index of the first substring of T matching P

- Compute function last
- i ← m-1
- j ← m-1
- Repeat
- If P[j] = T[i] then
- if j=0 then
- return i // we have a match
- else
- i ← i -1
- j ← j -1
- else
- i ← i + m - Min(j, 1 + last[T[i]])
- j ← m -1
- until i > n -1
- Return "no match"

The computation of the last function takes
O(*m*+|∑|) time and
actual search takes O(*mn*) time. Therefore the worst case running
time of Boyer-Moore algorithm is O(*nm* + |∑|). Implies that the
worst-case running time is quadratic, in case of
n = m, the same as the naïve
algorithm.

- Boyer-Moore algorithm is extremely fast on large alphabet (relative to the length of the pattern).
- The payoff is not as for binary strings or for very short patterns.
- For binary strings Knuth-Morris-Pratt algorithm is recommended.
- For the very shortest patterns, the naïve algorithm may be better.