|

Na´ve String Matching

 

The na´ve approach simply test all the possible placement of Pattern P[1 . . m] relative to text T[1 . . n]. Specifically, we try shift s = 0, 1, . . . , n - m, successively and for each shift, s. Compare T[s +1 . . s + m] to P[1 . . m]

 

NA¤VE_STRING_MATCHER (T, P)

  1. n length [T]
  2. m length [P]
  3. for s 0 to n - m do
  4.     if P[1 . . m] = T[s +1 . . s + m]
  5.         then return valid shift s

 

The na´ve string-matching procedure can be interpreted graphically as a sliding a pattern P[1 . . m] over the text T[1 . . n] and noting for which shift all of the characters in the pattern match the corresponding characters in the text.

In other to analysis the time of na´ve matching, we would like to implement above algorithm to understand the test involves in line 4.

Note that in this implementation, we use notation P[1 . . j] to denote the substring of P from index i to index j.
That is,
P[1 . . j] = P[i] P[i +1] . . . P[j].

 

NA¤VE_STRING_MATCHER (T, P)

  1. n length [T]
  2. m length [P]
  3. for s 0 to n-m do
  4.     j 1
  5.    while j m and T[s + j] = P[j] do
  6.         j j +1
  7.     If j > m then
  8.         return valid shift s
  9. return no valid shift exist     // i.e., there is no substring of T matching P.

 

Referring to implementation of na´ve matcher, we see that the for-loop in line 3 is executed at most n - m +1 times, and the while-loop in line 5 is executed at most m times. Therefore, the running time of the algorithm is O((n - m +1)m), which is clearly O(nm). Hence, in the worst case, when the length of the pattern, m are roughly equal, this algorithm runs in the quadratic time.

One worst case is that text, T, has n number of A's and the pattern, P, has (m -1) number of A's followed by a single B.