KMP Algo
KMP Algo
ALGORITHM
PREPARED BY:
ADITYA AWASTHI
THE PROBLEM OF STRING MATCHING
Compute-Prefix-Function (p)
1 m length[p] //’p’ pattern to be matched
2 Π[1] 0
3 k0
4 for q 2 to m
5 do while k > 0 and p[k+1] != p[q]
6 do k Π[k]
7 If p[k+1] = p[q]
8 then k k +1
9 Π[q] k
10 return Π
Example: compute Π for the pattern ‘p’ below:
p a b a b a c a
Initially: m = length[p] = 7
Π[1] = 0
k=0
q 1 2 3 4 5 6 7
Step 1: q = 2, k=0 p a b a b a c a
Π 0 0
Π[2] = 0
q 1 2 3 4 5 6 7
p a b a b a c a
Step 2: q = 3, k = 0,
Π[3] = 1 Π 0 0 1
q 1 2 3 4 5 6 7
Step 3: q = 4, k = 1 p a b a b a c A
Π[4] = 2 Π 0 0 1 2
Step 4: q = 5, k =2 q 1 2 3 4 5 6 7
Π[5] = 3 p a b a b a c a
Π 0 0 1 2 3
q 1 2 3 4 5 6 7
Step 5: q = 6, k = 3
p a b a b a c a
Π[6] = 1
Π 0 0 1 2 3 1
q 1 2 3 4 5 6 7
Step 6: q = 7, k = 1 p a b a b a c a
Π[7] = 1 Π 0 0 1 2 3 1 1
q 1 2 3 4 5 6 7
After iterating 6 times, the prefix
function computation is complete: p a b A b a c a
Π 0 0 1 2 3 1 1
THE KMP MATCHER
The KMP Matcher, with pattern ‘p’, string ‘S’ and prefix function ‘Π’ as input, finds a match of p in S.
Following pseudocode computes the matching component of KMP algorithm:
KMP-Matcher(S,p)
1 n length[S]
2 m length[p]
3 Π Compute-Prefix-Function(p)
4q0 //number of characters matched
5 for i 1 to n //scan S from left to right
6 do while q > 0 and p[q+1] != S[i]
7 do q Π[q] //next character does not match
8 if p[q+1] = S[i]
9 then q q + 1 //next character matches
10 if q = m //is all of p matched?
11 then print “Pattern occurs with shift” i – m
12 q Π[ q] // look for the next match
Note: KMP finds every occurrence of a ‘p’ in ‘S’. That is why KMP does not terminate in step 12, rather it searches
remainder of ‘S’ for any more occurrences of ‘p’.
Illustration: given a String ‘S’ and pattern ‘p’ as
follows:
S b a c b a b a b a b a c a c a
p a b a b a c a
Let us execute the KMP algorithm to find
whether ‘p’ occurs in ‘S’.
For ‘p’ the prefix function, Π was computed previously and is as follows:
q 1 2 3 4 5 6 7
p a b A b a c a
Π 0 0 1 2 3 1 1
Initially: n = size of S = 15;
m = size of p = 7
Step 1: i = 1, q = 0
comparing p[1] with S[1]
S b a c b a b a b a b a c a a b
p a b a b a c a
P[1] does not match with S[1]. ‘p’ will be shifted one position to the right.
Step 2: i = 2, q = 0
comparing p[1] with S[2]
S b a c b a b a b a b a c a a b
p a b a b a c a
P[1] matches S[2]. Since there is a match, p is not shifted.
Step 3: i = 3, q = 1
Comparing p[2] with S[3] p[2] does not match with S[3]
S b a c b a b a b a b a c a a b
p a b a b a c a
Backtracking on p, comparing p[1] and S[3]
Step 4: i = 4, q = 0
comparing p[1] with S[4] p[1] does not match with S[4]
S b a c b a b a b a b a c a a b
p a b a b a c a
Step 5: i = 5, q = 0
comparing p[1] with S[5] p[1] matches with S[5]
S b a c b a b a b a b a c a a b
p a b a b a c a
Step 6: i = 6, q = 1
Comparing p[2] with S[6] p[2] matches with S[6]
S b a c b a b a b a b a c a a b
p a b a b a c a
Step 7: i = 7, q = 2
Comparing p[3] with S[7] p[3] matches with S[7]
S b a c b a b a b a b a c a a b
p a b a b a c a
Step 8: i = 8, q = 3
Comparing p[4] with S[8] p[4] matches with S[8]
S b a c b a b a b a b a c a a b
p a b a b a c a
Step 9: i = 9, q = 4
Comparing p[5] with S[9] p[5] matches with S[9]
S b a c b a b a b a b a c a a b
p a b a b a c a
S b a c b a b a b a b a c a a b
p a b a b a c a
Backtracking on p, comparing p[4] with S[10] because after mismatch q = Π[5] = 3
S b a c b a b a b a b a c a a b
p a b a b a c a
Step 12: i = 12, q = 5
Comparing p[6] with S[12] p[6] matches with S[12]
S b a c b a b a b a b a c a a b
p a b a b a c a
S b a c b a b a b a b a c a a b
p a b a b a c a
Pattern ‘p’ has been found to completely occur in string ‘S’. The total number of shifts
that took place for the match to be found are: i – m = 13 – 7 = 6 shifts.
RUNNING - TIME ANALYSIS
Compute-Prefix-Function (Π) KMP Matcher
1 m length[p] //’p’ pattern to be matched 1 n length[S]
2 Π[1] 0 2 m length[p]
3 k0 3 Π Compute-Prefix-Function(p)
4 for q 2 to m 4q0
5 do while k > 0 and p[k+1] != p[q] 5 for i 1 to n
6 do k Π[k] 6 do while q > 0 and p[q+1] != S[i]
7 If p[k+1] = p[q] 7 do q Π[q]
8 then k k +1 8 if p[q+1] = S[i]
9 Π[q] k 9 then q q + 1
10 return Π 10 if q = m
11 then print “Pattern occurs with shift” i – m
12 q Π[ q]
In the above pseudocode for The for loop beginning in step 5 runs ‘n’ times,
computing the prefix i.e., as long as the length of the string ‘S’.
function, the for loop from Since step 1 to step 4 take constant time, the
step 4 to step 10 runs ‘m’ running time is dominated by this for loop.
times. Step 1 to step 3 take Thus running time of matching function is
constant time. Hence the Θ(n).
running time of compute
prefix function is Θ(m).