[go: up one dir, main page]

0% found this document useful (0 votes)
10 views6 pages

Draft 1

The document summarizes the Knuth-Morris-Pratt (KMP) string matching algorithm. It begins by defining the prefix function and providing a trivial O(n3) algorithm. It then describes two optimizations that improve the algorithm's complexity to O(n). The first optimization notes the prefix function can only increase by 1 between positions. The second optimization avoids string comparisons by using information from previously computed prefix values. The final O(n) algorithm is presented and implemented in pseudocode.

Uploaded by

justkrishnav
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
10 views6 pages

Draft 1

The document summarizes the Knuth-Morris-Pratt (KMP) string matching algorithm. It begins by defining the prefix function and providing a trivial O(n3) algorithm. It then describes two optimizations that improve the algorithm's complexity to O(n). The first optimization notes the prefix function can only increase by 1 between positions. The second optimization avoids string comparisons by using information from previously computed prefix values. The final O(n) algorithm is presented and implemented in pseudocode.

Uploaded by

justkrishnav
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 6

Prefix function.

Knuth-Morris-Pratt algorithm

M.Sc seminar report on


String Matching Algorithm
by
Krishnamoorthi V
(222123030)

DEPARTMENT OF MATHEMATICS
INDIAN INSTITUTE OF TECHNOLOGY GUWAHATI
GUWAHATI - 781039, ASSAM

1
1 Prefix function definition
You are given a string s of length n. The prefix function for this string is
defined as an array π of length n, where π[i] is the length of the longest proper
prefix of the substring s[0 . . . i] which is also a suffix of this substring. A proper
prefix of a string is a prefix that is not equal to the string itself. By definition,
π[0] = 0.
Mathematically the definition of the prefix function can be written as follows:

π[i] = max {k : s[0 . . . k − 1] = s[i − (k − 1) . . . i]}


k=0...i

For example, prefix function of string ”abcabcd” is [0, 0, 0, 1, 2, 3, 0], and


prefix function of string ”aabaaab” is [0, 1, 0, 1, 2, 2, 3].

2 Trivial Algorithm
An algorithm which follows the definition of prefix function exactly is the fol-
lowing:

Listing 1: Trivial Algorithm

vector < int > pr e fi x_ fu n ct io n ( string s ) {


int n = ( int ) s . length () ;
vector < int > pi ( n ) ;
for ( int i = 0; i < n ; i ++)
for ( int k = 0; k <= i ; k ++)
if ( s . substr (0 , k ) == s . substr (i - k +1 , k ) )
pi [ i ] = k ;
return pi ;
}

It is easy to see that its complexity is O(n3 ), which has room for improve-
ment. Certainly! Here is the provided text formatted in LaTeX:
“‘latex

3 The need to optimize the O(n3 ) algorithm for


string matching
The need to optimize the O(n3 ) algorithm for string matching arises from its
inefficiency, especially when dealing with large datasets. Optimizing the O(n3 )
algorithm for string matching is crucial for enabling efficient and practical pro-
cessing of large datasets, meeting resource constraints, excelling in competitive
programming, supporting real-time systems, ensuring scalability, and advancing
scientific research in various domains.

2
4 Efficient Algorithm
This algorithm was proposed by Knuth and Pratt and independently from them
by Morris in 1977. It was used as the main function of a substring search
algorithm.

4.1 First optimization


The first important observation is that the values of the prefix function can only
increase by at most one.
Indeed, otherwise, if π[i + 1] > π[i] + 1, then we can take this suffix ending
in position i + 1 with the length π[i + 1] and remove the last character from it.
We end up with a suffix ending in position i with the length π[i + 1] − 1, which
is better than π[i], i.e. we get a contradiction.
The following illustration shows this contradiction. The longest proper suffix
at position i that also is a prefix is of length 2, and at position i + 1 it is of
length 4. Therefore the string s0 s1 s2 s3 is equal to the string si−2 si−1 si si+1 ,
which means that also the strings s0 s1 s2 and si−2 si−1 si are equal, therefore
π[i] has to be 3.
π[i]=2 π[i]=2
z }| { z }| {
s s s s . . . si−2 si−1 si si+1
| 0 1{z 2 }3 | {z }
π[i+1]=4 π[i+1]=4

Thus when moving to the next position, the value of the prefix function can
either increase by one, stay the same, or decrease by some amount. This fact
already allows us to reduce the complexity of the algorithm to O(n2 ), because
in one step the prefix function can grow at most by one. In total the function
can grow at most n steps, and therefore also only can decrease a total of n steps.
This means we only have to perform O(n) string comparisons, and reach the
complexity O(n2 ).

4.2 Second optimization


Let’s go further, we want to get rid of the string comparisons. To accomplish
this, we have to use all the information computed in the previous steps.
So let us compute the value of the prefix function π for i + 1. If s[i + 1] =
s[π[i]], then we can say with certainty that π[i + 1] = π[i] + 1, since we already
know that the suffix at position i of length π[i] is equal to the prefix of length
π[i]. This is illustrated again with an example.
π[i] s3 =si+1 π[i]
z }| { z}|{ z }| {
s s s s3 . . . si−2 si−1 si−1 si si+1
|0 1 2{z } | {z }
π[i+1]=π[i]+1 j
| {z }
If this is not the case, s[i + 1] ̸= s[π[i]], then we need to try a shorter
string. In order to speed things up, we would like to immediately move to the

3
longest length j < π[i], such that the prefix property in the position i holds, i.e.
s[0 . . . j − 1] = s[i − j + 1 . . . i]:
π[i] π[i]
z }| { z }| {
s s s s . . . si−3 si−2 si−1 si si+1
|0{z }1 2 3 | {z }
j j

Indeed, if we find such a length j, then we again only need to compare the
characters s[i + 1] and s[j]. If they are equal, then we can assign π[i + 1] = j + 1.
Otherwise we will need to find the largest value smaller than j, for which the
prefix property holds, and so on. It can happen that this goes until j = 0. If
then s[i + 1] = s[0], we assign π[i + 1] = 1, and π[i + 1] = 0 otherwise.
So we already have a general scheme of the algorithm. The only question
left is how do we effectively find the lengths for j.
Let’s recap: for the current length j at the position i for which the prefix
property holds, i.e. s[0 . . . j − 1] = s[i − j + 1 . . . i], we want to find the greatest
kj, for which the prefix property holds.

s s s s . . . si−3 si−2 si−1 si si+1


|0{z }1 2 3 | {z }
k k

The illustration shows that this has to be the value of π[j − 1], which we
already calculated earlier.

4.3 Final algorithm


So we finally can build an algorithm that doesn’t perform any string comparisons
and only performs O(n) actions.
Here is the final procedure:

• We compute the prefix values π[i] in a loop by iterating from i = 1 to


i = n − 1 (π[0] just gets assigned with 0).

• To calculate the current value π[i] we set the variable j denoting the length
of the best suffix for i − 1. Initially j = π[i − 1].
• Test if the suffix of length j + 1 is also a prefix by comparing s[j] and s[i].
If they are equal then we assign π[i] = j + 1, otherwise we reduce j to
π[j − 1] and repeat this step.
• If we have reached the length j = 0 and still don’t have a match, then we
assign π[i] = 0 and go to the next index i + 1.

4.4 Implementation
The implementation ends up being surprisingly short and expressive.

4
Listing 2: Implementation

vector < int > pr e fi x_ fu n ct io n ( string s ) {


int n = ( int ) s . length () ;
vector < int > pi ( n ) ;
for ( int i = 1; i < n ; i ++) {
int j = pi [i -1];
while ( j > 0 && s [ i ] != s [ j ])
j = pi [j -1];
if ( s [ i ] == s [ j ])
j ++;
pi [ i ] = j ;
}
return pi ;
}

This algorithm runs in O(n) time, which is optimal for this problem.

5
5 References
5.1 GeeksforGeeks
https://www.geeksforgeeks.org/kmp-algorithm-for-pattern-searching/

5.2 Javat.Point.com
https://www.javatpoint.com/daa-knuth-morris-pratt-algorithm

5.3 Wikipedia - Knuth-Morris-Pratt Algorithm


www.wikipedia.com/KMPalgo

5.4 Introduction to Algorithms Book


Introduction to Algorithms is a book on computer programming by Thomas H.
Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein.

You might also like