Draft 1
Draft 1
Knuth-Morris-Pratt algorithm
DEPARTMENT OF MATHEMATICS
INDIAN INSTITUTE OF TECHNOLOGY GUWAHATI
GUWAHATI - 781039, ASSAM
1
1 Prefix function definition
You are given a string s of length n. The prefix function for this string is
defined as an array π of length n, where π[i] is the length of the longest proper
prefix of the substring s[0 . . . i] which is also a suffix of this substring. A proper
prefix of a string is a prefix that is not equal to the string itself. By definition,
π[0] = 0.
Mathematically the definition of the prefix function can be written as follows:
2 Trivial Algorithm
An algorithm which follows the definition of prefix function exactly is the fol-
lowing:
It is easy to see that its complexity is O(n3 ), which has room for improve-
ment. Certainly! Here is the provided text formatted in LaTeX:
“‘latex
2
4 Efficient Algorithm
This algorithm was proposed by Knuth and Pratt and independently from them
by Morris in 1977. It was used as the main function of a substring search
algorithm.
Thus when moving to the next position, the value of the prefix function can
either increase by one, stay the same, or decrease by some amount. This fact
already allows us to reduce the complexity of the algorithm to O(n2 ), because
in one step the prefix function can grow at most by one. In total the function
can grow at most n steps, and therefore also only can decrease a total of n steps.
This means we only have to perform O(n) string comparisons, and reach the
complexity O(n2 ).
3
longest length j < π[i], such that the prefix property in the position i holds, i.e.
s[0 . . . j − 1] = s[i − j + 1 . . . i]:
π[i] π[i]
z }| { z }| {
s s s s . . . si−3 si−2 si−1 si si+1
|0{z }1 2 3 | {z }
j j
Indeed, if we find such a length j, then we again only need to compare the
characters s[i + 1] and s[j]. If they are equal, then we can assign π[i + 1] = j + 1.
Otherwise we will need to find the largest value smaller than j, for which the
prefix property holds, and so on. It can happen that this goes until j = 0. If
then s[i + 1] = s[0], we assign π[i + 1] = 1, and π[i + 1] = 0 otherwise.
So we already have a general scheme of the algorithm. The only question
left is how do we effectively find the lengths for j.
Let’s recap: for the current length j at the position i for which the prefix
property holds, i.e. s[0 . . . j − 1] = s[i − j + 1 . . . i], we want to find the greatest
kj, for which the prefix property holds.
The illustration shows that this has to be the value of π[j − 1], which we
already calculated earlier.
• To calculate the current value π[i] we set the variable j denoting the length
of the best suffix for i − 1. Initially j = π[i − 1].
• Test if the suffix of length j + 1 is also a prefix by comparing s[j] and s[i].
If they are equal then we assign π[i] = j + 1, otherwise we reduce j to
π[j − 1] and repeat this step.
• If we have reached the length j = 0 and still don’t have a match, then we
assign π[i] = 0 and go to the next index i + 1.
4.4 Implementation
The implementation ends up being surprisingly short and expressive.
4
Listing 2: Implementation
This algorithm runs in O(n) time, which is optimal for this problem.
5
5 References
5.1 GeeksforGeeks
https://www.geeksforgeeks.org/kmp-algorithm-for-pattern-searching/
5.2 Javat.Point.com
https://www.javatpoint.com/daa-knuth-morris-pratt-algorithm