Draft 1

The document summarizes the Knuth-Morris-Pratt (KMP) string matching algorithm. It begins by defining the prefix function and providing a trivial O(n^3) algorithm. It then describes two optimizations that improve the algorithm's complexity to O(n). The first optimization notes that the prefix function can increase by at most 1 between consecutive positions. The second optimization avoids string comparisons by using information from previously computed prefix values. The final O(n) algorithm is presented and implemented in C++.


Prefix function.

Knuth-Morris-Pratt algorithm

M.Sc seminar report on

String Matching Algorithm
by
Krishnamoorthi V
(222123030)

DEPARTMENT OF MATHEMATICS
INDIAN INSTITUTE OF TECHNOLOGY GUWAHATI
GUWAHATI - 781039, ASSAM

1 Prefix function definition
You are given a string s of length n. The prefix function for this string is
defined as an array π of length n, where π[i] is the length of the longest proper
prefix of the substring s[0 . . . i] which is also a suffix of this substring. A proper
prefix of a string is a prefix that is not equal to the string itself. By definition,
π[0] = 0.
Mathematically the definition of the prefix function can be written as follows:

$$\pi[i] = \max_{k = 0 \dots i} \{\, k : s[0 \dots k-1] = s[i-(k-1) \dots i] \,\}$$

For example, the prefix function of the string "abcabcd" is [0, 0, 0, 1, 2, 3, 0], and
the prefix function of the string "aabaaab" is [0, 1, 0, 1, 2, 2, 3].

2 Trivial Algorithm
An algorithm which follows the definition of the prefix function exactly is the
following:

Listing 1: Trivial Algorithm

#include <string>
#include <vector>
using namespace std;

vector<int> prefix_function(string s) {
    int n = (int)s.length();
    vector<int> pi(n);
    for (int i = 0; i < n; i++)
        // try every proper length k; the last match found is the longest
        for (int k = 0; k <= i; k++)
            if (s.substr(0, k) == s.substr(i - k + 1, k))
                pi[i] = k;
    return pi;
}

It is easy to see that its complexity is O(n^3): there are O(n^2) pairs (i, k),
and each substring comparison takes O(n) time. This leaves room for improvement.

3 The need to optimize the O(n^3) algorithm for string matching
The O(n^3) algorithm is too slow for large inputs: for a string of length 10^5
it would need on the order of 10^15 elementary operations. A faster algorithm
is needed wherever long strings must be processed under time or resource
constraints, for example in competitive programming, in real-time systems, and
in scientific applications that work with large datasets.

4 Efficient Algorithm
This algorithm was proposed by Knuth and Pratt and independently from them
by Morris in 1977. It was used as the main function of a substring search
algorithm.

4.1 First optimization

The first important observation is that the value of the prefix function can
increase by at most one from one position to the next, i.e. π[i + 1] ≤ π[i] + 1.
Indeed, otherwise, if π[i + 1] > π[i] + 1, then we can take this suffix ending
in position i + 1 with the length π[i + 1] and remove the last character from it.
We end up with a suffix ending in position i with the length π[i + 1] − 1, which
is better than π[i], i.e. we get a contradiction.
The following illustration shows this contradiction. The longest proper suffix
at position i that also is a prefix is of length 2, and at position i + 1 it is of
length 4. Therefore the string s_0 s_1 s_2 s_3 is equal to the string
s_{i−2} s_{i−1} s_i s_{i+1}, which means that also the strings s_0 s_1 s_2 and
s_{i−2} s_{i−1} s_i are equal, therefore π[i] has to be 3.

$$\underbrace{\overbrace{s_0 ~ s_1}^{\pi[i] = 2} ~ s_2 ~ s_3}_{\pi[i+1] = 4} ~ \dots ~ \underbrace{s_{i-2} ~ \overbrace{s_{i-1} ~ s_i}^{\pi[i] = 2} ~ s_{i+1}}_{\pi[i+1] = 4}$$

Thus when moving to the next position, the value of the prefix function can
either increase by one, stay the same, or decrease by some amount. This fact
already allows us to reduce the complexity of the algorithm to O(n^2): since
the prefix function can grow by at most one in each step, it grows by at most n
in total, and therefore it can also decrease by at most n in total.
This means we only have to perform O(n) string comparisons, and reach the
complexity O(n^2).
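The observation above can be sketched in code as follows. This is only an illustration, not part of the original report: the name prefix_function_quadratic is hypothetical, and the candidate lengths are tried from π[i − 1] + 1 downwards so that the first match is the longest one.

```cpp
#include <string>
#include <vector>

// Sketch of the O(n^2) variant (hypothetical name, for illustration only).
// For each i, the candidate lengths start at pi[i-1] + 1, the largest value
// that the first optimization allows, and decrease until a match is found.
std::vector<int> prefix_function_quadratic(const std::string& s) {
    int n = static_cast<int>(s.length());
    std::vector<int> pi(n, 0);
    for (int i = 1; i < n; i++) {
        for (int k = pi[i - 1] + 1; k > 0; k--) {
            // Compare the prefix of length k with the suffix of s[0..i]
            // of length k; each comparison costs O(n) in the worst case.
            if (s.compare(0, k, s, i - k + 1, k) == 0) {
                pi[i] = k;
                break;
            }
        }
    }
    return pi;
}
```

Every failed comparison lowers the candidate length by one, and the length can rise by at most one per position, so the total number of comparisons is O(n) and each of them takes O(n) time.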

4.2 Second optimization

Let's go further: we want to get rid of the string comparisons. To accomplish
this, we have to use all the information computed in the previous steps.
So let us compute the value of the prefix function π for i + 1. If s[i + 1] =
s[π[i]], then we can say with certainty that π[i + 1] = π[i] + 1, since we already
know that the suffix at position i of length π[i] is equal to the prefix of length
π[i]. This is illustrated again with an example.
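$$\underbrace{\overbrace{s_0 ~ s_1 ~ s_2}^{\pi[i]} ~ \overbrace{s_3}^{s_3 = s_{i+1}}}_{\pi[i+1] = \pi[i] + 1} ~ \dots ~ \underbrace{\overbrace{s_{i-2} ~ s_{i-1} ~ s_i}^{\pi[i]} ~ \overbrace{s_{i+1}}^{s_3 = s_{i+1}}}_{\pi[i+1] = \pi[i] + 1}$$
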
If this is not the case, s[i + 1] ≠ s[π[i]], then we need to try a shorter
string. In order to speed things up, we would like to immediately move to the
longest length j < π[i], such that the prefix property in the position i holds, i.e.
s[0 . . . j − 1] = s[i − j + 1 . . . i]:

$$\overbrace{\underbrace{s_0 ~ s_1}_{j} ~ s_2 ~ s_3}^{\pi[i]} ~ \dots ~ \overbrace{s_{i-3} ~ s_{i-2} ~ \underbrace{s_{i-1} ~ s_i}_{j}}^{\pi[i]} ~ s_{i+1}$$

Indeed, if we find such a length j, then we again only need to compare the
characters s[i + 1] and s[j]. If they are equal, then we can assign π[i + 1] = j + 1.
Otherwise we will need to find the largest value smaller than j, for which the
prefix property holds, and so on. It can happen that this goes until j = 0. If
then s[i + 1] = s[0], we assign π[i + 1] = 1, and π[i + 1] = 0 otherwise.
So we already have a general scheme of the algorithm. The only question
left is how to find the lengths j efficiently.
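Let's recap: for the current length j at the position i for which the prefix
property holds, i.e. s[0 . . . j − 1] = s[i − j + 1 . . . i], we want to find the greatest
k < j, for which the prefix property holds.

$$\overbrace{\underbrace{s_0 ~ s_1}_{k} ~ s_2 ~ s_3}^{j} ~ \dots ~ \overbrace{s_{i-3} ~ s_{i-2} ~ \underbrace{s_{i-1} ~ s_i}_{k}}^{j} ~ s_{i+1}$$

The illustration shows that this has to be the value of π[j − 1], which we
already calculated earlier: the suffix of length k ending at position i is also a
suffix of the prefix s[0 . . . j − 1], and the longest such suffix that is also a
prefix has length π[j − 1] by definition.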
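As a small illustration of this chain of lengths, the sketch below lists every length for which the prefix property holds at a position i, given an already computed prefix function. The helper name candidate_lengths is hypothetical and not part of the original report.

```cpp
#include <vector>

// Hypothetical helper: given the prefix function pi of a string s, returns
// all lengths j > 0 (in decreasing order) such that the prefix of length j
// equals the suffix of s[0..i] of length j. Each next candidate is pi[j - 1].
std::vector<int> candidate_lengths(const std::vector<int>& pi, int i) {
    std::vector<int> lengths;
    for (int j = pi[i]; j > 0; j = pi[j - 1])
        lengths.push_back(j);
    return lengths;
}
```

For the string "aabaaab" with prefix function [0, 1, 0, 1, 2, 2, 3], calling candidate_lengths(pi, 5) yields the lengths 2 and 1: both "aa" and "a" are prefixes of "aabaaa" that are also its suffixes.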

4.3 Final algorithm

So we finally can build an algorithm that doesn’t perform any string comparisons
and only performs O(n) actions.
Here is the final procedure:

• We compute the prefix values π[i] in a loop by iterating from i = 1 to
i = n − 1 (π[0] just gets assigned with 0).

• To calculate the current value π[i] we set the variable j denoting the length
of the best suffix for i − 1. Initially j = π[i − 1].
• Test if the suffix of length j + 1 is also a prefix by comparing s[j] and s[i].
If they are equal then we assign π[i] = j + 1, otherwise we reduce j to
π[j − 1] and repeat this step.
• If we have reached the length j = 0 and still don’t have a match, then we
assign π[i] = 0 and go to the next index i + 1.

4.4 Implementation
The implementation ends up being surprisingly short and expressive.

Listing 2: Implementation

#include <string>
#include <vector>
using namespace std;

vector<int> prefix_function(string s) {
    int n = (int)s.length();
    vector<int> pi(n);  // pi[0] = 0 by definition
    for (int i = 1; i < n; i++) {
        int j = pi[i - 1];  // length of the best suffix for i - 1
        // fall back to shorter candidates until s[i] extends one of them
        while (j > 0 && s[i] != s[j])
            j = pi[j - 1];
        if (s[i] == s[j])
            j++;
        pi[i] = j;
    }
    return pi;
}

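This algorithm runs in O(n) time, which is optimal for this problem: j increases
by at most one per iteration of the outer loop, so the inner while loop, which
always decreases j, can run at most n times in total.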
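Since the prefix function serves as the core of a substring search, a minimal sketch of that use is given here. It is an illustration under stated assumptions, not the report's own code: the name find_occurrences is hypothetical, and the separator character is assumed not to occur in the pattern or in the text.

```cpp
#include <string>
#include <vector>

// Prefix function computed in O(n), following the scheme described above.
std::vector<int> prefix_function(const std::string& s) {
    int n = static_cast<int>(s.length());
    std::vector<int> pi(n, 0);
    for (int i = 1; i < n; i++) {
        int j = pi[i - 1];
        while (j > 0 && s[i] != s[j])
            j = pi[j - 1];
        if (s[i] == s[j])
            j++;
        pi[i] = j;
    }
    return pi;
}

// Hypothetical helper: returns the starting indices of all occurrences of
// pattern in text. Assumption: separator occurs in neither string, so no
// prefix value can exceed the pattern length.
std::vector<int> find_occurrences(const std::string& pattern,
                                  const std::string& text,
                                  char separator = '#') {
    std::vector<int> positions;
    int m = static_cast<int>(pattern.length());
    if (m == 0)
        return positions;
    std::vector<int> pi = prefix_function(pattern + separator + text);
    for (int i = m + 1; i < static_cast<int>(pi.size()); i++) {
        // pi[i] == m means the pattern ends at index i of the combined
        // string, i.e. it starts at index i - 2 * m of the text.
        if (pi[i] == m)
            positions.push_back(i - 2 * m);
    }
    return positions;
}
```

For example, find_occurrences("aba", "abacababa") returns the positions 0, 4 and 6. The running time is linear in the combined length of the pattern and the text.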

