CHAPTER 6
String Matching
Introduction
The string matching operation is one of the basic parts of many text processing applications. The main aim of a string-matching algorithm is to find the pattern P in the given text T. Mostly, the length of the pattern P is much smaller than the length of the text T, that is, P << T. The string matching problem is defined as: given some text string T[1...n] of size n, find all occurrences of the pattern P[1...m] of size m.
There are a few algorithms, such as the naive string-matching algorithm, the Knuth-Morris-Pratt (KMP) algorithm, the Rabin-Karp algorithm, and so on, which will be discussed with problems clearly.
150 Advanced Data Structures and Algorithms
Structure
In this chapter, we will discuss the following topics:
* The Naive String-Matching Algorithm
* Rabin-Karp Algorithm
* The Knuth-Morris-Pratt Algorithm
* Longest Common Subsequence
* Genetic Algorithms

Objectives

By the end of this chapter, the reader will learn about various string-matching algorithms with suitable examples. Readers will also learn about genetic algorithms.

The Naive String-Matching Algorithm

In this algorithm, the first character of the pattern is first compared with the searchable text. If a match is not found, the pointers of both strings are moved forward. If a match is found, the pointer of the text is incremented and the pointer of the pattern is reset. This process is repeated till the end of the text. The naive approach never requires any preprocessing: it starts directly comparing the pattern P and text T, character by character, and after each mismatch the pattern string shifts one position to the right.

Example 1
Perform pattern matching using the naive algorithm.
String: abcdadcabcdf
Pattern: abcdf

Solution:
t and p are the indices of the text and pattern, respectively.
Step 1: T[1] = P[1]; t++, p++. Refer to Figure 6.1 (T[1] compared with P[1]).
Step 2: T[2] = P[2]; t++, p++. Refer to Figure 6.2 (T[2] compared with P[2]).
Step 3: T[3] = P[3]; t++, p++. Refer to Figure 6.3 (T[3] compared with P[3]).
Step 4: T[4] = P[4]; t++, p++. Refer to Figure 6.4 (T[4] compared with P[4]).
Step 5: T[5] ≠ P[5]; reset p to 1 and move t back to position 2. Refer to Figure 6.5 (T[5] compared with P[5]).
Step 6: T[2] ≠ P[1]; t++. Refer to Figure 6.6 (T[2] compared with P[1]).
Step 7: T[3] ≠ P[1]; t++. Refer to Figure 6.7 (T[3] compared with P[1]).
Step 8: T[4] ≠ P[1]; t++. Refer to Figure 6.8 (T[4] compared with P[1]).
Step 9: T[5] = P[1]; t++, p++. Refer to Figure 6.9 (T[5] compared with P[1]).
Step 10: T[6] ≠ P[2]; reset p to 1 and move t back to position 6. Refer to Figure 6.10 (T[6] compared with P[2]).
Step 11: T[6] ≠ P[1]; t++. Refer to Figure 6.11 (T[6] compared with P[1]).
Step 12: T[7] ≠ P[1]; t++. Refer to Figure 6.12 (T[7] compared with P[1]).
Step 13: T[8] = P[1]; t++, p++. Refer to Figure 6.13 (T[8] compared with P[1]).
Step 14: T[9] = P[2]; t++, p++. Refer to Figure 6.14 (T[9] compared with P[2]).
Step 15: T[10] = P[3]; t++, p++. Refer to Figure 6.15 (T[10] compared with P[3]).
Step 16: T[11] = P[4]; t++, p++. Refer to Figure 6.16 (T[11] compared with P[4]).
Step 17: T[12] = P[5]. Refer to Figure 6.17 (T[12] compared with P[5]).
Thus, the string is matched at positions 8 to 12 of the text.
Algorithm for naive string matching

The algorithm for naive string matching is as follows:

NAIVE_STRING_MATCH(T, P)
    n = length(T)
    m = length(P)
    for i = 0 to n - m do
        if P[1..m] == T[i+1..i+m] then
            print "Match found at shift i"
    end

The best-case time complexity is O(n) and the worst case is O(m * (n - m + 1)).
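The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the book's code; the function name is my own, and it reports 0-based shifts, whereas the chapter's walkthrough is 1-based.

```python
# Naive string matching: try every shift and compare the pattern to the
# text window character by character. Worst case is O(m * (n - m + 1)).
def naive_string_match(text, pattern):
    n, m = len(text), len(pattern)
    matches = []
    for i in range(n - m + 1):          # each possible shift of the pattern
        if text[i:i + m] == pattern:    # character-by-character comparison
            matches.append(i)           # 0-based shift where the match starts
    return matches

print(naive_string_match("abcdadcabcdf", "abcdf"))  # [7]
```

Running it on Example 1 reports the single match at shift 7 (positions 8 to 12 in 1-based terms).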
Rabin-Karp Algorithm

One of the most efficient string-matching algorithms is the Rabin-Karp algorithm. Instead of matching every element of the pattern with the text, it uses a hash value to match the strings: only when the hash value of the current window of the text equals the hash value of the pattern does it match each character of that string.
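The hashing idea can be sketched as follows. The base and modulus below are illustrative choices of my own, not values fixed by the chapter; any base at least the alphabet size and a prime modulus work in the same way.

```python
# Rabin-Karp sketch: compare rolling hashes first; compare characters
# only when the hashes collide, so most shifts are rejected in O(1).
def rabin_karp(text, pattern, base=256, mod=101):
    n, m = len(text), len(pattern)
    if m > n:
        return []
    h = pow(base, m - 1, mod)           # weight of the window's leading char
    p_hash = t_hash = 0
    for i in range(m):                  # hashes of pattern and first window
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    matches = []
    for i in range(n - m + 1):
        # verify character by character only on a hash hit
        if p_hash == t_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:                   # roll the window one character right
            t_hash = ((t_hash - ord(text[i]) * h) * base
                      + ord(text[i + m])) % mod
    return matches

print(rabin_karp("abcdadcabcdf", "abcdf"))  # [7]
```

The rolling update removes the outgoing character's contribution and appends the incoming one, so each new window hash costs constant time.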
The Knuth-Morris-Pratt Algorithm

The KMP algorithm first preprocesses the pattern. It builds an LPS table that represents, for the pattern A[1...m] of length m, the length of the longest proper prefix that is also a suffix at every position (Table 6.2: LPS table).

As a worked example, the pattern ABABD is searched in the text ABABCABABD. Here, i is the index into the text, j counts the characters of the pattern matched so far, and at each step a[i] (the next text character) is compared with a[j+1] (the next pattern character). On a mismatch, j falls back to the LPS value stored at its current position instead of rescanning the text:

index      1  2  3  4  5  6  7  8  9  10
text       A  B  A  B  C  A  B  A  B  D

index      0  1  2  3  4
pattern    A  B  A  B  D
LPS value  0  0  1  2  0

In the first six steps, T[1] through T[4] are matched against P[1] through P[4] (so j = 4), and the mismatch T[5] ≠ P[5] is detected. The trace then continues:

Step 7: a[i] ≠ a[j+1]; move j to the index pointed to by the position j is now present at, that is, move j to index 2. Refer to Table 6.9 (Compare a[i] with a[j+1]).
Step 8: a[i] ≠ a[j+1]; move j to the index pointed to by the position j is now present at, that is, move j to index 0. Refer to Table 6.10 (Compare a[i] with a[j+1]).
Step 9: a[i] ≠ a[j+1], but j is already at index 0; therefore, increment i. Refer to Table 6.11 (Compare a[i] with a[j+1]).
Step 10: a[i] == a[j+1]; i++, j++. Refer to Table 6.12 (Compare a[i] with a[j+1]).
Step 11: a[i] == a[j+1]; i++, j++. Refer to Table 6.13 (Compare a[i] with a[j+1]).
Step 12: a[i] == a[j+1]; i++, j++. Refer to Table 6.14 (Compare a[i] with a[j+1]).
Step 13: a[i] == a[j+1]; i++, j++. Refer to Table 6.15 (Compare a[i] with a[j+1]).
Step 14: a[i] == a[j+1]; j now equals the length of the pattern, so the pattern is found at positions 6 to 10 of the text. Refer to Table 6.16 (Compare a[i] with a[j+1]).

Hence, in this way, we can match the string using the KMP algorithm.
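Both the LPS construction and the fall-back search traced above can be sketched compactly. This Python version is 0-based, unlike the 1-based walkthrough, and the function names are my own.

```python
# KMP sketch: precompute the LPS (longest proper prefix that is also a
# suffix) table, then scan the text, moving j back through LPS on a
# mismatch instead of re-reading text characters. Total time is O(n + m).
def build_lps(pattern):
    lps = [0] * len(pattern)
    j = 0                               # length of current prefix-suffix
    for i in range(1, len(pattern)):
        while j > 0 and pattern[i] != pattern[j]:
            j = lps[j - 1]              # fall back, as in steps 7 and 8
        if pattern[i] == pattern[j]:
            j += 1
        lps[i] = j
    return lps

def kmp_search(text, pattern):
    lps, j, matches = build_lps(pattern), 0, []
    for i, ch in enumerate(text):
        while j > 0 and ch != pattern[j]:
            j = lps[j - 1]              # mismatch: reuse matched prefix
        if ch == pattern[j]:
            j += 1                      # match: advance in the pattern
        if j == len(pattern):           # full match ending at index i
            matches.append(i - j + 1)
            j = lps[j - 1]              # keep scanning for more matches
    return matches

print(build_lps("ABABD"))                 # [0, 0, 1, 2, 0]
print(kmp_search("ABABCABABD", "ABABD"))  # [5]
```

The LPS values printed match the table in the example, and the 0-based shift 5 corresponds to positions 6 to 10 of the 1-based trace.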
Longest Common Subsequence (LCS)

LCS problems consist of two sequences, of which we need to find the longest common subsequence present among them.
The dynamic programming method for LCS follows a bottom-up approach: the matrix table gets filled from top to bottom. In the matrix table, the rows are equal to the number of characters present in String 1 and the columns are equal to the number of characters in String 2. To start the algorithm, initialize the first row and first column of the matrix (that is, index 0 of the matrix) to zero. Next, compare the elements of String 1 with the elements of String 2. If there is a match, then add 1 to the value of the previous diagonal element. If no match exists, then take the maximum of the previous row element and the previous column element. The process continues till all the rows and columns are filled. To find the longest common subsequence, start tracing back from the largest value towards the diagonal elements, and thus the result is obtained. The time complexity using dynamic programming is O(m * n), where m denotes the number of characters present in String 1 and n denotes the number of characters present in String 2.
Let us consider an example for better understanding.
Example 7
Find the LCS of the strings given as follows:
String 1: stone
String 2: longest

Solution:
Refer to Table 6.17:

       l  o  n  g  e  s  t
    0  0  0  0  0  0  0  0
 s  0  0  0  0  0  0  1  1
 t  0  0  0  0  0  0  1  2
 o  0  0  1  1  1  1  1  2
 n  0  0  1  2  2  2  2  2
 e  0  0  1  2  2  3  3  3

Table 6.17: LCS table
Therefore, LCS = one.

Example 8
Find the LCS of the strings given as follows:
String 1: minom
String 2: mnom
Solution:
Refer to Table 6.18:

       m  n  o  m
    0  0  0  0  0
 m  0  1  1  1  1
 i  0  1  1  1  1
 n  0  1  2  2  2
 o  0  1  2  3  3
 m  0  1  2  3  4

Table 6.18: LCS table
Therefore, LCS = mnom.
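The tabulation and trace-back described above can be sketched as follows; the function and variable names are illustrative, not the book's.

```python
# LCS tabulation sketch: dp[i][j] holds the LCS length of the first i
# characters of s1 and the first j characters of s2 (row/column 0 are
# the zero borders). Tracing back from dp[m][n] recovers the answer.
def lcs(s1, s2):
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i - 1] == s2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1   # match: diagonal + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # trace back from the bottom-right corner toward the diagonal
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if s1[i - 1] == s2[j - 1]:
            out.append(s1[i - 1])
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

print(lcs("stone", "longest"))   # one
print(lcs("minom", "mnom"))      # mnom
```

Both calls reproduce the results of Examples 7 and 8.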
Genetic Algorithms

Most evolutionary algorithms are adaptive heuristic search algorithms based on the concepts of natural selection and genetics. Genetic algorithms (GAs) model the natural selection of species that can survive and adapt to the changing environment, reproduce, and go on to the next generation. In other words, they apply "survival of the fittest" among individuals of consecutive generations to the problem to be solved.
The following analogy serves as the foundation for genetic algorithms:
1. Population members fight for resources and mates.
2. The successful (fittest) individuals then mate to have more children than other individuals.
3. Genes from the "fittest" parents spread throughout the generation, which means that occasionally parents produce offspring who are superior to both parents.
4. As a result, each successive generation becomes better suited to its environment.

Search space
The population of individuals is maintained within the search space. Each individual is represented by a vector of components of finite length; these variable components resemble genes, and a combination of various genes makes up each chromosome. Refer to Figure 6.18:
Figure 6.18: Genetic Algorithm

Fitness score
Each individual receives a fitness score that indicates how well-equipped they are to compete in the changing environment. The GAs maintain the population along with the fitness score of each individual. Individuals with higher fitness scores are given more chances to reproduce than others: the candidates with the highest fitness scores are chosen to mate and combine their chromosomes to produce superior children. Since the size of the population is static, space must be made for newcomers; as a result, some individuals pass away and are replaced by newcomers, eventually giving rise to a new generation once the previous population has had all its chances to mate. Better genes are passed down from members of the previous generation to each subsequent generation, so younger generations always carry more effective "partial solutions" than earlier ones. The population has converged once the progeny produced show no discernible difference from the progeny of earlier populations; the algorithm is then said to have "converged" on a set of possible solutions to the problem.
friendly.170 18 Advanced Data st
—Aewiced Data Structures and Algoritios
Operators of genetic algorithms

Let us now see the different operators of genetic algorithms:
* Selection Operator: The goal is to favor the individuals with the best fitness scores, allowing them to pass their genes on to the following generation.
* Crossover Operator: This is the union of two of the chosen individuals. The crossover sites are determined at random, and the genes on either side of each site are switched, producing entirely new individuals. Figure 6.19 features the crossover operation:
Figure 6.19: Crossover operation
* Mutation Operator: The goal is to maintain genetic diversity by randomly introducing new genes into the offspring. In this way, early convergence can be avoided, as shown in the following figure:
Figure 6.20: Mutation
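The three operators can be tied together in a toy sketch. The fitness function (count of 1-bits), population size, rates, and generation count below are illustrative assumptions of my own, not values from the chapter.

```python
# Toy genetic algorithm: selection, single-site crossover, and bit-flip
# mutation evolve chromosomes toward the all-ones string.
import random

GENES, POP, GENERATIONS = 12, 20, 60

def fitness(chrom):                 # fitness score: number of 1-bits
    return sum(chrom)

def select(pop):                    # selection: fitter of two random picks
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):              # crossover: swap tails at a random site
    site = random.randint(1, GENES - 1)
    return p1[:site] + p2[site:]

def mutate(chrom, rate=0.05):       # mutation: flip each gene with prob. rate
    return [g ^ 1 if random.random() < rate else g for g in chrom]

random.seed(0)                      # fixed seed for a reproducible run
pop = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
for _ in range(GENERATIONS):        # replace the whole population each time
    pop = [mutate(crossover(select(pop), select(pop))) for _ in range(POP)]

best = max(pop, key=fitness)
print(fitness(best))                # best fitness, near GENES after convergence
```

Each generation replaces the old population, mirroring the "newcomers replace some individuals" description above; adding elitism (always keeping the current best) is a common refinement.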
From the preceding algorithms, we can say that the Rabin-Karp algorithm and the KMP algorithm are the effective ones as compared to the naive algorithm when the patterns are small and the alphabets are small; they provide the solution with the lowest execution time.
Key Facts
* The string-matching algorithms are used for finding the occurrences of a pattern string within a larger text string. These algorithms can be used for a variety of applications, including text processing, data compression, and pattern recognition.
* The naive string-matching algorithm is a simple algorithm that has a time complexity of O(n * m). It is suitable for small to medium-sized strings, but it can be slow for larger strings.
* The Rabin-Karp algorithm is an improvement over the naive string-matching algorithm. It is a useful tool for text processing, data compression, and pattern recognition. It has a time complexity of O(n + m).
* The Knuth-Morris-Pratt (KMP) algorithm is an improvement over the naive string-matching algorithm and the Rabin-Karp algorithm. It has a time complexity of O(n + m) and is more efficient for large strings.
* The Longest Common Subsequence (LCS) is a problem in computer science and operations research that involves finding the longest subsequence common to two or more strings. It is an example of a problem that can be solved using optimization techniques, which are used to find the best solution to a problem given a set of constraints.
* Genetic algorithms are a type of optimization algorithm inspired by the process of natural evolution. They are used in a variety of applications, including optimization, machine learning, and artificial intelligence. They are a useful tool for finding good solutions to difficult problems and are often used as a last resort when other algorithms are not effective.
algorithms are not effective.2B Adownced Date Structures and Algorithme
VE ee
Questions
1. Write a short note on Rabin-Karp Algorithm. [MU DEC'18 (10 MARKS)]
2. Explain Rabin-Karp Algorithm in detail. [MU DEC'19 (10 MARKS)]
3. What is the longest common subsequence problem? Find LCS for the following string: [MU DEC'18 (10 MARKS)]
4. What is the longest common subsequence problem? Find LCS for the following string: [MU MAY'19 (5 MARKS)]
5. Explain Genetic Algorithm. [MU DEC'19 (10 MARKS)]
6. Write a short note on Genetic Algorithm. [MU MAY'19 (5 MARKS)]
Join our book's Discord space
Join the book's Discord Workspace for Latest updates, Offers, Tech happenings
around the world, New Release and Sessions with the Authors:
https://discord.bpbonline.com
Index

0/1 Knapsack problem 117, 125
  examples 126, 127
2-3 Tree 39
  element, inserting 41-43
  element, searching 40, 41
3-order B-tree 39

A
advanced data structures 29
All Pairs Shortest Path (APSP) algorithm 117, 119
  example 120-125
analysis of algorithms 1-3
  need for 3
asymptotic notations 3
  Big O notation 4
  examples 5-8
  features 5
  Omega Ω notation 5
  Theta Θ notation 3, 4
AVL Tree 29
  LL rotation 29-31
  LR rotation 29-32
  RL rotation 30, 32
  RR rotation 29, 30

B
Big O notation 4
Binary Search 67, 69, 70
  algorithm 70
  analysis of algorithm 72
  examples 71, 72
Binary Search Tree (BST) 30, 141
B-tree 58
  constructing 59
  example 58
  splitting 59-64

C
Coin Changing Problem 118, 132
  example 133
compressed tries 51
Container Loading Problem 112
  algorithm 112
  example 112, 113
crossover operator 170

D
data structures 29
divide and conquer technique 67
dynamic algorithms 117

E
element
  inserting, in 2-3 tree 41-43
  searching, in 2-3 tree 40, 41

F
Flow Shop scheduling 139
  algorithm 140

G
Genetic Algorithms (GA) 168
  fitness score 169
  search space 169
greedy algorithm 89

H
heap 52
  max heap 52
  min heap 52
  ReHeap-down 54
  ReHeap-up 53
heap sort
  algorithm 54
  example 56-58
Huffman algorithm 33
  examples 33-39

J
Job Sequencing Problem 93
  algorithm 93-95

K
Knapsack problem 89, 90
  0/1 (Binary) Knapsack 91
  examples 91-93
  fractional Knapsack 91
Knuth-Morris-Pratt (KMP) algorithm 149, 159, 160
  examples 160-166
Kruskal's algorithm 96
  example 96-99

L
Longest Common Subsequence (LCS) 166
  example 167, 168
LPS table 160

M
Master theorem 20, 21
  example 21, 22
Matrix Chain multiplication
  example 135-139
  features 134
max heap 52
Max Min problem 67, 73
  analysis of algorithm 75
min heap 52
Minimum Spanning Tree (MST) 89, 95
mutation operator 170

N
Naive String-Matching algorithm 149
  algorithm 156
  examples 150-156
NP-Complete problems 145
NP-Hard problems 145
NP (nondeterministic polynomial time) 144

O
Omega Ω notation 5
operators, Genetic Algorithms (GA)
  crossover operator 170
  mutation operator 170
  selection operator 170
Optimal Binary Search Tree (BST) 118, 141
  algorithm 141
  examples 142, 143
Optimal merge pattern 105
  algorithm 106
Optimal Storage on single tape
  examples 102, 103
Optimal Storage on tapes 102

P
partial solutions 169
Pivot 79
Prim's algorithm 95, 96
  example 99-101

Q
Quick sort 68, 79
  algorithm 79
  analysis of algorithm 81
  examples 81-83

R
Rabin-Karp algorithm 157
  examples 157-159
recurrence 17
recurrence methods 18
  Master theorem 20
  recursive tree method 22
  substitution method 18
recursion tree 22
recursion tree method
  example 22, 23
red-black tree 43, 44
  examples 44-50
ReHeap-down 54
ReHeap-up 53

S
selection operator 170
set cover problem 110
  algorithm 110
  example 110, 111
standard tries 51
Strassen algorithm 68
Strassen's matrix multiplication 83
  analysis of algorithm 86
  examples 84
String matching operation 149
substitution method 18
  example 18-20
suffix tries
  features 51

T
Theta Θ notation 3, 4
time complexity 9, 10
  calculating 11
  examples 11-14
  general rules 14-17
  rate of growth of algorithm, calculating 1
Travelling Salesman Problem (TSP) 117, 128
  example 128-130
tries 51
  compressed tries 51
  standard tries 51
  suffix tries 51