Adv Data Structure Chapter - 6

Chapter 6 discusses string matching algorithms, including the Naive String-Matching Algorithm, Rabin-Karp Algorithm, and Knuth-Morris-Pratt (KMP) Algorithm. The chapter aims to teach readers how to find occurrences of a pattern within a given text using these algorithms, providing examples and explanations for each method. Additionally, it covers the concept of the Longest Common Subsequence and introduces genetic algorithms in the context of string matching.

CHAPTER 6
String Matching

Introduction

The string matching operation is one of the basic parts of many text processing applications. The main aim of a string-matching algorithm is to find the pattern P in the given text T. Mostly, the length of the pattern P is much smaller than the length of the text T, that is, P << T. The string matching problem is defined as: given some text or string T[1..n] of size n, find all occurrences of the pattern P[1..m] of size m. There are a few algorithms, such as the naive string-matching algorithm, the Knuth-Morris-Pratt (KMP) algorithm, the Rabin-Karp algorithm, and so on, which will be discussed clearly with problems.

Structure

In this chapter, we will discuss the following topics:
* The Naive String-Matching Algorithm
* Rabin-Karp Algorithm
* The Knuth-Morris-Pratt Algorithm
* Longest Common Subsequence
* Genetic Algorithms

Objectives

By the end of this chapter, the reader will learn about various string matching algorithms with suitable examples. Readers will also learn about genetic algorithms.

The Naive String-Matching Algorithm

This algorithm first compares the first character of the pattern with the searchable text. If the characters match, the pointers of both strings are moved forward; on a mismatch, the pointer of the text is moved one position past the start of the current attempt and the pointer of the pattern is reset. This process is repeated till the end of the text. The naive approach never requires any preprocessing: it directly compares the pattern P and the text T, character by character, and after each failed attempt the pattern shifts one position to the right.

Example 1
Perform pattern matching using the naive algorithm.
String: abcdadcabcdf
Pattern: abcdf

Solution: Let ti and pi be the indices of the text and the pattern, respectively.

Step 1: T[1] = P[1] (a = a); ti++, pi++ (Figure 6.1)
Step 2: T[2] = P[2] (b = b); ti++, pi++ (Figure 6.2)
Step 3: T[3] = P[3] (c = c); ti++, pi++ (Figure 6.3)
Step 4: T[4] = P[4] (d = d); ti++, pi++ (Figure 6.4)
Step 5: T[5] ≠ P[5] (a ≠ f); mismatch, so the pattern shifts one position right and pi resets (Figure 6.5)
Step 6: T[2] ≠ P[1] (b ≠ a); shift right (Figure 6.6)
Step 7: T[3] ≠ P[1] (c ≠ a); shift right (Figure 6.7)
Step 8: T[4] ≠ P[1] (d ≠ a); shift right (Figure 6.8)
Step 9: T[5] = P[1] (a = a); ti++, pi++ (Figure 6.9)
Step 10: T[6] ≠ P[2] (d ≠ b); mismatch, shift right and reset pi (Figure 6.10)
Step 11: T[6] ≠ P[1] (d ≠ a); shift right (Figure 6.11)
Step 12: T[7] ≠ P[1] (c ≠ a); shift right (Figure 6.12)
Step 13: T[8] = P[1] (a = a); ti++, pi++ (Figure 6.13)
Step 14: T[9] = P[2] (b = b); ti++, pi++ (Figure 6.14)
Step 15: T[10] = P[3] (c = c); ti++, pi++ (Figure 6.15)
Step 16: T[11] = P[4] (d = d); ti++, pi++ (Figure 6.16)
Step 17: T[12] = P[5] (f = f) (Figure 6.17)

Thus, the string is matched.

Algorithm for naive string matching

The algorithm for naive string matching is as follows:

    NAIVE_STRING_MATCH(T, P)
        n = length(T), m = length(P)
        for i = 0 to n - m do
            if P[1..m] == T[i+1 .. i+m] then
                print "Match found"

The best case of this algorithm is O(n) and the worst case is O(m * (n - m + 1)).
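The pseudocode above maps directly to the following minimal Python sketch; the function name naive_string_match and the 0-based shifts are choices made here for illustration, not part of the book's pseudocode:

    def naive_string_match(text, pattern):
        """Return every 0-based shift at which pattern occurs in text."""
        n, m = len(text), len(pattern)
        matches = []
        for i in range(n - m + 1):            # try every possible shift
            if text[i:i + m] == pattern:      # character-by-character window comparison
                matches.append(i)
        return matches

    # Example 1 from this chapter:
    print(naive_string_match("abcdadcabcdf", "abcdf"))   # [7], i.e., position 8 in 1-based terms

Each of the n - m + 1 shifts costs up to m comparisons, which is exactly the O(m * (n - m + 1)) worst case quoted above.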
Rabin-Karp Algorithm

One of the most efficient string-matching algorithms is the Rabin-Karp algorithm. Instead of matching every element of the pattern against every window of the text, it uses a hash value to match the strings: the hash value of the pattern P[1..m] of length m is compared with the hash value of the current length-m window of the text, and the characters themselves are compared only when the two hash values agree.
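As a sketch of this idea, a standard rolling-hash formulation in Python is shown below; the base b = 256 and the modulus q = 101 are illustrative choices, not values from the chapter:

    def rabin_karp(text, pattern, b=256, q=101):
        """Compare rolling hashes of text windows against the pattern hash; verify hits."""
        n, m = len(text), len(pattern)
        if m == 0 or m > n:
            return []
        h = pow(b, m - 1, q)                       # weight of a window's leading character
        p_hash = t_hash = 0
        for i in range(m):                         # hash the pattern and the first window
            p_hash = (b * p_hash + ord(pattern[i])) % q
            t_hash = (b * t_hash + ord(text[i])) % q
        matches = []
        for s in range(n - m + 1):
            if p_hash == t_hash and text[s:s + m] == pattern:   # verify: hashes can collide
                matches.append(s)
            if s < n - m:                          # roll the hash to the next window in O(1)
                t_hash = (b * (t_hash - ord(text[s]) * h) + ord(text[s + m])) % q
        return matches

Because each window's hash is derived from the previous one in constant time, the expected running time is O(n + m); only hash collisions force full character comparisons.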

The Knuth-Morris-Pratt Algorithm

The KMP algorithm preprocesses the pattern into an LPS table, where LPS[i] is the length of the longest proper prefix of P[1..i] that is also a suffix of P[1..i]. For the pattern P = ABABD, the LPS table is:

    index       0   1   2   3   4
    pattern     A   B   A   B   D
    LPS value   0   0   1   2   0

    Table 6.2: LPS table

The pattern is now matched against the text T = ABABCABABD. Two pointers are kept, i over the text and j over the pattern, and a[i] is compared with a[j+1]. On a match, both pointers move forward. On a mismatch with j > 0, j moves back to the index pointed to by its current position in the LPS table, so the already-matched text characters are never read again. On a mismatch with j already at 0, only i moves forward.

In the first steps, T[1..4] = ABAB matches P[1..4], so j reaches 4; T[5] = C is then compared with P[5] = D and mismatches.

Step 7: a[i] ≠ a[j+1]; move j to the index pointed to by the position j is now at, that is, move j to index 2 (Table 6.8).

Step 8: a[i] ≠ a[j+1]; move j to the index pointed to by the position j is now at, that is, move j to index 0 (Table 6.9).

Step 9: j is already at a[0]; therefore, increment i (Table 6.10).

Steps 10 to 14: a[i] == a[j+1] at every remaining position, so T[6..10] = ABABD matches P[1..5] character by character (Tables 6.11 to 6.16).

Hence, in this way we can match the string using the KMP algorithm.
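The two phases just shown, building the LPS table and then scanning the text with LPS-guided fallbacks, can be written compactly in Python (0-based indices; the function names are illustrative choices):

    def build_lps(pattern):
        """lps[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
        lps = [0] * len(pattern)
        j = 0                                   # length of the current matched prefix
        for i in range(1, len(pattern)):
            while j > 0 and pattern[i] != pattern[j]:
                j = lps[j - 1]                  # fall back, as in Steps 7 and 8 above
            if pattern[i] == pattern[j]:
                j += 1
            lps[i] = j
        return lps

    def kmp_search(text, pattern):
        """Return all 0-based positions where pattern occurs in text."""
        lps, j, out = build_lps(pattern), 0, []
        for i, c in enumerate(text):
            while j > 0 and c != pattern[j]:
                j = lps[j - 1]
            if c == pattern[j]:
                j += 1
            if j == len(pattern):               # full match; keep scanning for more
                out.append(i - j + 1)
                j = lps[j - 1]
        return out

    print(build_lps("ABABD"))                   # [0, 0, 1, 2, 0], as in Table 6.2
    print(kmp_search("ABABCABABD", "ABABD"))    # [5], the match at position 6 in 1-based terms

Because j only falls back as far as it has previously advanced, the scan is O(n) and the preprocessing is O(m), giving the O(n + m) bound quoted in the Key Facts.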
Longest Common Subsequence (LCS)

LCS problems consist of two sequences, of which we need to find the longest common subsequence present among them. The dynamic programming method for LCS follows a bottom-up approach: the matrix table gets filled from top to bottom. In the matrix table, the rows correspond to the alphabets present in String 1 and the columns to the alphabets present in String 2. To start the algorithm, initialize the first row and first column of the matrix (that is, index 0 of the matrix) to zero. Next, compare the elements of String 1 with the elements of String 2. If there is a match, add 1 to the value of the previous diagonal element. If no match exists, take the maximum of the previous row element and the previous column element. The process continues till all the rows and columns are filled. To find the longest common subsequence, trace back from the largest value towards the diagonal elements, and thus the result is obtained. The time complexity using dynamic programming is O(m * n), where m denotes the number of alphabets present in String 1 and n denotes the number of alphabets present in String 2.

Example 7
Find the LCS of the strings given as follows:
String 1: stone
String 2: longest

Solution: Refer to Table 6.17:

              l   o   n   g   e   s   t
          0   0   0   0   0   0   0   0
      s   0   0   0   0   0   0   1   1
      t   0   0   0   0   0   0   1   2
      o   0   0   1   1   1   1   1   2
      n   0   0   1   2   2   2   2   2
      e   0   0   1   2   2   3   3   3

    Table 6.17: LCS table

Therefore, LCS = one

Example 8
Find the LCS of the strings given as follows:
String 1: minom
String 2: mnom

Solution: The table is filled in the same way. Refer to Table 6.18:

              m   n   o   m
          0   0   0   0   0
      m   0   1   1   1   1
      i   0   1   1   1   1
      n   0   1   2   2   2
      o   0   1   2   3   3
      m   0   1   2   3   4

    Table 6.18: LCS table

Therefore, LCS = mnom
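The tabulation and traceback just described translate into the following Python sketch (the function name lcs is an illustrative choice):

    def lcs(s1, s2):
        """Bottom-up LCS: fill the DP table, then trace back to recover one subsequence."""
        m, n = len(s1), len(s2)
        dp = [[0] * (n + 1) for _ in range(m + 1)]    # row 0 and column 0 stay zero
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                if s1[i - 1] == s2[j - 1]:
                    dp[i][j] = dp[i - 1][j - 1] + 1   # match: 1 + previous diagonal element
                else:
                    dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
        out, i, j = [], m, n                          # trace back from the largest value
        while i > 0 and j > 0:
            if s1[i - 1] == s2[j - 1]:
                out.append(s1[i - 1])
                i, j = i - 1, j - 1                   # follow the diagonal on a match
            elif dp[i - 1][j] >= dp[i][j - 1]:
                i -= 1
            else:
                j -= 1
        return "".join(reversed(out))

    print(lcs("stone", "longest"))   # "one", as in Table 6.17
    print(lcs("minom", "mnom"))      # "mnom", as in Table 6.18

The two nested loops fill m * n cells at constant cost each, which is the O(m * n) time bound stated above.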
Genetic Algorithms

Most evolutionary algorithms are adaptive heuristic search algorithms based on the concepts of natural selection and genetics. They are frequently employed to solve optimization and search problems. Genetic algorithms are nothing but the natural selection of the species which can survive and adapt to the changing environment, and could reproduce and go on to the next generation. In other words, it can be termed "survival of the fittest" among individuals of consecutive generations of the problem to be solved.

The following analogy serves as the foundation for genetic algorithms:
1. Population members fight for resources and mates.
2. The successful (fittest) individuals then mate to have more children than other individuals.
3. Genes from the "fittest" parents spread throughout the generation, which means that occasionally parents produce offspring who are superior to both parents.
4. As a result, each next generation becomes better suited to its environment.

Search space
The population of individuals is maintained within the search space. Each individual is represented by a vector of components, and these elements resemble genes in certain ways. As a consequence, various genes make up each chromosome.

    Figure 6.18: Genetic Algorithm

Fitness score
Each person receives a fitness score that indicates how well-equipped they are to compete in the changing environment. The GAs maintain the population of individuals along with the fitness score of each individual. The individuals having a higher fitness score will be given more chances to reproduce, as compared to others. The candidates with the highest fitness scores are chosen because they can unite with their mates and combine their chromosomes to make superior children. Since the population size is static, space must be made for newcomers. As a result, some people pass away and are replaced by newcomers, eventually giving rise to a new generation, once the previous population has had all its chances to mate. Better genes are passed down from members of the previous generation to each subsequent generation; as a result, younger generations always have more effective "partial solutions" than earlier ones. The population has converged once the progeny produced show no discernible difference from the progeny of earlier populations, and it is then said that the algorithm has "converged" on a set of possible solutions to the problem.

Operators of genetic algorithms
Let us now see the different operators of genetic algorithms:
* Selection Operator: The goal is to favor the individuals that score higher on the fitness scale, allowing them to pass their genes to the following generation.
* Crossover Operator: This is the union of two individuals chosen by the selection operator. The crossover sites are determined at random, and the genes beyond them are switched, producing entirely new individuals (Figure 6.19: Crossover operation).
* Mutation Operator: The goal is to maintain genetic diversity by randomly introducing genes into the offspring; in this way, premature convergence can be avoided (Figure 6.20: Mutation operation).
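To make the three operators concrete, here is a toy Python sketch of a GA loop; the bit-string representation, population size, mutation rate, and the one-max fitness function are all illustrative assumptions, not details from the chapter:

    import random

    def genetic_algorithm(fitness, n_bits=8, pop_size=20, generations=50, mut_rate=0.05):
        """Toy GA: fitness-weighted selection, one-point crossover, bit-flip mutation."""
        pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
        for _ in range(generations):
            scores = [fitness(ind) + 1 for ind in pop]        # +1 keeps selection weights positive
            new_pop = []
            while len(new_pop) < pop_size:
                p1, p2 = random.choices(pop, weights=scores, k=2)    # selection operator
                cut = random.randint(1, n_bits - 1)                  # crossover operator: random site
                child = p1[:cut] + p2[cut:]
                child = [bit ^ 1 if random.random() < mut_rate else bit   # mutation operator
                         for bit in child]
                new_pop.append(child)
            pop = new_pop                                     # newcomers replace the old generation
        return max(pop, key=fitness)

    # One-max toy problem: fitness is the number of 1-bits, so the optimum is all ones.
    print(genetic_algorithm(sum))

Each generation fully replaces the previous one, fitter individuals are more likely to be chosen as parents, and mutation keeps diversity in the population, mirroring the three operators described above.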
Conclusion

From the preceding algorithms, we can say that the Rabin-Karp and the KMP algorithms are the effective ones as compared to the naive algorithm when the patterns are small and the alphabets are small. The solution with the lowest execution time will always have the advantage.

Key Facts
* String-matching algorithms are used for finding the occurrences of a pattern string within a larger text string. These algorithms can be used for a variety of applications, including text processing, data compression, and pattern recognition.
* The naive string-matching algorithm is a simple algorithm with a worst-case time complexity of O(n * m). It is suitable for small to medium-sized strings, but it can be slow for larger strings.
* The Rabin-Karp algorithm is an improvement over the naive string-matching algorithm and is a useful tool for text processing, data compression, and pattern recognition. It has an expected time complexity of O(n + m).
* The Knuth-Morris-Pratt (KMP) algorithm is an improvement over the naive string-matching algorithm and the Rabin-Karp algorithm. It has a time complexity of O(n + m) and is more efficient for large strings.
* The Longest Common Subsequence (LCS) is a problem in computer science and operations research that involves finding the longest subsequence common to two or more strings. It is an example of a problem that can be solved using optimization techniques, which are used to find the best solution to a problem given a set of constraints.
* Genetic algorithms are a type of optimization algorithm inspired by the process of natural evolution. They are used in a variety of applications, including optimization, machine learning, and artificial intelligence. They are a useful tool for finding good solutions to difficult problems and are often used as a last resort when other algorithms are not effective.

Questions
1. Write a short note on the Rabin-Karp algorithm. [MU DEC'18 (5 marks)]
2. Explain the Rabin-Karp algorithm in detail. [MU DEC'19 (10 marks)]
3. What is the longest common subsequence problem? Find the LCS for the following strings. [MU MAY'19 (10 marks)]
4. What is the longest common subsequence problem? Find the LCS for the following strings. [MU MAY'19 (5 marks)]
5. Explain the Genetic Algorithm. [MU DEC'19 (10 marks)]
6. Write a short note on the Genetic Algorithm. [MU MAY'19 (5 marks)]
