Pattern Matching

The document discusses various string searching algorithms, including Brute Force, Rabin-Karp, and Knuth-Morris-Pratt (KMP), focusing on their methodologies and complexities. It explains how each algorithm operates, with Brute Force comparing characters directly, Rabin-Karp using hash values for efficiency, and KMP utilizing a failure function to optimize searches. Additionally, it introduces regular expressions as a notation for describing sets of strings.

Uploaded by

telacet362


Strings and Pattern Matching

• Brute Force, Rabin-Karp, Knuth-Morris-Pratt

• Regular Expressions

String Searching
• The object of string searching is to find the location of a
specific text pattern within a larger body of text (e.g., a
sentence, a paragraph, a book, etc.).
• As with most algorithms, the main considerations for string
searching are speed and efficiency.
• There are a number of string searching algorithms in
existence today, but the three we shall review are Brute
Force, Rabin-Karp, and Knuth-Morris-Pratt.

Brute Force
• The Brute Force algorithm compares the pattern to the text, one
character at a time, until a mismatch is found.

[Figure: brute-force trace. Compared characters are italicized; correct matches are in boldface type.]

• The algorithm can be designed to stop on either the first
occurrence of the pattern, or upon reaching the end of the text.
Brute Force Pseudo-Code
• Here’s the pseudo-code
do
    if (text letter == pattern letter)
        compare next letter of pattern to next letter of text
    else
        move pattern down text by one letter
while (entire pattern not yet found and not at end of text)
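
The pseudo-code above can be sketched in Python as follows. This is a minimal illustration, not the slides' own implementation: the function name `brute_force_search` and the convention of returning -1 when nothing is found are assumptions made for the example. It stops at the first occurrence of the pattern.

```python
def brute_force_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    # Try every alignment of the pattern against the text.
    for start in range(n - m + 1):
        j = 0
        # Compare letter by letter until a mismatch or the end of the pattern.
        while j < m and text[start + j] == pattern[j]:
            j += 1
        if j == m:          # entire pattern found
            return start
    return -1               # reached the end of the text without a match


print(brute_force_search("abracadabra", "cad"))
print(brute_force_search("hello", "xyz"))
```

Sliding the pattern "down the text by one letter" corresponds to the next value of `start` in the outer loop.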

Brute Force-Complexity
• Given a pattern M characters in length, and a text N characters in
length...
• Worst case: compares pattern to each substring of text of length M.
For example, M=5.
• This kind of case can occur for image data.

Total number of comparisons: M (N-M+1)

Worst case time complexity: O(MN)
Brute Force-Complexity (cont.)
• Given a pattern M characters in length, and a text N characters in
length...
• Best case if pattern found: Finds pattern in first M positions of text.
For example, M=5.

Total number of comparisons: M

Best case time complexity: O(M)
Brute Force-Complexity (cont.)
• Given a pattern M characters in length, and a text N characters in
length...
• Best case if pattern not found: Always mismatch on first character.
For example, M=5.

Total number of comparisons: one per alignment, N-M+1, which is at most N

Best case time complexity: O(N)
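
The three cases can be checked with a small comparison counter. This is a sketch under stated assumptions: `count_comparisons` is a hypothetical helper, and the sample texts are made up for illustration with M=5. Because the sketch stops at the last full alignment, the not-found best case costs N-M+1 comparisons, which the slides round to N.

```python
def count_comparisons(text, pattern):
    """Brute-force search that also counts character comparisons.

    Returns (index of first match or -1, number of comparisons made).
    """
    n, m = len(text), len(pattern)
    comparisons = 0
    for start in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1
            if text[start + j] != pattern[j]:
                break
            j += 1
        if j == m:
            return start, comparisons
    return -1, comparisons


# Worst case: every alignment matches until the last pattern letter.
print(count_comparisons("AAAAAAAAAH", "AAAAH"))   # N=10, M=5
# Best case, pattern found at the very start of the text.
print(count_comparisons("AAAAHAAAAA", "AAAAH"))
# Best case, pattern not found: mismatch on the first letter every time.
print(count_comparisons("AAAAAAAAAA", "OOOOH"))
```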
Rabin-Karp
• The Rabin-Karp string searching algorithm calculates a hash value
for the pattern, and for each M-character subsequence of text to be
compared.
• If the hash values are unequal, the algorithm will calculate the hash
value for next M-character sequence.
• If the hash values are equal, the algorithm will do a Brute Force
comparison between the pattern and the M-character sequence.
• In this way, there is only one comparison per text subsequence, and
Brute Force is only needed when hash values match.
• Perhaps an example will clarify some things...
Rabin-Karp Example
• Hash value of “AAAAA” is 37
• Hash value of “AAAAH” is 100

Rabin-Karp Algorithm

pattern is M characters long

hash_p = hash value of pattern
hash_t = hash value of first M letters in body of text
do
    if (hash_p == hash_t)
        brute force comparison of pattern
        and selected section of text
    hash_t = hash value of next section of text, one character over
while (not at end of text and
       brute force comparison has not succeeded)
Rabin-Karp

• Common Rabin-Karp questions:

“What is the hash function used to calculate values for
character sequences?”
“Isn’t it time consuming to hash every one of the M-character
sequences in the text body?”
“Is this going to be on the final?”

• To answer some of these questions, we’ll have to get mathematical.

Rabin-Karp Math
• Consider an M-character sequence as an M-digit number in base b,
where b is the number of letters in the alphabet. The text subsequence
t[i .. i+M-1] is mapped to the number

    $x(i) = t[i] \cdot b^{M-1} + t[i+1] \cdot b^{M-2} + \dots + t[i+M-1]$

• Furthermore, given x(i) we can compute x(i+1) for the next
subsequence t[i+1 .. i+M] in constant time, as follows:

    $x(i+1) = x(i) \cdot b - t[i] \cdot b^{M} + t[i+M]$

• In this way, we never explicitly compute a new value. We
simply adjust the existing value as we move over one
character.
Rabin-Karp Math Example

• Let’s say that our alphabet consists of 10 letters.

• our alphabet = a, b, c, d, e, f, g, h, i, j
• Let’s say that “a” corresponds to 1, “b” corresponds to 2 and so
on.
The hash value for string “cah” would be ...

3*100 + 1*10 + 8*1 = 318
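
The arithmetic for “cah”, and the constant-time update to the next window, can be reproduced with a short sketch. The helper names `value_of` and `roll` are hypothetical, and the text “cahb” (so that the window after “cah” is “ahb”) is made up for illustration; it does not appear in the slides.

```python
# Letter values from the example: a=1, b=2, ..., j=10, in base 10.
LETTER_VALUE = {ch: k for k, ch in enumerate("abcdefghij", start=1)}
BASE = 10


def value_of(s):
    """Treat s as a number in base BASE, most significant letter first."""
    x = 0
    for ch in s:
        x = x * BASE + LETTER_VALUE[ch]
    return x


def roll(x, old_ch, new_ch, m):
    """Shift left one digit, subtract the old leftmost digit, add the new one."""
    return x * BASE - LETTER_VALUE[old_ch] * BASE ** m + LETTER_VALUE[new_ch]


x = value_of("cah")            # 3*100 + 1*10 + 8*1
y = roll(x, "c", "b", 3)       # window moves from "cah" to "ahb"
print(x, y, value_of("ahb"))
```

The rolled value agrees with recomputing “ahb” from scratch, which is the point of the update rule.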

Rabin-Karp Mods
• If M is large, then the resulting value (~b^M) will be enormous. For this
reason, we hash the value by taking it mod a prime number q.
• The mod function (% in Java) is particularly useful in this case due to
several of its inherent properties:

    $[(x \bmod q) + (y \bmod q)] \bmod q = (x + y) \bmod q$
    $(x \bmod q) \bmod q = x \bmod q$

• For these reasons:

    $h(i) = \big((t[i] \cdot b^{M-1} \bmod q) + (t[i+1] \cdot b^{M-2} \bmod q) + \dots + (t[i+M-1] \bmod q)\big) \bmod q$

    $h(i+1) = \big(h(i) \cdot b \bmod q - t[i] \cdot b^{M} \bmod q + t[i+M] \bmod q\big) \bmod q$

In h(i+1), the first term shifts left one digit, the second subtracts the
leftmost digit, and the third adds the new rightmost digit.
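
Putting the rolling update and the mod together gives the following sketch of the search. It is one possible rendering, not the slides' implementation: the function name `rabin_karp`, the base of 256, the use of `ord` as the letter value, and the prime q = 1,000,000,007 are all assumptions chosen for the example.

```python
def rabin_karp(text, pattern, base=256, q=1_000_000_007):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1

    top = pow(base, m, q)            # b^M mod q, used to drop the leftmost digit
    hash_p = hash_t = 0
    for k in range(m):               # hash the pattern and the first window
        hash_p = (hash_p * base + ord(pattern[k])) % q
        hash_t = (hash_t * base + ord(text[k])) % q

    for i in range(n - m + 1):
        # Only fall back to a letter-by-letter check when the hashes agree.
        if hash_p == hash_t and text[i:i + m] == pattern:
            return i
        if i + m < n:
            # h(i+1) = (h(i)*b - t[i]*b^M + t[i+M]) mod q
            hash_t = (hash_t * base - ord(text[i]) * top + ord(text[i + m])) % q
    return -1


print(rabin_karp("abracadabra", "cad"))
```

Python's `%` always returns a non-negative result for a positive modulus, so the subtraction needs no extra correction here; in languages where `%` can return a negative value, q would have to be added back before taking the remainder.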
Rabin-Karp Complexity
• If a sufficiently large prime number is used for the hash function,
the hashed values of two different patterns will usually be distinct.
• If this is the case, searching takes O(N) time, where N is the
number of characters in the larger body of text.
• It is always possible to construct a scenario with a worst case
complexity of O(MN). This, however, is likely to happen only if
the prime number used for hashing is small.

The Knuth-Morris-Pratt Algorithm
• The Knuth-Morris-Pratt (KMP) string searching algorithm differs from the
brute-force algorithm by keeping track of information gained from previous
comparisons.
• A failure function (f) is computed that indicates how much of the last
comparison can be reused if it fails.
• Specifically, f(j) is defined to be the length of the longest prefix of the
pattern P that is also a suffix of P[1,..,j]
-Note: not a suffix of P[0,..,j]
• Example: values of the KMP failure function
[Table: values of the KMP failure function]
• This shows how much of the beginning of the string matches up to the
portion immediately preceding a failed comparison.
-if the comparison fails at (4), we know the a,b in positions 2,3 are identical
to positions 0,1
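
The definition can be applied literally to see what f looks like. The sketch below is deliberately naive (it tests every candidate length) and is only meant to illustrate the definition. The function name and the pattern “ababac” are assumptions: the pattern was chosen because its letters a,b in positions 2,3 repeat positions 0,1, in line with the remark above.

```python
def failure_by_definition(pattern):
    """f(j) = length of the longest prefix of pattern that is a suffix of pattern[1..j]."""
    f = []
    for j in range(len(pattern)):
        tail = pattern[1:j + 1]              # P[1..j], so P[0..j] itself is excluded
        best = 0
        for k in range(1, len(tail) + 1):
            if pattern[:k] == tail[-k:]:     # prefix of length k equals suffix of length k
                best = k
        f.append(best)
    return f


print(failure_by_definition("ababac"))
```

For this pattern f(3) = 2, so a failed comparison at position 4 lets the search resume with the first two pattern letters already matched.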
The KMP Algorithm (contd.)
• The KMP string matching algorithm: Pseudo-Code

Algorithm KMPMatch(T,P)
Input: Strings T (text) with n characters and P
(pattern) with m characters.
Output: Starting index of the first substring of T
matching P, or an indication that P is not a
substring of T.

    f ← KMPFailureFunction(P)    {build failure function}
    i ← 0
    j ← 0
    while i < n do
        if P[j] = T[i] then
            if j = m - 1 then
                return i - m + 1    {a match}
            i ← i + 1
            j ← j + 1
        else if j > 0 then    {no match, but we have advanced}
            j ← f(j-1)    {j indexes just after matching prefix in P}
        else
            i ← i + 1
    return “There is no substring of T matching P”
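
A direct Python transcription of the matching loop might look like this. It is a sketch with assumptions: the failure function is passed in as a precomputed list rather than built inside, -1 stands in for the “no substring” message, and the pattern “ababac” with table [0, 0, 1, 2, 3, 0] is an illustrative input, not taken from the slides.

```python
def kmp_match(text, pattern, failure):
    """Return the start index of the first match of pattern in text, or -1.

    failure[j] must hold the KMP failure function value f(j) for pattern.
    """
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    i = j = 0
    while i < n:
        if pattern[j] == text[i]:
            if j == m - 1:
                return i - m + 1       # a match
            i += 1
            j += 1
        elif j > 0:                    # no match, but we have advanced
            j = failure[j - 1]         # j indexes just after the matching prefix
        else:
            i += 1
    return -1


table = [0, 0, 1, 2, 3, 0]             # failure function for "ababac"
print(kmp_match("abababac", "ababac", table))
```

Note that the text index `i` never moves backwards; only `j` is reset through the failure table.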
The KMP Algorithm (contd.)
• The KMP failure function: Pseudo-Code

Algorithm KMPFailureFunction(P)
Input: String P (pattern) with m characters
Output: The failure function f for P, which maps j to
the length of the longest prefix of P that is a suffix
of P[1,..,j]

    i ← 1
    j ← 0
    f(0) ← 0
    while i ≤ m-1 do
        if P[j] = P[i] then
            {we have matched j+1 characters}
            f(i) ← j + 1
            i ← i + 1
            j ← j + 1
        else if j > 0 then
            j ← f(j-1)    {j indexes just after matching prefix in P}
        else    {there is no match}
            f(i) ← 0
            i ← i + 1
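
The failure function can be computed by matching the pattern against itself. The following is a minimal Python sketch of that idea; the function name `kmp_failure_function` and the sample patterns are assumptions made for illustration.

```python
def kmp_failure_function(pattern):
    """Return a list f where f[j] is the length of the longest prefix of
    pattern that is also a suffix of pattern[1..j]."""
    m = len(pattern)
    f = [0] * m
    i, j = 1, 0
    while i < m:
        if pattern[j] == pattern[i]:
            f[i] = j + 1               # we have matched j + 1 characters
            i += 1
            j += 1
        elif j > 0:
            j = f[j - 1]               # fall back to a shorter matching prefix
        else:
            f[i] = 0                   # there is no match
            i += 1
    return f


print(kmp_failure_function("ababac"))
print(kmp_failure_function("abacab"))
```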
The KMP Algorithm (contd.)
• A graphical representation of the KMP string searching algorithm

The KMP Algorithm (contd.)
• Time Complexity Analysis
• define k = i - j
• In every iteration through the while loop, one of three things happens.
1) if T[i] = P[j], then i increases by 1, as does j, so k remains the same.
2) if T[i] != P[j] and j > 0, then i does not change and k increases by at least 1,
since k changes from i - j to i - f(j-1), and f(j-1) < j.
3) if T[i] != P[j] and j = 0, then i increases by 1 and k increases by 1 since j remains
the same.
• Thus, each time through the loop, either i or k increases by at least 1. Neither
can exceed n, so the greatest possible number of loops is 2n
• This of course assumes that f has already been computed.
• However, f is computed in much the same manner as KMPMatch so the time
complexity argument is analogous. KMPFailureFunction is O(m)
• Total Time Complexity: O(n + m)
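
The 2n bound can be observed by counting loop iterations. This is an illustrative sketch only: `kmp_loop_count` is a hypothetical helper, and the pattern “aab” with failure table [0, 1, 0] and the sample text are made up to force many fallbacks.

```python
def kmp_loop_count(text, pattern, failure):
    """Run the KMP matching loop and return how many iterations it took.

    Assumes a non-empty pattern and its precomputed failure table.
    """
    n, m = len(text), len(pattern)
    i = j = iterations = 0
    while i < n:
        iterations += 1
        if pattern[j] == text[i]:
            if j == m - 1:
                break                  # a match: stop counting
            i += 1                     # case 1: i and j advance, k = i - j unchanged
            j += 1
        elif j > 0:
            j = failure[j - 1]         # case 2: i fixed, k = i - j grows
        else:
            i += 1                     # case 3: i and k both grow by 1
    return iterations


text = "aaaaaaab"
loops = kmp_loop_count(text, "aab", [0, 1, 0])
print(loops, "iterations for n =", len(text))
```

Even though this text triggers a fallback after almost every match, the count stays within 2n.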
Regular Expressions
• notation for describing a set of strings, possibly of infinite
size
• ε denotes the empty string
• ab + c denotes the set {ab, c}
• a* denotes the set {ε, a, aa, aaa, ...}
• Examples
(a+b)*              all the strings from the alphabet {a,b}
(b*ab*a)*b*         strings with an even number of a’s
(a+b)*sun(a+b)*     strings containing the pattern “sun”
(a+b)(a+b)(a+b)a    4-letter strings ending in a
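
These expressions can be tried out with Python's `re` module, where the slides' “+” (union) is written “|”. The sketch below is an illustration under assumptions: the helper `all_strings` is hypothetical, and `(b*ab*a)*b*` is the pattern this sketch uses for “an even number of a's”. It compares that pattern against a direct count on every string over {a,b} up to length 6.

```python
import re
from itertools import product

EVEN_AS = re.compile(r"(b*ab*a)*b*")             # even number of a's
FOUR_ENDING_IN_A = re.compile(r"(a|b)(a|b)(a|b)a")


def all_strings(alphabet, max_len):
    """Yield every string over alphabet with length 0..max_len."""
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


# Strings where the regular expression and a direct count disagree.
mismatches = [s for s in all_strings("ab", 6)
              if bool(EVEN_AS.fullmatch(s)) != (s.count("a") % 2 == 0)]
print(mismatches)

print(bool(FOUR_ENDING_IN_A.fullmatch("abba")))
print(bool(FOUR_ENDING_IN_A.fullmatch("abab")))
```

`fullmatch` is used because a regular expression in this notation describes whole strings, not substrings.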
