Boyer Moore Algorithm: Idan Szpektor

The Boyer-Moore algorithm is a string matching algorithm that improves on naive matching by skipping characters when possible. It uses two rules: the bad character rule, which can shift the pattern by more than one position on a mismatch, and the good suffix rule, which uses the already matched suffix to determine the shift distance. The algorithm preprocesses the pattern to calculate shift amounts, then slides the pattern along the text from left to right while comparing characters from right to left within each alignment. Extended with the Galil rule, it runs in O(n + m) time in the worst case, where n is the pattern length and m is the text length.

Boyer Moore Algorithm

Idan Szpektor
Boyer and Moore
What It’s About
• A string matching algorithm
• Preprocess a pattern P (|P| = n)
• For a text T (|T| = m), find all of the occurrences of P in T
• Time complexity: O(n + m), but usually sublinear
Right to Left (like in Hebrew)
• Matching the pattern from right to left
• For a pattern abc:

↓
T: bbacdcbaabcddcdaddaaabcbcb
P: abc

• Worst case is still O(n m)

The Bad Character Rule (BCR)
• On a mismatch between the pattern and the text, we can shift the pattern by more than one place.
Sublinearity!
ddbbacdcbaabcddcdaddaaabcbcb
acabc
↑
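BCR Preprocessing
• A table giving, for each position in the pattern and each character, the size of the shift. O(n |Σ|) space. O(1) access time.

Example for P = abacb (each entry is the position of the nearest occurrence of the character at or to the left of that position; the shift is the mismatch position minus the entry):

P:  a  b  a  c  b
    1  2  3  4  5
a   1  1  3  3  3
b      2  2  2  5
c            4  4

• A list of positions for each character. O(n + |Σ|) space. O(n) access time, but in total O(m).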
BCR - Summary
• On a mismatch, shift the pattern to the right until the mismatched text char is aligned with its closest occurrence in P to the left of the mismatch position.
• Still O(n m) worst case running time:

T: aaaaaaaaaaaaaaaaaaaaaaaaa
P: abaaaa
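The list-of-positions variant of the bad character rule can be sketched as follows. This is a minimal illustration, not the slides' own code: the function names are hypothetical, indices are 0-based, and after a full match the pattern is simply shifted by one because the rule gives no information in that case.

```python
from collections import defaultdict


def build_positions(pattern):
    """Map each character to its 0-based positions in pattern, rightmost first."""
    positions = defaultdict(list)
    for idx in range(len(pattern) - 1, -1, -1):
        positions[pattern[idx]].append(idx)
    return positions


def bad_char_shift(positions, ch, i):
    """Shift for a mismatch at pattern index i against text character ch."""
    for pos in positions.get(ch, ()):      # scanned from right to left
        if pos < i:
            return i - pos                 # bring that occurrence under ch
    return i + 1                           # ch does not occur left of i


def search_bcr(text, pattern):
    """Return 0-based starts of all occurrences, using only the bad character rule."""
    n, m = len(pattern), len(text)
    positions = build_positions(pattern)
    hits = []
    k = n - 1                              # text index under the right end of the pattern
    while n > 0 and k < m:
        i, h = n - 1, k
        while i >= 0 and pattern[i] == text[h]:
            i -= 1
            h -= 1
        if i < 0:
            hits.append(k - n + 1)
            k += 1                         # assumption: shift by one after a match
        else:
            k += bad_char_shift(positions, text[h], i)
    return hits


print(search_bcr("bbacdcbaabcddcdaddaaabcbcb", "abc"))  # [8, 20]
```

Scanning a character's position list costs O(n) per lookup, which is the trade-off against the O(n |Σ|) table.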
The Good Suffix Rule (GSR)
• We want to use the knowledge of the matched characters in the pattern’s suffix.
• If we matched S characters in T, what is (if it exists) the smallest shift of P that will align a substring of P with the same S characters?
GSR (Cont…)
• Example 1 – how much to move:

↓
T: bbacdcbaabcddcdaddaaabcbcb
P: cabbabdbab
          cabbabdbab

GSR (Cont…)
• Example 2 – what if there is no alignment:

↓
T: bbacdcbaabcbbabdbabcaabcbcb
P: bcbbabdbabc
            bcbbabdbabc
GSR - Detailed
• We mark the matched substring in T with t, the mismatched char in T with x and the mismatched char in P with y.

1. If P contains another occurrence of t whose preceding char z satisfies z ≠ y: shift right until the rightmost such occurrence is aligned with t in T.
2. Otherwise, shift right until the largest prefix of P that matches a suffix of t is aligned with that suffix.
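The two cases can be checked with a brute-force function written straight from the rule. This is only a sketch for experimenting with small patterns: the function name is hypothetical, indices are 0-based, and the O(n^2) scan stands in for table-based preprocessing.

```python
def good_suffix_shift(pattern, i):
    """Shift after a mismatch at 0-based pattern index i (pattern[i+1:] matched the text)."""
    n = len(pattern)
    t = pattern[i + 1:]                    # the matched suffix
    if not t:
        return 1                           # nothing matched yet: shift by one
    mismatched = pattern[i]                # pattern character that failed to match
    # Case 1: rightmost other copy of t whose preceding character differs.
    for end in range(n - 1, len(t) - 1, -1):   # end is the exclusive end of the copy
        start = end - len(t)
        if pattern[start:end] == t and (start == 0 or pattern[start - 1] != mismatched):
            return n - end
    # Case 2: longest prefix of the pattern that is a proper suffix of t.
    for length in range(len(t) - 1, 0, -1):
        if pattern[:length] == t[-length:]:
            return n - length
    return n                               # no overlap at all: shift past t


print(good_suffix_shift("cabbabdbab", 7))   # 7
print(good_suffix_shift("bcbbabdbabc", 7))  # 9
```

The two calls use the patterns of Example 1 and Example 2, assuming each pattern starts aligned with the left end of T so that the first mismatch falls on the 8th pattern character.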
Boyer Moore Algorithm
• Preprocess(P)
• k := n
  while (k ≤ m) do
  - Match P and T from right to left starting at k
  - If a mismatch occurs: shift P right (advance k) by max(good suffix rule, bad char rule).
  - else, print the occurrence and shift P right (advance k) by the good suffix rule.
Algorithm Correctness
• The bad character rule shift never misses a match
• The good suffix rule shift never misses a match
Preprocessing the GSR – L(i)
• L(i) – the biggest index j < n such that P[i..n] is a suffix of P[1..j] but P[i-1..n] is not. L(i) = 0 if no such j exists.

i:  1  2  3  4  5  6  7  8  9 10 11 12 13
P:  b  b  a  b  b  a  a  b  b  c  a  b  b
L:  0  0  0  0  0  0  0  0  0  0  9  2 12
Preprocessing the GSR – l(i)
• l(i) – the length of the longest suffix of P[i..n] that is also a prefix of P

i:  1  2  3  4  5  6  7  8  9 10 11 12 13
P:  b  b  a  b  b  a  a  b  b  c  a  b  b
l:     2  2  2  2  2  2  2  2  2  2  2  1
Using L(i) and l(i) in GSR
• If a mismatch occurs at position n, shift P by 1
• If a mismatch occurs at position i-1 in P:
  - If L(i) > 0, shift P by n – L(i)
  - else shift P by n – l(i)
• If P was found, shift P by n – l(2)
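The lookup described on this slide might be written as the small function below. It is a sketch under stated assumptions: the name `shift_from_tables` is hypothetical, the tables are 1-based lists of length n + 2 with index 0 unused, and the sample values for the pattern cabbabdbab were worked out by hand from the definitions of L(i) and l(i) rather than taken from the slides.

```python
def shift_from_tables(L, l, n, i):
    """Good suffix shift after a mismatch at 1-based pattern position i.

    Pass i = 0 to ask for the shift after a full match.
    """
    if i == 0:
        return n - l[2]        # P was found
    if i == n:
        return 1               # mismatch on the very first comparison
    if L[i + 1] > 0:
        return n - L[i + 1]    # another copy of the matched suffix exists
    return n - l[i + 1]        # fall back to the longest prefix overlap


# Hand-computed tables for P = "cabbabdbab" (n = 10); unlisted entries are 0.
n = 10
L = [0] * (n + 2)
L[8], L[9], L[10] = 6, 3, 8
l = [0] * (n + 2)

print(shift_from_tables(L, l, n, 8))   # 7: "ab" matched, mismatch at position 8
print(shift_from_tables(L, l, n, 0))   # 10: no proper suffix of P is a prefix of P
```

Note the index offset: a mismatch at position i means the matched suffix starts at i + 1, so the tables are read at i + 1.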

Building L(i) and l(i) – the Z
• For a string s, Z(i) is the length of the longest substring of s starting at i that matches a prefix of s.

i:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
s:  b  b  a  c  d  c  b  b  a  a  b  b  c  d  d
Z:     1  0  0  0  0  3  1  0  0  2  1  0  0  0

• Naively, we can build Z in O(n^2)
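The naive construction is short enough to write out. In this sketch the function name is hypothetical and the returned list is 1-based (indices 0 and 1 are unused), so that `Z[i]` lines up with Z(i) as defined above.

```python
def z_naive(s):
    """Return Z as a 1-based list; Z[i] is defined for i = 2..len(s)."""
    n = len(s)
    Z = [0] * (n + 1)
    for i in range(2, n + 1):
        k = 0
        # Compare s starting at position i against the prefix of s.
        while i - 1 + k < n and s[k] == s[i - 1 + k]:
            k += 1
        Z[i] = k
    return Z


print(z_naive("bbacdcbbaabbcdd")[2:])
# [1, 0, 0, 0, 0, 3, 1, 0, 0, 2, 1, 0, 0, 0]
```

Each position may rescan up to n characters, which gives the O(n^2) bound.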

From Z to N
• N(i) is the length of the longest suffix of P[1..i] that is also a suffix of P.
• N(i) is Z(n – i + 1), built over P reversed.

i:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
s:  d  d  c  b  b  a  a  b  b  c  d  c  a  b  b
N:  0  0  0  1  2  0  0  1  3  0  0  0  0  1
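The reversal trick can be made concrete as below. This is a sketch with hypothetical function names; it uses a naive Z computation for brevity and 1-based lists, and it maps index j of P to index n - j + 1 of the reversed string.

```python
def z_values(s):
    """Naive Z values as a 1-based list (indices 0 and 1 unused)."""
    n = len(s)
    Z = [0] * (n + 1)
    for i in range(2, n + 1):
        while i - 1 + Z[i] < n and s[Z[i]] == s[i - 1 + Z[i]]:
            Z[i] += 1
    return Z


def n_values(p):
    """N[j] = length of the longest suffix of p[1..j] that is also a suffix of p."""
    n = len(p)
    Zr = z_values(p[::-1])          # Z over the reversed pattern
    N = [0] * (n + 1)
    for j in range(1, n):
        N[j] = Zr[n - j + 1]        # position j in p mirrors n - j + 1 in reversed p
    N[n] = n                        # the whole pattern is trivially its own suffix
    return N


print(n_values("ddcbbaabbcdcabb")[1:15])
# [0, 0, 0, 1, 2, 0, 0, 1, 3, 0, 0, 0, 0, 1]
```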
Building L(i) in O(n)
• L(i) – the biggest index j < n such that P[i..n] is a suffix of P[1..j] but P[i-1..n] is not
• L(i) – the biggest index j < n such that:
  N(j) == | P[i..n] | == n – i + 1

  - for i := 1 to n, L(i) := 0
  - for j := 1 to n-1
      if N(j) > 0
        i := n – N(j) + 1
        L(i) := j
Building l(i) in O(n)
• l(i) – the length of the longest suffix of P[i..n] that is also a prefix of P
• l(i) – the biggest j <= | P[i..n] | == n – i + 1 such that N(j) == j

  - k := 0
  - for j := 1 to n-1
      if N(j) == j, k := j
      l(n – j + 1) := k
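Both loops translate almost line for line into code. The sketch below assumes 1-based lists of length n + 2, computes N by brute force to stay short, and skips j with N(j) = 0 when filling L, since such a j would index position n + 1. The function names are hypothetical.

```python
def n_values(p):
    """Brute-force N: N[j] = longest suffix of p[1..j] that is also a suffix of p."""
    n = len(p)
    N = [0] * (n + 1)
    for j in range(1, n + 1):
        while N[j] < j and p[j - 1 - N[j]] == p[n - 1 - N[j]]:
            N[j] += 1
    return N


def build_L_and_l(p):
    """Return 1-based tables (L, l) for the good suffix rule."""
    n = len(p)
    N = n_values(p)
    L = [0] * (n + 2)
    l = [0] * (n + 2)
    for j in range(1, n):
        if N[j] > 0:                 # guard: N[j] == 0 carries no information
            L[n - N[j] + 1] = j      # later (larger) j overwrites earlier ones
    k = 0
    for j in range(1, n):
        if N[j] == j:                # prefix p[1..j] is also a suffix of p
            k = j
        l[n - j + 1] = k
    return L, l


L, l = build_L_and_l("bbabbaabbcabb")
print(l[2:14])        # [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1]
print(L[11], L[13])   # 9 12
```

Because j runs upward, the last write to each L(i) is the biggest qualifying j, which is what the definition asks for.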
Building Z in O(n)
• For calculating Z(i), we want to use the previously calculated Z(1)…Z(i-1)
• For each i we remember the Z(j) that reaches furthest to the right: the index j < i such that j + Z(j) >= k + Z(k) for all k < i
Building Z in O(n) (Cont…)

↑ ↑ ↑ ↑
S i’ j i

• If i < j + Z(j), s[i … j + Z(j) - 1] appeared previously, starting at i’ = i – j + 1.
• Z(i’) < Z(j) – (i - j) ?
Building Z in O(n) (Cont…)
• For Z(2) calculate explicitly
• j := 2, i := 3
• While i <= |s|:
  - if i >= j + Z(j), calculate Z(i) explicitly
  - else
      if Z(i’) < Z(j) – (i - j), Z(i) := Z(i’)
      else Z(i) := Z(j) – (i - j), then calculate the tail of Z(i) explicitly, starting at position j + Z(j)
  - If j + Z(j) < i + Z(i), j := i
  - i := i + 1
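A compact linear-time version of this procedure is sketched below. The function name and the helper variable `end` (standing for j + Z(j)) are assumptions of the sketch; the list is 1-based as in the slides, and the copied value is clipped to the remembered box before the explicit extension.

```python
def z_linear(s):
    """Return Z as a 1-based list (indices 0 and 1 unused), computed in O(n)."""
    n = len(s)
    Z = [0] * (n + 1)
    j = 0      # start of the remembered match that reaches furthest right
    end = 0    # j + Z[j]: first position not covered by that match
    for i in range(2, n + 1):
        if i < end:
            i_prime = i - j + 1
            # Reuse the earlier value, clipped to what the remembered match covers.
            Z[i] = min(Z[i_prime], end - i)
        # Extend explicitly; this stops at once when the copied value was not clipped.
        while i + Z[i] <= n and s[Z[i]] == s[i + Z[i] - 1]:
            Z[i] += 1
        if i + Z[i] > end:
            j, end = i, i + Z[i]
    return Z


print(z_linear("bbacdcbbaabbcdd")[2:])
# [1, 0, 0, 0, 0, 3, 1, 0, 0, 2, 1, 0, 0, 0]
```

Every successful comparison in the inner loop pushes `end` to the right, so the total number of successful comparisons is at most n.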
Building Z in O(n) - Analysis
• The algorithm builds Z correctly
• The algorithm executes in O(n)
  - A new character is matched only once
  - All other operations are in O(1)
Boyer Moore Worst Case Analysis
• Assume P consists of n copies of a single char and T consists of m copies of the same char:

T: aaaaaaaaaaaaaaaaaaaaaaaaa
P: aaaaaa

• Boyer Moore Algorithm runs in Θ(m n) when finding all the matches
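The count behind this bound can be written out. Every one of the m − n + 1 alignments is a match that is verified with n comparisons, and the shift after each match is n − l(2) = 1, since l(2) = n − 1 for a pattern made of a single repeated char:

```latex
\[
\#\text{comparisons} \;=\; n\,(m - n + 1) \;=\; \Theta(mn)
\qquad \text{for } n \le m/2 .
\]
```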
The Galil Rule
• In a specific matching phase, we mark with k the position in T of the right end of P, and with s the position of the last matched char in this phase.
s k k’
T: bbacdcbaabcddcdaddaaabcbcb
P: abaab
abaab
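The Galil Rule (Cont…)
• All the chars of T in positions s < j ≤ k are known to match. If the shift moves the left end of P to the right of s, the algorithm doesn’t need to check them again: once the next right-to-left comparison reaches position k, an occurrence has been found.
• An extended Boyer Moore algorithm with the Galil rule runs in O(m + n) worst case (even without the bad-character rule).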
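The effect of the rule is easy to measure on the single-character worst case. The sketch below is a simplified variant, not the full rule: it uses only the good suffix rule, applies the Galil shortcut only after a full match (where the first l(2) chars of P are known to match at the next alignment), and builds its tables with a brute-force N. All function names are hypothetical.

```python
def preprocess(p):
    """Return 1-based tables (L, l) for the good suffix rule."""
    n = len(p)
    N = [0] * (n + 1)
    for j in range(1, n + 1):
        while N[j] < j and p[j - 1 - N[j]] == p[n - 1 - N[j]]:
            N[j] += 1
    L, l = [0] * (n + 2), [0] * (n + 2)
    k = 0
    for j in range(1, n):
        if N[j] > 0:
            L[n - N[j] + 1] = j
        if N[j] == j:
            k = j
        l[n - j + 1] = k
    return L, l


def search(text, p, galil=True):
    """Return (0-based match starts, number of character comparisons)."""
    n, m = len(p), len(text)
    L, l = preprocess(p)
    hits, comparisons = [], 0
    k = n        # 1-based text position under the right end of P
    known = 0    # length of the prefix of P already known to match here
    while 0 < n and k <= m:
        i = n
        while i > known and p[i - 1] == text[k - n + i - 1]:
            i -= 1
            comparisons += 1
        if i == known:                       # reached the known prefix: a match
            hits.append(k - n)
            shift = n - l[2]
            known = l[2] if galil else 0     # Galil: skip the overlap next time
        else:
            comparisons += 1                 # count the failed comparison
            if i == n:
                shift = 1
            elif L[i + 1] > 0:
                shift = n - L[i + 1]
            else:
                shift = n - l[i + 1]
            known = 0
        k += shift
    return hits, comparisons


text, pattern = "a" * 25, "a" * 6
print(search(text, pattern, galil=False)[1])  # 120
print(search(text, pattern, galil=True)[1])   # 25
```

Both runs report the same 20 occurrences; only the number of comparisons changes, from n(m − n + 1) to m.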
Don’t Sleep Yet…
O(n + m) proof - Outline
• Preprocess in O(n) – already proved

1. Properties of strings
2. Proof of search in O(m) if P is not in T, using only the good suffix rule.
3. Proof of search in O(m) even if P is in T, adding the Galil rule.
Properties of Strings
• If for two strings δ, γ: δγ = γδ, then there is a string β such that δ = β^i and γ = β^j, i, j > 0
  - Proof by induction
• Definition: A string s is semiperiodic with period β if s consists of a non-empty suffix of β (possibly the entire β) followed by one or more complete copies of β.

β’ β β β
Properties of Strings (Cont…)
• A string is prefix semiperiodic if it contains one or more complete copies of β followed by a non-empty prefix of β.
• A string is prefix semiperiodic iff it is semiperiodic with a period of the same length
Lemma 1
• Suppose P occurs in T starting at position p and also at position q, q > p. If q – p ≤ ⌊n/2⌋ then P is semiperiodic with period α = P[n-(q-p)+1…n]

p
α’ α α α
q
α’ α α α
Proof - when P is Not Found in T
• We have R rounds during the search.
• After each round the good suffix rule decides on a right shift of s_i chars.
• Σ s_i ≤ m
• We shall use Σ s_i as an upper bound.

Proof (Cont…)
• For each round we count the matched chars by:
  - f_i – the number of chars matched for the first time
  - g_i – the number of chars already matched in previous rounds.
• Σ f_i ≤ m
• We want to prove that g_i ≤ 3s_i (which gives Σ g_i ≤ 3m).
Proof (Cont…)
• Each round doesn’t find P ⇒ it matched a substring t_i and hit one bad char x_i in T (x_i t_i ⊆ T)

T: bbacdcbaabcbbabdbabcaabcbcb
P: bdbabc

• |t_i| + 1 ≤ 3s_i ⇒ g_i ≤ 3s_i (because g_i + f_i = |t_i| + 1)
• For the rest of the proof we assume that for the specific round i: |t_i| + 1 > 3s_i
Lemma 2 (|t_i| + 1 > 3s_i)
• In round i we look at the matched suffix of P, marked P*. P* = y_i t_i, y_i ≠ x_i.
• Both P* and t_i are semiperiodic with period α of length s_i and hence with minimal length period β, α = β^k.
• Proof: by Lemma 1.
Lemma 3 (|t_i| + 1 > 3s_i)
• Suppose P overlapped t_i during round i. We shall examine in what ways P could overlap t_i in previous rounds.
• In any round h < i, the right end of P could not have been aligned with the right end of any full copy of β in t_i.
  - proof:
  - Both round h and round i fail at char x_i
  - The two possible cases of shift after round h are both invalid
Lemma 4 (|t_i| + 1 > 3s_i)
• In round h < i, P can correctly match at most |β|-1 chars in t_i.
  - By Lemma 3, P is not aligned with the right end of a full copy of β in t_i in round h.
  - Thus if it matched |β| chars or more there is a suffix γ of β followed by a prefix δ of β such that δγ = γδ.
  - By the string properties there is a substring μ such that β = μ^k, k > 1.
  - This contradicts the minimal period size property of β.
Lemma 5 (|t_i| + 1 > 3s_i)
• If in round h < i the right end of P is aligned with a char in t_i, it can only be aligned with one of the following:
  - One of the left-most |β|-1 chars of t_i
  - One of the right-most |β| chars of t_i
  - proof:
  - If not, by Lemmas 3 and 4, at most |β|-1 chars are matched and only from the middle of a β copy, while there are at least |β|
  - A shift cannot pass the right end of that β copy
Proof (Cont…)
• If |t_i| + 1 > 3s_i then g_i ≤ 3s_i
  - Using Lemma 5, in previous rounds we could match only the bad char x_i, the last |β|-1 chars in t_i or start from the first |β| right chars in t_i.
  - In the last case, using Lemma 4, we can only match up to |β|-1 chars
  - In total we could previously match:
    g_i ≤ 1 + |β|-1 + (|β| + |β|-1) ≤ 3|β| ≤ 3s_i
Proof - Final
• Number of matches = ∑(f_i + g_i) = ∑f_i + ∑g_i ≤ m + ∑3s_i ≤ m + 3m = 4m
Proof - when P is Found in T
• Split the rounds into two groups:
  - “match” rounds – an occurrence of P in T was found.
  - “mismatch” rounds – P was not found in T.
• We have proved O(m) for “mismatch” rounds.

Proof (Cont…)
• After P was found in T, P will be shifted by a constant length s (s = n – l(2)).
• n + 1 ≤ 3s ⇒ ∑ matches in round i ≤ ∑3s ≤ 3m
• For the rest of the proof we assume that: n + 1 > 3s
Proof (n + 1 > 3s)
• By Lemma 1, P is semiperiodic with minimal length period β, |β| = s.
• If round i+1 is also a “match” round then, by the Galil rule, only the new |β| chars are compared.
• A contiguous series of “match” rounds, i…i+k, is called a “run”.
Proof (n + 1 > 3s)
• ∑ The length of a “run”, not including chars that were already matched in previous “runs” ≤ m
• How many chars in a “run” were already matched in previous “runs”?
Lemma (n + 1 > 3s)
• Suppose k-1 was a “match” round and k is a “mismatch” round that ends the “run”.
• If k’ > k is the first “match” round then it overlaps at most |β|-1 chars with the previous “run” (ended by round k-1).
  - The left end of P at round k’ cannot be aligned with the left end of a full copy of β at round k-1.
  - As a result, P cannot overlap |β| chars or more with round k-1.
Proof (n + 1 > 3s)
• By the Lemma and because the shift after every “match” round is of |β|, only the first round of a “run” can overlap, and only with the last previous “run”.
• ∑ The length of the chars that were already matched in previous “runs” ≤ m
Proof (n + 1 > 3s) - Final
• ∑ The length of a “run” =
  ∑ The length of a “run”, not including chars that were already matched in previous “runs”
  +
  ∑ The length of the chars that were already matched in previous “runs”
  ≤ m + m
