Data Structures - U5

The document discusses Binary Search Trees (BST) and AVL trees, detailing their structure, properties, and operations such as insertion, deletion, and traversal. It explains the time complexity of operations, the concept of balance in AVL trees, and the types of rotations used to maintain balance. Additionally, it introduces B-trees and their significance in maintaining balance in data structures.

Smartzworld.com Smartworld.asia

6 (bucket size for digits of 0: 002, 024, 045, 066, 075, 090)
1 (bucket size for digits of 1: 170)
1 (bucket size for digits of 8: 802)

Time Complexity:
Worst Case Performance: O(N log2 N)
Best Case Performance (nearly): O(N log2 N)
Average Case Performance: O(N log2 N)

UNIT-V

Binary Search Trees (BST)


1. Hierarchical data structure with a single reference to root node
2. Each node has at most two child nodes (a left and
a right child)
3. Nodes are organized by the Binary Search property:
• Every node is ordered by some key data field(s)
• For every node in the tree, its key is greater than its
left child’s key and less than its right child’s key

Some BST Terminology


1. The Root node is the top node in the hierarchy
2. A Child node has exactly one Parent node, a Parent node
has at most two child nodes, Sibling nodes share the same
Parent node (ex. node 22 is a child of node 15)
3. A Leaf node has no child nodes, an Interior node has at
least one child node (ex. 18 is a leaf node)
4. Every node in the BST is a Subtree of the BST rooted at
that node

jntuworldupdates.org Specworld.in

Implementing Binary Search Trees


Self-referential class is used to build Binary Search Trees
public class BSTNode {
Comparable data;
BSTNode left;
BSTNode right;
public BSTNode(Comparable d) {
data = d; left = right = null;
}
}
• left refers to the left child
• right refers to the right child
• data refers to an object that implements the Comparable
interface, so that data fields can be compared to order nodes
in the BST
• Single reference to the root of the BST
• All BST nodes can be accessed through root reference
by recursively accessing left or right child nodes

Create a Binary Search Tree


BSTNode root = null; // an empty BST
root = new BSTNode(new Integer(35)); // a BST with one node
root.left = new BSTNode(new Integer(22)); // add a left child


Find a Node in the BST


• Use the search key to direct a recursive binary
search for a matching node
1. Start at the root node as current node
2. If the search key’s value matches the current
node’s key then found a match
3. If search key’s value is greater than current node’s
1. If the current node has a right child, search right
2. Else, no matching node in the tree
4. If search key is less than the current node’s
1. If the current node has a left child, search left
2. Else, no matching node in the tree

Example: search for 45 in the tree


(key fields are shown in the nodes rather than in a separate object referenced by the data field):
1. start at the root, 45 is greater than 25, search in right subtree
2. 45 is less than 50, search in 50’s left subtree
3. 45 is greater than 35, search in 35’s right subtree
4. 45 is greater than 44, but 44 has no right subtree so 45 is not
in the BST
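The search steps above can be sketched in Java (a minimal sketch: the Node class, the insert helper used to build the tree, and the example() method are illustrative names, with the example tree's keys inserted in pre-order):

```java
class BSTFindDemo {
    static final class Node {
        int key; Node left, right;
        Node(int k) { key = k; }
    }

    // Helper used only to build the example tree (new keys become leaves).
    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else if (key > root.key) root.right = insert(root.right, key);
        return root;
    }

    // Steps 1-4 above: follow left/right links until a match or a
    // missing child is found.
    static boolean find(Node cur, int key) {
        while (cur != null) {
            if (key == cur.key) return true;            // step 2: match
            cur = key > cur.key ? cur.right : cur.left; // steps 3 and 4
        }
        return false;                                   // no matching node
    }

    // The example tree, built by inserting its keys in pre-order.
    static Node example() {
        Node root = null;
        for (int k : new int[]{25, 15, 10, 4, 12, 22, 18, 24, 50, 35, 31, 44, 70, 66, 90})
            root = insert(root, k);
        return root;
    }
}
```

Searching for 45 follows 25, then 50, then 35, then 44, and stops at 44's missing right child, as in the trace above.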


Insert Node into the BST


1. Always insert the new node as a leaf node
2. Start at the root node as current node
3. If new node’s key < current’s key
1. If current node has a left child, search left
2. Else add new node as current’s left child
4. If new node’s key > current’s key
1. If current node has a right child, search right
2. Else add new node as current’s right child

Example: insert 60 in the tree:


1. start at the root, 60 is greater than 25, search in right subtree
2. 60 is greater than 50, search in 50’s right subtree
3. 60 is less than 70, search in 70’s left subtree
4. 60 is less than 66, add 60 as 66’s left child
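The insertion steps can be sketched the same way (again a sketch with illustrative names; insert() always attaches the new key as a leaf, exactly as in the steps above):

```java
class BSTInsertDemo {
    static final class Node {
        int key; Node left, right;
        Node(int k) { key = k; }
    }

    // Descend left or right until the proper empty slot is found,
    // then attach the new node as a leaf.
    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else if (key > root.key) root.right = insert(root.right, key);
        return root;
    }

    static Node find(Node cur, int key) {
        while (cur != null && cur.key != key)
            cur = key > cur.key ? cur.right : cur.left;
        return cur;
    }

    // The example tree, built by inserting its keys in pre-order.
    static Node example() {
        Node root = null;
        for (int k : new int[]{25, 15, 10, 4, 12, 22, 18, 24, 50, 35, 31, 44, 70, 66, 90})
            root = insert(root, k);
        return root;
    }
}
```

Inserting 60 into the example tree makes it the left child of 66, and it is a leaf, exactly as in the trace above.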

Traversals
• Visit every node in the tree and perform some
operation on it
(ex) print out the data fields of each node
• Three steps to a traversal
1. Visit the current node
2. Traverse its left subtree

3. Traverse its right subtree


• The order in which you perform these three steps
results in different traversal orders:
– Pre-order traversal: (1) (2) (3)
– In-order traversal: (2) (1) (3)
– Post-order traversal: (2) (3) (1)

Traversal Code
public void InOrder(BSTNode root) {
// stop the recursion:
if(root == null) {
return;
}
// recursively traverse left subtree:
InOrder(root.left);
// visit the current node:
Visit(root);
// recursively traverse right subtree:
InOrder(root.right);
}

// in main: a call to InOrder passing root


InOrder(root);
// The call stack after the first few
// calls to InOrder(root.left):

Traversal Examples

InOrder(root) visits nodes in the following order:


4, 10, 12, 15, 18, 22, 24, 25, 31, 35, 44, 50, 66, 70, 90
A Pre-order traversal visits nodes in the following order:
25, 15, 10, 4, 12, 22, 18, 24, 50, 35, 31, 44, 70, 66, 90
A Post-order traversal visits nodes in the following order:
4, 12, 10, 18, 24, 22, 15, 31, 44, 35, 66, 90, 70, 50, 25
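The three traversal orders can be checked against the sequences above (a sketch with illustrative names; each method returns the visited keys as a space-separated string, and example() rebuilds the example tree by inserting its keys in pre-order):

```java
class TraversalDemo {
    static final class Node {
        int key; Node left, right;
        Node(int k) { key = k; }
    }

    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else root.right = insert(root.right, key);
        return root;
    }

    // Visit orders (1)(2)(3), (2)(1)(3), (2)(3)(1) from the text.
    static String preOrder(Node n) {
        return n == null ? "" : n.key + " " + preOrder(n.left) + preOrder(n.right);
    }
    static String inOrder(Node n) {
        return n == null ? "" : inOrder(n.left) + n.key + " " + inOrder(n.right);
    }
    static String postOrder(Node n) {
        return n == null ? "" : postOrder(n.left) + postOrder(n.right) + n.key + " ";
    }

    // The example tree, built by inserting its keys in pre-order.
    static Node example() {
        Node root = null;
        for (int k : new int[]{25, 15, 10, 4, 12, 22, 18, 24, 50, 35, 31, 44, 70, 66, 90})
            root = insert(root, k);
        return root;
    }
}
```

Note that the in-order traversal of any BST visits the keys in sorted order.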


Structural properties
1. Binary tree property
2. Balance property:
the balance of every node is
between -1 and 1
Result:
the worst-case depth is
O(log n)
Ordering property
– Same as for BST

An AVL tree


The shallowness bound


Let S(h) = the minimum number of nodes in an AVL tree of height h
– If we can prove that S(h) grows exponentially in h, then a tree
with n nodes has a logarithmic height
• Step 1: Define S(h) inductively using AVL property
– S(-1)=0, S(0)=1, S(1)=2
– For h ≥ 2, S(h) = 1+S(h-1)+S(h-2)
• Step 2: Show this recurrence grows really fast
– Similar to Fibonacci numbers
– Can prove for all h, S(h) > φ^h − 1 where
φ is the golden ratio, (1+√5)/2, about 1.62
– Growing faster than 1.6^h is “plenty” exponential
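The recurrence can be checked numerically (a sketch; S(h) is computed iteratively from the definition above, and the class name is illustrative):

```java
// Minimum number of nodes in an AVL tree of height h:
// S(-1) = 0, S(0) = 1, S(1) = 2, S(h) = 1 + S(h-1) + S(h-2).
class AvlMinNodes {
    static long S(int h) {
        if (h < 0) return 0;
        if (h == 0) return 1;
        if (h == 1) return 2;
        long a = 1, b = 2; // S(h-2), S(h-1)
        for (int i = 2; i <= h; i++) {
            long c = 1 + b + a;
            a = b;
            b = c;
        }
        return b;
    }
}
```

For example, S(2) = 4 and S(3) = 7, and S(h) indeed stays above φ^h − 1.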

AVL tree operations


• AVL find:
– Same as BST find
• AVL insert:
– First BST insert, then check balance and potentially “fix”
the AVL tree
– Four different imbalance cases
• AVL delete:
– The “easy way” is lazy deletion
– Otherwise, like insert we do the deletion and then have
several imbalance cases

Insert: detect potential imbalance


1. Insert the new node as in a BST (a new leaf)
2. For each node on the path from the root to the new leaf, the
insertion may (or may not) have changed the node’s height
3. So after recursive insertion in a subtree, detect height imbalance
and perform a rotation to restore balance at that node
All the action is in defining the correct rotations to restore balance
Fact that an implementation can ignore:
– There must be a deepest element that is imbalanced after the
insert (all descendants still balanced)
– After rebalancing this deepest node, every node is balanced
– So at most one node needs to be rebalanced


Fix: Apply “Single Rotation”


• Single rotation: The basic operation we’ll use to rebalance
– Move child of unbalanced node into parent position
– Parent becomes the “other” child (always okay in a BST!)
– Other subtrees move in only way BST allows (next slide)

AVL Trees

Named after their inventors Adelson-Velskii and Landis, AVL trees are height-balanced
binary search trees. An AVL tree checks the heights of the left and right sub-trees
and ensures that the difference is not more than 1. This difference is
called the Balance Factor.

The length of the longest path to leaves in the left subtree and the length of

the longest path to leaves in the right subtree differ by at most one.

BalanceFactor = height(left-subtree) − height(right-subtree)

If the difference in the height of the left and right sub-trees is more than 1, the tree is balanced using some rotation
techniques.
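The balance factor definition can be sketched directly (illustrative names; the height of an empty subtree is taken as -1 so that a single node has height 0):

```java
class BalanceFactorDemo {
    static final class Node {
        int key; Node left, right;
        Node(int k) { key = k; }
    }

    // Height convention: empty subtree = -1, a single node = 0.
    static int height(Node n) {
        return n == null ? -1 : 1 + Math.max(height(n.left), height(n.right));
    }

    // BalanceFactor = height(left-subtree) - height(right-subtree).
    static int balanceFactor(Node n) {
        return height(n.left) - height(n.right);
    }
}
```

A chain of right children 1, 2, 3 has balance factor -2 at the root, so the tree must be rebalanced.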
AVL Rotations
To make itself balanced, an AVL tree may perform four kinds of rotations −

Left rotation
Right rotation
Left-Right rotation
Right-Left rotation

The first two rotations are single rotations and the next two are double rotations. To have an unbalanced
tree, we need a tree of height at least 2. With this simple tree, let's understand them one by one.

Left Rotation
If a tree becomes unbalanced when a node is inserted into the right subtree of the right subtree, then we perform a
single left rotation −

In our example, node A has become unbalanced as a node is inserted in the right subtree of A's right subtree. We
perform a left rotation by making A the left subtree of B.
Right Rotation
An AVL tree may become unbalanced if a node is inserted in the left subtree of the left subtree. The tree then needs a
right rotation.

As depicted, the unbalanced node becomes right child of its left child by performing a right rotation.
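The two single rotations can be sketched as pointer surgery (a sketch with illustrative names; the double rotations below are simply one of these followed by the other):

```java
class RotationDemo {
    static final class Node {
        int key; Node left, right;
        Node(int k) { key = k; }
    }

    // Left rotation: x's right child y becomes the subtree root,
    // and y's old left subtree moves under x.
    static Node rotateLeft(Node x) {
        Node y = x.right;
        x.right = y.left;
        y.left = x;
        return y; // new subtree root
    }

    // Right rotation is the mirror image: the unbalanced node
    // becomes the right child of its left child.
    static Node rotateRight(Node y) {
        Node x = y.left;
        y.left = x.right;
        x.right = y;
        return x;
    }
}
```

A left-right rotation is rotateLeft on the unbalanced node's left child followed by rotateRight on the node itself; a right-left rotation is the mirror.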

Left-Right Rotation
Double rotations are a slightly more complex version of the rotations already explained. To understand them
better, we should note each action performed during the rotation. Let's first check how to perform a Left-Right
rotation. A left-right rotation is a combination of a left rotation followed by a right rotation.

State Action

A node has been inserted into the right subtree of the left subtree. This makes C
an unbalanced node. These scenarios cause the AVL tree to perform a left-right rotation.

We first perform a left rotation on the left subtree of C. This makes A the left subtree of B.

Node C is still unbalanced, but now it is because of the left-subtree of the left-subtree.

We now right-rotate the tree, making B the new root node of this subtree.
C becomes the right subtree of its own left subtree.

The tree is now balanced.

Right-Left Rotation
The second type of double rotation is the Right-Left Rotation. It is a combination of a right rotation followed by a left
rotation.

State Action

A node has been inserted into the left subtree of the right subtree. This makes
A an unbalanced node, with balance factor 2.

First, we perform a right rotation along the C node, making C the right subtree of its own left subtree B.
Now, B becomes the right subtree of A.

Node A is still unbalanced because of the right subtree of its right subtree, and requires a left rotation.

A left rotation is performed by making B the new root node of the subtree.
A becomes the left subtree of its right subtree B.

The tree is now balanced.

Analysis of AVL trees

Let the length of the longest path from the root to a leaf of an AVL tree T with n nodes be hT(n), which is
called the height of T. Let h(n) be the maximum of hT(n) over all AVL trees T with n nodes. We
call this tree the tallest AVL tree, or simply the tallest tree. Small examples are given below.


h=0, n=1    h=1, n=2    h=2, n=4    h=3, n=7

We have the following recurrence for the general situation: the tallest tree of n nodes has a root whose
two subtrees are tallest trees of a and b nodes (a in the higher subtree, b in the lower), so

n = a + b + 1
h(a) = h(b) + 1
h(n) = h(a) + 1

with boundary values h(1) = 0, h(2) = 1, or, in terms of the function H defined below,

H(0) = 1,  H(1) = 2,  H(h) = H(h-1) + H(h-2) + 1


Let H(h) be the inverse function of h, that is, H(h(n)) = n. H(h) is the smallest number of nodes
of an AVL tree of height h. The solution of the recurrence for H(h) is given by

H(h) = (φ^(h+3) − ψ^(h+3)) / √5 − 1

where φ = (1 + √5)/2 and ψ = (1 − √5)/2.

From this we have

h ≈ log_φ n ≈ 1.44 log2 n. That is, the height of an AVL tree is worse than that of a completely
balanced tree by at most about 44%.
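The recurrence and the closed form can be checked numerically (a sketch; maxHeight(n) is the height of the tallest AVL tree on n nodes, i.e. the largest h with H(h) <= n, and the class name is illustrative):

```java
class AvlHeightBound {
    // H(0) = 1, H(1) = 2, H(h) = H(h-1) + H(h-2) + 1:
    // the smallest number of nodes of an AVL tree of height h.
    static long H(int h) {
        if (h == 0) return 1;
        if (h == 1) return 2;
        long a = 1, b = 2; // H(h-2), H(h-1)
        for (int i = 2; i <= h; i++) {
            long c = b + a + 1;
            a = b;
            b = c;
        }
        return b;
    }

    // Height of the tallest AVL tree on n nodes.
    static int maxHeight(long n) {
        int h = 0;
        while (H(h + 1) <= n) h++;
        return h;
    }
}
```

For n = 1,000,000 this gives height 27, comfortably below 1.44 log2 n ≈ 28.7.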

Rotation of AVL trees

As we insert an item into an AVL tree using the insertion procedure for a binary search tree, we
may violate the balance condition. Let the markers L, E, R denote the situation at each node
as follows:

L : the left subtree is higher

E : the two subtrees have equal heights

R : the right subtree is higher.

After we insert a new item at a leaf, we trace the path back to the root and rotate the tree where
necessary. We have the following two typical situations. Let the marker at x be old and all the
others new.

(Diagram: single-rotation case. The subtree rooted at x (marked L) with left child y receives an insertion
in subtree A, violating the balance at x; a single rotation makes y the new subtree root with x as its
right child, reattaching subtrees A, B, C.)

(Diagram: double-rotation case. x (marked L) has left child y (marked R) whose right child is z; the
insertion under z violates the balance at x. A double rotation makes z the new subtree root (marked E)
with children y (E) and x (R), reattaching subtrees A, B, C, D.)

Exercise. Invent a rotation mechanism for deletion.

Deletion in an AVL Tree


Case 1. Deletion occurred in C. After rotation, we can stop.

(Diagram: x (marked L) with left child y (marked E); a right rotation makes y the subtree root (now
marked R) with x (now marked L) as its right child, reattaching subtrees A, B, C. The height of the
whole subtree is unchanged.)

Case 2. Deletion occurred in C. We lost the height by one, and need to keep adjusting
towards the root.

(Diagram: x (marked L) with left child y (marked L); a right rotation makes y the subtree root (now
marked E) with x (now marked E) as its right child, reattaching subtrees A, B, C. The height of the
whole subtree has decreased by one.)


Case 3. Deletion occurred in D. We lost the height by one, and need to keep adjusting
towards the root.

(Diagram: x (marked L) with left child y (marked E) whose right child is z; a double rotation makes z
the subtree root (marked E) with children y (marked L or E) and x (marked E or R), reattaching subtrees
A, B, C, D. The height of the whole subtree has decreased by one.)

B-trees
The B-tree was invented by Bayer and McCreight in 1972. It maintains balance, i.e., the paths
from the root to the leaves all have equal length. There are several variations of B-trees. Here we describe
the 2-3 tree, in which we maintain items at the leaves and the internal nodes play the role of guiding
the search.

An internal node of a 2-3 tree has 2 or 3 children and maintains 2 or 3 keys.

Example

root:      (2, 30)
level 2:   (2, 9, 19)   (30, 56, 70)
level 3:   (2, 5) (9, 12) (19, 24)   (30, 32, 41) (56, 64) (70, 85)
leaves:     2  5   9  12   19  24     30  32  41   56  64   70  85

Internal nodes are round and external nodes (leaves) are square. A node that has i children
is called an i-node. An i-node has i keys: key1, key2 and key3 (if any). keyi is the smallest key of
the leaves of the i-th subtree of the i-node.

The operation find(x) is described as follows:

1. Go to the root and make it the current node.

2. Let key1, key2 and key3 (if any) be the keys of the current node.

3. If x < key2, go to the first child.

4. If key2 ≤ x < key3 (if any), go to the second child.

5. If key3 ≤ x, go to the third child (if any).

6. Repeat from 2 until the current node is a leaf.

7. If x = the key of the leaf, report “found”.

8. If x < the key, x is to be inserted to the left.

9. If x > the key, x is to be inserted to the right.
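The child-selection rule of steps 3-5 can be sketched as a small helper (illustrative names; keys[0], keys[1] and keys[2] hold key1, key2 and key3, and the children are numbered 1 to 3):

```java
class TwoThreeFind {
    // Given the 2 or 3 keys of an internal node, pick the child to descend
    // into: x < key2 -> first; key2 <= x < key3 -> second; key3 <= x -> third.
    static int childFor(int[] keys, int x) {
        if (x < keys[1]) return 1;
        if (keys.length == 2 || x < keys[2]) return 2;
        return 3;
    }
}
```

In the example tree, searching for 12 at node (2, 9, 19) descends into the second child, since 9 ≤ 12 < 19.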


The height of the tree is O(log n) and find(x) takes O(log n) time. Insertion and deletion cause
reorganisation of the tree.

If we insert an item at the bottom and the parent had two children originally, there will
be no reorganisation, although the keys along the path taken by find(x) will be updated.

Example. If we insert 28, it is inserted to the right of 24 and the parent will have key3 = 28.

If the parent gets four children after insertion, it is split into two and reorganisation starts. If the
parent of the parent gets four children as a result of the splitting, this reorganisation propagates towards
the root. The keys over the path will be updated. Key1 is redundant for find(x), but convenient
for updating the keys held by internal nodes. It is obvious that the whole reorganisation takes
O(log n) time.

Example. If we insert 45 into the previous tree, we have the following.

root:      (2, 30, 56)
level 2:   (2, 9, 19)   (30, 41)   (56, 70)
level 3:   (2, 5) (9, 12) (19, 24)   (30, 32) (41, 45)   (56, 64) (70, 85)
leaves:     2  5   9  12   19  24     30  32   41  45     56  64   70  85

Exercise. Invent a reorganisation mechanism for deletion.


Splay Trees
Introduction
• Splay trees support all the operations of binary trees.
• But they do not guarantee O(log N) worst-case performance.
• Instead, its bounds are amortized, meaning that although individual operations can be
expensive, any sequence of operations is guaranteed to behave as if each operation in the
sequence exhibited logarithmic behavior.
Self-adjusting and Amortized Analysis
• Limitations of balanced search trees:
o Balanced search trees require storing an extra piece of information per node.
o They are complicated to implement, and as a result insertions and deletions are
expensive and potentially error-prone.
o We do not win when easy inputs occur.
• Balanced search trees have a second feature that seems improvable:
o Their worst-case, average-case, and best-case performance are essentially
identical.
o It would be nice if the second access to the same piece of data was cheaper than
the first.
o The 90-10 rule – empirical studies suggest that in practice 90% of the accesses are
to 10% of the data items.
o It would be nice to get easy wins for the 90% case.
Amortized Time Bounds
• Given the above information, what can we do?
• Since the time to access in a binary tree is proportional to the depth of the accessed node,
we can attempt to restructure the tree by moving frequently accessed items toward the
root.
• Even if there are intervening operations, we would expect the node to remain close to the
root and thus be quickly found.
• We could call this strategy the rotate-to-root strategy.
• An application of the rotate-to-root strategy to node 3 is shown in Figure 1.


Figure 1 Rotate-to-Root

• When nodes are rotated to the root, other nodes go to lower levels in the tree.
• If we do not have the 90-10 rule, it is possible for a long sequence of bad accesses to
occur.
The Basic Bottom-Up Splay Tree
• A technique called splaying can be used so a logarithmic amortized bound can be
achieved.
• We use rotations such as we’ve seen before.
• The zig case.
o Let X be a non-root node on the access path on which we are rotating.
o If the parent of X is the root of the tree, we merely rotate X and the root as shown

Figure 2 Zig Case

o This is the same as a normal single rotation.


• The zig-zag case.
o In this case, X has both a parent P and a grandparent G. X is a right child and P is
a left child (or vice versa).
o This is the same as a double rotation.
o This is shown in Figure 3.
• The zig-zig case.
o Here X and its parent P are both left children (or both right children) of their parents.
o We first rotate P and the grandparent G, and then rotate X and P.

Figure 4 Zig-zig Case

• Given the rotations, consider the example in Figure 5, where we are splaying c.


Figure 5 Example of Splay Tree

Basic Splay Tree Operations


• Insertion – when an item is inserted, a splay is performed.
o As a result, the newly inserted item becomes the root of the tree.
• Find – the last node accessed during the search is splayed
o If the search is successful, then the node that is found is splayed and becomes the
new root.
o If the search is unsuccessful, the last node accessed prior to reaching the NULL
pointer is splayed and becomes the new root.
• FindMin and FindMax – perform a splay after the access.
• DeleteMin and DeleteMax – these are important priority queue operations.
o DeleteMin
- Perform a FindMin.
- This brings the minimum to the root, and by the binary search tree
property, there is no left child.
- Use the right child as the new root and delete the node containing the
minimum.
o DeleteMax
- Perform a FindMax.
- Set the root to the post-splay root’s left child and delete the post-splay
root.
• Remove
o Access the node to be deleted bringing it to the root.
o Delete the root leaving two subtrees L (left) and R (right).
o Find the largest element in L using a DeleteMax operation, thus the root of L will
have no right child.
o Make R the right child of L’s root.
o As an example, see Figure 6.


Red Black tree

Here's an example of insertion into a red-black tree (taken from Cormen, p269).


Here's the original tree.

Note that in the following diagrams, the black sentinel nodes have been omitted to keep the diagrams simple.

The tree insert routine has just been called to insert node "4" into the tree.

This is no longer a red-black tree - there are two successive red nodes on the path
11 - 2 - 7 - 5 - 4

Mark the new node, x, and its uncle, y.

y is red, so we have case 1 ...

Change the colours of nodes 5, 7 and 8.


Move x up to its grandparent, 7.

x's parent (2) is still red, so this isn't a red-black tree yet.

Mark the uncle, y. In this case, the uncle is black, so we have case 2 ...

Move x up and rotate left.

Still not a red-black tree: the uncle is black, but x's parent is to the left ...

Change the colours of 7 and 11 and rotate right ...

This is now a red-black tree, so we're finished!

O(log n) time!

Pattern Matching Algorithms

Introduction

Pattern matching is to find a pattern, which is relatively small, in a text, which is supposed to be
very large. Patterns and texts can be one-dimensional or two-dimensional. In the one-dimensional
case, examples are text editors and DNA analysis. In a text editor, we have 26
characters and some special symbols, whereas in the DNA case we have four characters: A, C,
G and T. In a text editor, a pattern is often a word, whose length is around 10, and the length
of the text is a few hundred up to one million. In DNA analysis, a gene is a few hundred characters long
and the human genome is about 3 billion characters long.

In the two-dimensional case, the typical application is pattern matching in computer
vision. A pattern is about 100×100 and a text typically 1024×1024. Either one-dimensional or
two-dimensional, the text is very large, and therefore a fast algorithm to find the occurrence(s)
of the pattern in it is needed. We start from a naive algorithm for the one-dimensional case.


(Diagram: the pattern pat[1..m] is aligned with the left end of text[1..n]; characters are compared left
to right until a mismatch is found.)

At first the pattern is set at the left end of the text, and the matching process starts. After a
mismatch is found, the pattern is shifted one place right and a new matching process starts, and so
on. The pattern and text are in arrays pat[1..m] and text[1..n] respectively.

Algorithm 1. Naive pattern matching algorithm

1. j:=1;

2. while j <= n-m+1 do begin

3. i:=1;

4. while (i<=m) and (pat[i]=text[j]) do begin

5. i:=i+1;

6. j:=j+1

7. end;

8. if i<=m then j:=j-i+2 /* shift the pattern one place right */

9. else write(“found at “, j-i+1)

10. end.

The worst case happens when pat and text are all a’s except for a b at the end, such as pat = aaaaab and
text = aaaaaaaaaaaaaaaaaaaaaaaaaaaaab. The time is obviously O(mn). On average the
situation is not as bad, but in the next section we introduce a much better algorithm. We call
the operation pat[i]=text[j] a comparison between characters, and measure the complexity of a
given algorithm by the number of character comparisons. The rest of the computing time is
proportional to this measure.
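Algorithm 1 translates directly into Java (a sketch with 0-based strings; the returned position is 1-based to match the pseudocode's write statement, and the class name is illustrative):

```java
class NaiveMatch {
    // Returns the 1-based position of the first occurrence of pat in text,
    // or -1 if there is none. Mirrors Algorithm 1: after each mismatch the
    // pattern is shifted one place right.
    static int find(String text, String pat) {
        int n = text.length(), m = pat.length();
        for (int j = 0; j + m <= n; j++) {
            int i = 0;
            while (i < m && pat.charAt(i) == text.charAt(j + i)) i++;
            if (i == m) return j + 1; // full match: report 1-based position
        }
        return -1;
    }
}
```

On the data of Example 1 below (text ababaabbababba, pattern ababb) this reports position 9, at the cost of more character comparisons than the KMP algorithm needs.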

Knuth-Morris-Pratt algorithm (KMP algorithm)

When we shift the pattern to the right after a mismatch is found at i on the pattern and j on the
text, we did not utilise the matching history for the portion pat[1..i] and

text[j-i+1..j]. If we can get information on how much to shift in advance, we can shift the
pattern more, as shown in the following example.

Example 1.

1 2 3 4 5 6 7 8 9 10 11 12 13 14

text     a b a b a a b b a b a b b a

pattern  a b a b b
             a b a b b
                   a b a b b
                         a b a b b

After mismatch at 5, there is no point in shifting the pattern one place. On the other hand we
know “ab” repeats itself in the pattern. We also need the condition pat[3] <> pat[5] to ensure
that we do not waste a match at position 5. Thus after shifting two places, we resume matching
at position 5. Now we have a mismatch at position 6. This time shifting two places does not
work, since “ab” repeats itself and we know that shifting two places will invite a mismatch. The
condition pat[1]<>pat[4] is satisfied. Thus we shift pat three places and resume matching at
position 6, and find a mismatch at position 8. For a similar reason to the previous case, we shift
pat three places, and finally we find the pattern at position 9 of the text. We spent 15
comparisons between characters. If we followed Algorithm 1, we would spend 23 comparisons.
Confirm this.


The information of how much to shift is given by array h[1..m], which is defined by

h[1] = 0

h[i] = max { s | (s=0) or (pat[1 .. s-1] = pat[i-s+1 .. i-1] and pat[s]<>pat[i]) }

The situation is illustrated in the following figure.

(Diagram: in the main matching process, a mismatch occurs at pat[i] = b against a text character a.
The shifted pattern re-aligns the prefix pat[1..s-1] (region A) with the matched portion of the text
(region B), where pat[s] = c ≠ b. The value h[i] is found by analysing the pattern against itself.)

The meaning of h[i] is to maximise the portion A and B in the above figure, and require b<>c.
The value of h[i] is such maximum s. Then in the main matching process, we can resume
matching after we shift the pattern after a mismatch at i on the pattern to position h[i] on the
pattern, and we can keep going with the pointer j on the text. In other words, we need not to
backtrack on the text. The maximisation of such s, (or minimisation of shift), is necessary in
order not to overlook an occurrence of the pattern in the text. The main matching algorithm
follows.

Algorithm 2. Matching algorithm


1. i:=1; j:=1;

2. while (i<=m) and (j<=n) do begin

3. while (i>0) and (pat[i]<>text[j]) do i:=h[i];

4. i:=i+1; j:=j+1

5. end

6. if i <= m then write(“not found”)

7. else write(“found at”, j-i+1)

Let the function f(i) be defined by

f(1) = 0

f(i) = max{ s | (1 <= s < i) and (pat[1 .. s-1] = pat[i-s+1 .. i-1]) }

The definitions of h[i] and f(i) are similar. In the latter we do not care about pat[s]<>pat[i]. The
computation of h is like pattern matching of pat on itself.

(Diagram: computing h is like matching pat against itself. With t = f(i-1), the prefix pat[1..t-1] is
aligned with pat[i-t..i-2]; when pat[t] and pat[i-1] differ, we slide the pattern over itself by setting
t := h[t].)

Algorithm 3. Computation of h

1. t:=0; h[1]:=0;

2. for i:=2 to m do begin

3. /* t = f(i-1) */

4. while (t>0) and (pat[i-1]<>pat[t]) do t:=h[t];

5. t:=t+1;

6. /* t=f(i) */

7. if pat[i]<>pat[t] then h[i]:=t else h[i]:=h[t]

8. end

The computation of h proceeds from left to right. Suppose t=f(i-1). If pat[t]=pat[i-1], we can
extend the matched portion by one at line 5. If pat[t]<>pat[i-1], by repeating t:=h[t], we
effectively slide the pattern over itself until we find pat[t]=pat[i-1], and we can extend the
matched portion. If pat[i]<>pat[t] at line 7, the position t satisfies the condition for h, and so h[i]
can be set to t. If pat[i]=pat[t], by going one step by h[t], the position will satisfy the condition
for h, and thus h[i] can be set to h[t].


Example. pat = a b a b b

i = 2, at line 7 t=1, pat[1]<>pat[2], f(2) = 1, h[2]:=1

a b a b b

a b a b b

i = 3, t=1 at line 4. pat[2]<>pat[1], t:=h[1]=0; at line 7 t=1, f(3)=1, pat[3]=pat[1], h[3]:=h[1]=0

a b a b b

a b a b b

i = 4, t=1 at line 4. pat[3]=pat[1], t:=t+1, t=2. At line 7, pat[4]=pat[2], h[4]:=h[2]=1

i = 5, t=2, f(4)=2. At line 4, pat[4]=pat[2], t:=t+1=3. f(5)=3. At line 7, pat[5]<>pat[3], h[5]:=t=3

Finally we have

i | 1 2 3 4 5

------------------------------

pat | a b a b b

f | 0 1 1 2 3

h | 0 1 0 1 3

The time of Algorithm 2 is clearly O(n), since each character in the text is examined at most
twice, which gives the upper bound on the number of comparisons. Also, whenever a mismatch
is found, the pattern is shifted to the right. The shifting cannot take place more than n-m+1
times. The analysis of Algorithm 3 is a little more tricky. Trace the changes of the value of t in
the algorithm. We have a doubly nested loop, one by the outer for and the other by the while. The


value of t can be increased by one at line 5, and m-1 times in total, which we regard as income.
If we get into the while loop, the value of t is decreased, which we regard as expenditure. Since
the total income is m-1, and t can not go to negative, the total number of executions of t:=h[t]
can not exceed m-1. Thus the total time is O(m).

Summarising these analyses, the total time for the KMP algorithm, including the pre-processing
of the pattern, is O(m+n), which is linear.
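The whole algorithm (Algorithm 3 for h, Algorithm 2 for matching) can be put together as follows (a sketch keeping the notes' 1-based indexing, with h[1..m] stored in a Java array of length m+1; the class name is illustrative):

```java
class KMP {
    // Algorithm 3: compute the shift table h[1..m] (1-based; h[0] unused).
    static int[] buildH(String pat) {
        int m = pat.length();
        int[] h = new int[m + 1];
        int t = 0;                // invariant before iteration i: t = f(i-1)
        h[1] = 0;
        for (int i = 2; i <= m; i++) {
            // slide the pattern over itself while pat[i-1] <> pat[t]
            while (t > 0 && pat.charAt(i - 2) != pat.charAt(t - 1)) t = h[t];
            t = t + 1;            // now t = f(i)
            h[i] = (pat.charAt(i - 1) != pat.charAt(t - 1)) ? t : h[t];
        }
        return h;
    }

    // Algorithm 2: scan the text without ever moving j backwards.
    // Returns the 1-based position of the first match, or -1.
    static int find(String text, String pat) {
        int n = text.length(), m = pat.length();
        int[] h = buildH(pat);
        int i = 1, j = 1;
        while (i <= m && j <= n) {
            while (i > 0 && pat.charAt(i - 1) != text.charAt(j - 1)) i = h[i];
            i++;
            j++;
        }
        return (i <= m) ? -1 : j - i + 1;
    }
}
```

For pat = ababb this computes h = 0, 1, 0, 1, 3, matching the table above, and finds the pattern at position 9 of the text of Example 1.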

