Data Structures - U5
6 (bucket size for digits of 0: 002, 024, 045, 066, 075, 090)
1 (bucket size for digits of 1: 170)
1 (bucket size for digits of 8: 802)
Time Complexity:
Worst Case Performance O(N log2 N)
Best Case Performance (nearly) O(N log2 N)
Average Case Performance O(N log2 N)
UNIT-V
jntuworldupdates.org Specworld.in
Smartzworld.com Smartworld.asia
Traversals
• Visit every node in the tree and perform some
operation on it
(ex) print out the data fields of each node
• Three steps to a traversal
1. Visit the current node
2. Traverse its left subtree
3. Traverse its right subtree
(the order of these steps determines preorder, inorder, and postorder)
Traversal Code
public void InOrder(bst_node root) {
// stop the recursion:
if(root == null) {
return;
}
// recursively traverse left subtree:
InOrder(root.leftChild());
// visit the current node:
Visit(root);
// recursively traverse right subtree:
InOrder(root.rightChild());
}
Traversal Examples
Structural properties
1. Binary tree property
2. Balance property: the balance of every node is between -1 and 1
Result: the worst-case depth is O(log n)
Ordering property
– Same as for BST
(Figure: an AVL tree)
AVL Trees
Named after their inventors Adelson-Velskii and Landis, AVL trees are height-balanced binary search trees. An AVL tree checks the heights of the left and right subtrees and ensures that the difference is not more than 1. This difference is called the Balance Factor: the length of the longest path to a leaf in the left subtree and the length of the longest path to a leaf in the right subtree differ by at most one. If the difference in the heights of the left and right subtrees is more than 1, the tree is rebalanced using rotation techniques.
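As a minimal sketch (assuming a simple Node class, which these notes do not define), the height and balance factor of a node can be computed as follows:

```java
// Sketch only: the Node class below is an assumption for illustration.
class Node {
    int key;
    Node left, right;
    Node(int key) { this.key = key; }
}

class AvlCheck {
    // Height convention: an empty subtree has height -1, a leaf has height 0.
    static int height(Node n) {
        if (n == null) return -1;
        return 1 + Math.max(height(n.left), height(n.right));
    }

    // Balance factor = height(left subtree) - height(right subtree).
    // In an AVL tree this must lie in {-1, 0, 1} at every node.
    static int balanceFactor(Node n) {
        return height(n.left) - height(n.right);
    }
}
```

For example, a node whose left subtree has height 1 and whose right subtree is empty has balance factor 2, so a rotation is needed there.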
AVL Rotations
To make itself balanced, an AVL tree may perform four kinds of rotations −
Left rotation
Right rotation
Left-Right rotation
Right-Left rotation
The first two rotations are single rotations and the next two are double rotations. To obtain an unbalanced tree we need a tree of height at least 2. With this simple tree, let's understand them one by one.
Left Rotation
If a tree becomes unbalanced when a node is inserted into the right subtree of the right subtree, we perform a single left rotation.
In our example, node A has become unbalanced because a node was inserted into the right subtree of A's right subtree. We perform the left rotation by making A the left subtree of B.
Right Rotation
An AVL tree may become unbalanced if a node is inserted into the left subtree of the left subtree. The tree then needs a right rotation. As depicted, the unbalanced node becomes the right child of its left child after a right rotation.
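Both single rotations amount to a few pointer rearrangements. A sketch, again under an assumed Node class:

```java
// Sketch only: Node is an assumed helper class, not part of the notes.
class Node {
    int key;
    Node left, right;
    Node(int key) { this.key = key; }
}

class Rotations {
    // Left rotation: the unbalanced node x becomes the left child of its
    // right child y; y's old left subtree becomes x's right subtree.
    // Returns the new root of this subtree.
    static Node rotateLeft(Node x) {
        Node y = x.right;
        x.right = y.left;
        y.left = x;
        return y;
    }

    // Right rotation: the mirror image. x becomes the right child of its
    // left child y; y's old right subtree becomes x's left subtree.
    static Node rotateRight(Node x) {
        Node y = x.left;
        x.left = y.right;
        y.right = x;
        return y;
    }
}
```

Rotating left the chain A, B, C (each node the right child of the previous one) makes B the new subtree root with A and C as its children.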
Left-Right Rotation
Double rotations are slightly more complex versions of the rotations already explained. To understand them better, note each action performed during the rotation. Let's first check how to perform a left-right rotation: a combination of a left rotation followed by a right rotation.
A node has been inserted into the right subtree of the left subtree. This makes C an unbalanced node, and this scenario causes the AVL tree to perform a left-right rotation. First the left subtree is left-rotated; we then right-rotate the tree, making B the new root node of this subtree. C becomes the right subtree of its own former left subtree.
Right-Left Rotation
The second type of double rotation is the right-left rotation: a combination of a right rotation followed by a left rotation. A node has been inserted into the left subtree of the right subtree. This makes A an unbalanced node, with balance factor 2.
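The two double rotations can then be written in terms of the single rotations (Node and the single-rotation helpers are the same assumed sketch as above, repeated so the example is self-contained):

```java
class Node {
    int key;
    Node left, right;
    Node(int key) { this.key = key; }
}

class DoubleRotations {
    static Node rotateLeft(Node x)  { Node y = x.right; x.right = y.left;  y.left = x;  return y; }
    static Node rotateRight(Node x) { Node y = x.left;  x.left = y.right;  y.right = x; return y; }

    // Left-right rotation: first left-rotate the left child,
    // then right-rotate the unbalanced node.
    static Node rotateLeftRight(Node c) {
        c.left = rotateLeft(c.left);
        return rotateRight(c);
    }

    // Right-left rotation: first right-rotate the right child,
    // then left-rotate the unbalanced node.
    static Node rotateRightLeft(Node a) {
        a.right = rotateRight(a.right);
        return rotateLeft(a);
    }
}
```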
Let the longest path from the root to the leaves of an AVL tree T with n nodes be hT(n), which is
called the height of T. Let h(n) be the maximum of hT(n) among AVL trees T with n nodes. We
call this tree the tallest AVL tree, or simply tallest tree. Small examples are given below.
(Figure: tallest AVL trees of small heights.) The tallest tree of height h has a root whose two subtrees are tallest trees of heights h-1 and h-2, with a and b nodes respectively. Small cases: h(1) = 0, h(2) = 1, and H(0) = 1, H(1) = 2. In general h(a) = h(b) + 1 and h(n) = h(a) + 1, where n = a + b + 1.
Let H(h) be the inverse function of h, that is, H(h(n)) = n. H(h) is the smallest number of nodes of AVL trees of height h. It satisfies the recurrence H(h) = H(h-1) + H(h-2) + 1, whose solution is expressed with
φ = (1 + √5)/2, φ' = (1 - √5)/2.
From this, h is at most about log_φ n ≈ 1.44 log2 n. That is, the height of AVL trees is worse than that of completely balanced trees by at most about 44%.
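Since the tallest tree of height h is a root plus tallest subtrees of heights h-1 and h-2, H obeys H(h) = H(h-1) + H(h-2) + 1 with H(0) = 1 and H(1) = 2. This can be checked numerically with a short sketch (the class name is an assumption):

```java
// Smallest number of nodes H(h) of an AVL tree of height h,
// computed from the recurrence H(h) = H(h-1) + H(h-2) + 1.
class AvlMinNodes {
    static long minNodes(int h) {
        if (h == 0) return 1;
        if (h == 1) return 2;
        long a = 1, b = 2; // H(h-2), H(h-1)
        for (int i = 2; i <= h; i++) {
            long c = a + b + 1; // H(i)
            a = b;
            b = c;
        }
        return b;
    }
}
```

The first values are H(2) = 4, H(3) = 7, H(4) = 12; they grow like the Fibonacci numbers, which is where the constant 1.44 = 1/log2(φ) comes from.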
When we insert an item into an AVL tree using the insertion procedure for a binary search tree, we may violate the balancing condition. Let the markers L, E, R denote the situation at each node: the left subtree is higher, the two subtrees are of equal height, or the right subtree is higher, respectively.
After we insert a new item at a leaf, we trace the path back to the root and rotate the tree if necessary. We have the following two typical situations. Assume the marker at x is old and all others are new.
(Figure: single-rotation case. The violation at x, marked L, with child y also marked L, is repaired by a rotation: y becomes the root of the subtree, both x and y get marker E, and the subtrees A, B, C are reattached.)
(Figure: double-rotation case. x is marked L, its child y is marked R, and y's child z is marked L after the insertion. A double rotation brings z to the top with marker E; y gets marker E and x gets marker R, and the subtrees A, B, C, D are reattached.)
Case 1. Deletion occurred in C. (Figure: x has marker L and its child y has marker E; a single rotation makes y the root with marker R and x gets marker L; the subtrees A, B, C are reattached. The height of the subtree does not change, so no further adjustment is needed.)
Case 2. Deletion occurred in C. We lost the height by one, and need to keep adjustment
towards root.
(Figure: x has marker L and its child y has marker L; a single rotation makes y the root and both markers become E; the subtrees A, B, C are reattached.)
Case 3. Deletion occurred in D. We lost the height by one, and need to keep adjustment
towards root.
(Figure: double-rotation case. z, whose old marker is L, E, or R, is brought to the top with marker E; y ends with marker L or E and x with marker E or R, depending on z's old marker; the subtrees A, B, C, D are reattached.)
B-trees
The B-tree was invented by Bayer and McCreight in 1972. It maintains balance, i.e., the paths from the root to the leaves are all of the same length. There are several variations of B-trees. Here we describe the 2-3 tree, in which items are kept at the leaves and internal nodes play the role of guiding the search.
Example.
root: [2, 30]
internal: [2, 9, 19] [30, 56, 70]
leaves: 2 5 9 12 19 24 30 32 41 56 64 70 85
Internal nodes are drawn round and external nodes (leaves) square. A node that has i children is called an i-node. An i-node has i keys, key1, key2, and key3 (if any); keyi is the smallest key among the leaves of the i-th subtree of the i-node.
To find an item x (find(x)):
1. Start from the root.
2. Let key1, key2, and key3 (if any) be the keys of the current node.
3. If the current node is a leaf, then x is found if x = key1; otherwise x is not in the tree.
4. Otherwise, go to the i-th child, where i is the largest index such that keyi <= x, and repeat from step 2.
The height of the tree is O(log n) and find(x) takes O(log n) time. Insertion and deletion cause
reorganisation of the tree.
If we insert an item at the bottom and the parent had two children originally, there will
be no reorganisation, although the keys along the path taken by find(x) will be updated.
Example. If we insert 28, it is inserted to the right of 24 and the parent will have key3 = 28.
If the parent gets four children after insertion, it is split into two and reorganisation starts. If the
parent of parent gets four as the results of the splitting, this reorganisation propagates towards
the root. The keys over the path will be updated. Key1 is redundant for find(x), but convenient
for updating keys held by internal nodes. It is obvious that the whole reorganisation takes
O(log n) time.
For example, after inserting 45, a node is split and the tree becomes:
root: [2, 30, 56]
internal: [2, 9, 19] [30, 41] [56, 70]
leaves: 2 5 9 12 19 24 30 32 41 45 56 64 70 85
Splay Trees
Introduction
• Splay trees support all the operations of binary trees.
• But they do not guarantee O(log N) worst-case performance.
• Instead, its bounds are amortized, meaning that although individual operations can be
expensive, any sequence of operations is guaranteed to behave as if each operation in the
sequence exhibited logarithmic behavior.
Self-adjusting and Amortized Analysis
• Limitations of balanced search trees:
o Balanced search trees require storing an extra piece of information per node.
o They are complicated to implement, and as a result insertions and deletions are
expensive and potentially error-prone.
o We do not win when easy inputs occur.
• Balanced search trees have a second feature that seems improvable:
o Their worst-case, average-case, and best-case performance are essentially
identical.
o It would be nice if the second access to the same piece of data was cheaper than
the first.
o The 90-10 rule – empirical studies suggest that in practice 90% of the accesses are
to 10% of the data items.
o It would be nice to get easy wins for the 90% case.
Amortized Time Bounds
• Given the above information, what can we do?
• Since the time to access in a binary tree is proportional to the depth of the accessed node,
we can attempt to restructure the tree by moving frequently accessed items toward the
root.
• Even if there are intervening operations, we would expect the node to remain close to the
root and thus be quickly found.
• We could call this strategy the rotate-to-root strategy.
• An application of the rotate-to-root strategy to node 3 is shown in Figure 1.
Figure 1 Rotate-to-Root
• When nodes are rotated to the root, other nodes go to lower levels in the tree.
• If we do not have the 90-10 rule, it is possible for a long sequence of bad accesses to
occur.
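The rotate-to-root strategy can be sketched as repeated single rotations along the access path. This is only an illustration of the strategy, not the splaying operation described next; the Node class and the assumption that the key is present are simplifications of this sketch.

```java
class Node {
    int key;
    Node left, right;
    Node(int key) { this.key = key; }
}

class RotateToRoot {
    // Brings the node with the given key to the root of the subtree
    // by rotating it up one level on the way back out of the recursion.
    // Assumes the key is present in the tree.
    static Node access(Node root, int key) {
        if (root == null || root.key == key) return root;
        if (key < root.key) {
            root.left = access(root.left, key);
            return rotateRight(root); // accessed node moves up from the left
        } else {
            root.right = access(root.right, key);
            return rotateLeft(root);  // accessed node moves up from the right
        }
    }

    static Node rotateRight(Node x) { Node y = x.left;  x.left = y.right;  y.right = x; return y; }
    static Node rotateLeft(Node x)  { Node y = x.right; x.right = y.left;  y.left = x;  return y; }
}
```

After `access`, the accessed node is the new root, so a second access to it costs O(1); the price is that other nodes may have been pushed to lower levels.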
The Basic Bottom-Up Splay Tree
• A technique called splaying can be used so a logarithmic amortized bound can be
achieved.
• We use rotations such as we’ve seen before.
• The zig case.
o Let X be a non-root node on the access path on which we are rotating.
o If the parent of X is the root of the tree, we merely rotate X and the root, as shown.
• Given the rotations, consider the example in Figure 5, where we are splaying c.
Here's an example of insertion into a red-black tree (taken from Cormen, p269).
(Figure: a fix-up step of the insertion: move x up to its grandparent, 7.)
O(log n) time!
Introduction
Pattern matching is to find a pattern, which is relatively small, in a text, which is supposed to be very large. Patterns and texts can be one-dimensional or two-dimensional. In the one-dimensional case, examples include text editors and DNA analysis. In a text editor we have 26 characters and some special symbols, whereas in the DNA case we have the four characters A, C, G, and T. In a text editor a pattern is often a word, whose length is around 10, and the length of the text is a few hundred up to one million. In DNA analysis, a gene is a few hundred characters long and the human genome is about 3 billion characters long.
(Figure: the pattern pat[1..m] aligned against the text text[1..n], starting at the left end; characters are compared left to right until a mismatch is found.)
At first the pattern is set at the left end of the text, and the matching process starts. After a mismatch is found, the pattern is shifted one place right and a new matching process starts, and so on. The pattern and text are stored in arrays pat[1..m] and text[1..n] respectively.
Algorithm 1. Naive pattern matching
1. j:=1;
2. while j<=n-m+1 do begin
3.   i:=1;
4.   while i<=m and pat[i]=text[j] do begin
5.     i:=i+1;
6.     j:=j+1
7.   end;
8.   if i>m then report a match at position j-m;
9.   j:=j-i+2
10. end.
The worst case happens when pat and text are all a’s but b at the end, such as pat = aaaaab and
text = aaaaaaaaaaaaaaaaaaaaaaaaaaaaab. The time is obviously O(mn). On average the
situation is not as bad, but in the next section we introduce a much better algorithm. We call
the operation pat[i]=text[j] a comparison between characters, and measure the complexity of a
given algorithm by the number of character comparisons. The rest of computing time is
proportional to this measure.
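The naive matcher can also be sketched as runnable Java; the class name and the 0-based indexing are choices of this sketch, not the notes' 1-based pseudocode:

```java
// Naive pattern matching: try every shift position and compare
// character by character, as in Algorithm 1.
class NaiveMatch {
    // Returns the 0-based position of the first occurrence of pat
    // in text, or -1 if there is none.
    static int find(String text, String pat) {
        int n = text.length(), m = pat.length();
        for (int j = 0; j + m <= n; j++) { // candidate shift position
            int i = 0;
            while (i < m && pat.charAt(i) == text.charAt(j + i))
                i++; // one character comparison per iteration
            if (i == m) return j; // the whole pattern matched
        }
        return -1;
    }
}
```

On the worst-case input mentioned above (pat = aaaaab against a long run of a's ending in b), this performs about m comparisons at each of the n-m+1 shift positions, i.e. O(mn) in total.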
When we shift the pattern to the right after a mismatch is found at i on the pattern and j on the
text, we did not utilise the matching history for the portion pat[1..i] and
text[j-i+1..j]. If we can get information on how much to shift in advance, we can shift the
pattern more, as shown in the following example.
Example 1.
1 2 3 4 5 6 7 8 9 10 11 12 13 14
text a b a b a a b b a b a b b a
pattern a b a b b
a b a b b
a b a b b
a b a b b
After mismatch at 5, there is no point in shifting the pattern one place. On the other hand we
know “ab” repeats itself in the pattern. We also need the condition pat[3] <> pat[5] to ensure
that we do not waste a match at position 5. Thus after shifting two places, we resume matching
at position 5. Now we have a mismatch at position 6. This time shifting two places does not
work, since “ab” repeats itself and we know that shifting two places will invite a mismatch. The
condition pat[1]<>pat[4] is satisfied. Thus we shift pat three places and resume matching at
position 6, and find a mismatch at position 8. For a similar reason to the previous case, we shift
pat three places, and finally we find the pattern at position 9 of the text. We spent 15
comparisons between characters. If we followed Algorithm 1, we would spend 23 comparisons.
Confirm this.
The information on how much to shift is given by the array h[1..m], which is defined with h[1] = 0.
(Figure: analysing the pattern. After a mismatch at position i on the pattern, where pat[i] = b disagrees with the text character, the pattern is shifted so that a prefix A of the pattern lines up with a matching portion B ending at position i-1; the resumption position is s = h[i], and we require c = pat[s] to differ from b.)
The meaning of h[i] is to maximise the portions A and B in the figure above, requiring b <> c. The value of h[i] is the maximum such s. Then, in the main matching process, after a mismatch at position i on the pattern we shift the pattern so that matching resumes at position h[i] on the pattern, and we can keep going with the pointer j on the text. In other words, we need not backtrack on the text. The maximisation of such s (that is, the minimisation of the shift) is necessary in order not to overlook an occurrence of the pattern in the text. The main matching algorithm follows.
Algorithm 2. KMP pattern matching
1. i:=1; j:=1;
2. while i<=m and j<=n do begin
3.   while i>0 and pat[i]<>text[j] do i:=h[i];
4.   i:=i+1; j:=j+1
5. end
6. if i>m then the pattern occurs at position j-m of the text.
The auxiliary function f is defined with f(1) = 0. The definitions of h[i] and f(i) are similar; in the latter we do not care about the condition pat[s] <> pat[i]. The computation of h is like pattern matching of pat on itself.
(Figure: computing h by sliding the pattern over itself; t = f(i-1) marks the portion matched so far, which is extended by one when pat[t] = pat[i-1].)
Algorithm 3. Computation of h
1. t:=0; h[1]:=0;
2. for i:=2 to m do begin
3.   /* t = f(i-1) */
4.   while t>0 and pat[i-1]<>pat[t] do t:=h[t];
5.   t:=t+1;
6.   /* t = f(i) */
7.   if pat[i]=pat[t] then h[i]:=h[t] else h[i]:=t
8. end
The computation of h proceeds from left to right. Suppose t=f(i-1). If pat[t]=pat[i-1], we can
extend the matched portion by one at line 5. If pat[t]<>pat[i-1], by repeating t:=h[t], we
effectively slide the pattern over itself until we find pat[t]=pat[i-1], and we can extend the
matched portion. If pat[i]<>pat[t] at line 7, the position t satisfies the condition for h, and so h[i]
can be set to t. If pat[i]=pat[t], by going one step by h[t], the position will satisfy the condition
for h, and thus h[i] can be set to h[t].
Example. pat = a b a b b
a b a b b
a b a b b
a b a b b
a b a b b
Finally we have
i | 1 2 3 4 5
------------------------------
pat | a b a b b
f | 0 1 1 2 3
h | 0 1 0 1 3
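Algorithms 2 and 3 can be sketched together in runnable Java. The sketch uses 0-based indices and the plain failure function f rather than the optimised table h; this still gives the linear bound, and the class and method names are assumptions of this sketch:

```java
// KMP matching with the failure function (a 0-based version of f:
// fail[i] is the length of the longest proper border of pat[0..i-1]).
class Kmp {
    static int[] failure(String pat) {
        int m = pat.length();
        int[] fail = new int[m + 1]; // fail[0] and fail[1] stay 0
        int t = 0;
        for (int i = 2; i <= m; i++) {
            while (t > 0 && pat.charAt(i - 1) != pat.charAt(t))
                t = fail[t];  // slide the pattern over itself
            if (pat.charAt(i - 1) == pat.charAt(t))
                t++;          // extend the matched portion
            fail[i] = t;
        }
        return fail;
    }

    // Returns the 0-based position of the first occurrence, or -1.
    // The text pointer j never moves backwards.
    static int find(String text, String pat) {
        if (pat.isEmpty()) return 0;
        int[] fail = failure(pat);
        int i = 0; // number of pattern characters currently matched
        for (int j = 0; j < text.length(); j++) {
            while (i > 0 && text.charAt(j) != pat.charAt(i))
                i = fail[i]; // shift the pattern, keep j fixed
            if (text.charAt(j) == pat.charAt(i))
                i++;
            if (i == pat.length())
                return j - pat.length() + 1;
        }
        return -1;
    }
}
```

On the running example, `Kmp.find("ababaabbababba", "ababb")` locates the pattern at 0-based position 8, matching the 1-based position 9 found by hand above.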
The time of Algorithm 2 is clearly O(n), since each character in the text is examined at most twice, which gives the upper bound on the number of comparisons. Also, whenever a mismatch is found, the pattern is shifted to the right; the shifting cannot take place more than n-m+1 times. The analysis of Algorithm 3 is a little trickier. Trace the changes in the value of t in the algorithm. We have a doubly nested loop, one by the outer for and the other by the while. The
value of t can be increased by one at line 5, m-1 times in total, which we regard as income. If we get into the while loop, the value of t is decreased, which we regard as expenditure. Since the total income is m-1 and t cannot become negative, the total number of executions of t:=h[t] cannot exceed m-1. Thus the total time is O(m).
Summarising these analyses, the total time for the KMP algorithm, which includes the pre-
processing of the pattern, is O(m+n), which is linear.