Homework 8
Suraj Atluri
Chapter 20
Exercise 20.6.3
For what values of d is the tree T of the previous exercise an
order-d B-tree?
Solution 20.6.3
In the previous exercise, T has been defined as a multi-way tree
with each internal node having exactly 5 to 8 children. We needed
to find for which value pairs (a, b) the tree T is an (a, b)-tree.
Here in (a, b) tree:
• Each internal node must have at least a and at most b children.
Since every internal node of T has five to eight children, T will be
an (a, b) tree if:
•a ≤ 5 (minimum number of children for an internal node)
•b ≥ 8b (Maximum children per internal node)
Thus, any a and b values that satisfy:
5≤a ≤b ≥8
will be an (a, b) tree T.
In this question, we need to find the values of d for which the tree
T of the above question can be an order-d B-tree.
For a B-tree of order d:
• All the internal nodes (apart from the root node) must contain
between d and 2d keys.
• This is equivalent to d+1 to 2d+1 children.
Since each of the internal nodes of T has 5 to 8 children:
• d's minimum value should be such that d+1 = 5 so that nodes
must have at least 5 children.
• The value of 2d+1 should be at least 8 to enable nodes to hold
up to 8 children.
So, we have
d+1 = 5 and 2d+1 ≥ 8Solving these equations:
d = 4 (since d+1 = 5)
2(4)+1 = 9 ≥ 8 (exceeds the most number of children
requirement)
So the unique value of d for which T is order-d B-tree is:
d=4
This will ensure
• Minimum children: 4+1 = 5 • Max children: 2(4)+1 = 9
(including the given max of 8) Therefore, the tree T of the given
exercise is an order-4 B-tree.
Exercise 20.6.21
Suppose you are processing a large number of operations in a
consumer-producer process, such as a buffer for a large media
stream. Describe an external-memory data structure to
implement a queue so that the total number of disk transfers
needed to process a sequence
of n enqueue and dequeue operations is O(n/B).
Solution 20.6.21
In order to use an external-memory queue to facilitate a large
number of enqueue and dequeue operations with O(n/B) total disk
transfers, we employ both disk storage and in-memory buffers.
Data structure used
In-Memory Buffers
• Two buffers are used:
Enqueue Buffer: Stores items temporarily until there are enough
to write a complete block to disk.
Dequeue Buffer: Serves to temporarily hold items read from disk
for dequeuing.
• Both the buffers are of size B, where B represents the size of a
block which can be written or read through a single disk I/O
operation.
Disk Storage (Disk Blocks):
• A queue or list of disk block pointers is maintained to store
items written from the enqueue buffer. Each block contains B
items.
Operations:
• Enqueue: Adds an item to the enqueue buffer. When the buffer
is full, it is written to disk.
• Dequeue: Pops an item from the dequeue buffer. If the buffer is
empty, a block is read from disk.
Pseudo Code
Procedure BufferedExternalQueue()
Initialize:
B ← Number of elements that fit in one disk block
InMemoryEnqueueBuffer ← new empty list of capacity B
InMemoryDequeueBuffer ← new empty list of capacity B
DiskStorage ← empty list to hold block addresses on disk
Operation Enqueue(element):
Add element to InMemoryEnqueueBuffer
If InMemoryEnqueueBuffer is full (size = B) then:
Store InMemoryEnqueueBuffer to disk as a block
Append pointer to that block into DiskStorage
Clear InMemoryEnqueueBuffer
Operation Dequeue():
If InMemoryDequeueBuffer is empty then:
If DiskStorage contains blocks then:
Load the first block from DiskStorage into InMemoryDequeueBuffer
Remove its reference from DiskStorage
Else If InMemoryEnqueueBuffer is not empty then:
Copy InMemoryEnqueueBuffer into InMemoryDequeueBuffer
Reverse InMemoryDequeueBuffer to maintain queue order
Clear InMemoryEnqueueBuffer
Else:
Signal an error: Queue has no elements
Retrieve the front item from InMemoryDequeueBuffer
Remove the retrieved item from the buffer
Return the item
Operation IsEmpty():
Return true if all of the following are empty:
InMemoryEnqueueBuffer, InMemoryDequeueBuffer, and DiskStorage
Here
1.Enqueue Operation:
O Items are pushed onto InMemoryEnqueueBuffer.
O When the buffer is full (i.e., contains B items), the buffer is
flushed out to disk as a new block.
O It minimizes writes by writing blocks in their entirety to disk.
2.Dequeue Operation:
O When InMemoryDequeueBuffer is empty, the function checks
whether there are any blocks in the disk.
O When disk blocks are present, it reads the first block into
InMemoryDequeueBuffer.
O In case there are no disk blocks but objects within the enqueue
buffer, it copies them to InMemoryDequeueBuffer and does the
reverse in order to obtain FIFO behavior.
O Otherwise, if no elements exist, it raises an error stating that
the queue is empty.
Time Complexity
This technique ensures that for n enqueue and dequeue
operations, the overall disk transfers are O(n/B).
Exercise 20.6.22
Imagine that you are trying to construct a minimum spanning tree
for a large network, such as is defined by a popular social
networking website. Based on using Kruskal's algorithm, the
bottleneck is the maintenance of a union-find data structure.
Describe how to use a B-tree to implement a union-find data
structure (from Section 7.2) so that union and find operations
each use at most O(log n/log B) disk transfers each.
Solution 20.6.22
To use a union-find data structure with a B-tree for constructing
minimum spanning tree in a large network, we can use the B-
tree to maintain sets and elements in a manner that minimizes
disk transfers. Since Kruskal's algorithm relies on efficiently
finding and merging sets, a B-tree can help in scalable and
efficient data management by maintaining data in hierarchical
form.
Data Structure Design
• The union-find data structure is implemented in a B-tree.
• Each component is represented by one of the B-tree's nodes,
which holds references to the parent.
• The union operation combines two sets, and the find
operation finds the root of the set of an element.
• All the nodes are stored in a B-tree of branching factor B,
supporting efficient disk access.Operations
Pseudo Code
Procedure BTreeUnionFind()
Initialization:
B ← B-tree branching factor, tuned based on block size
SetTree ← Empty B-tree configured with branching factor B
Operation CreateSet(element):
NewNode ← Allocate a node representing element
NewNode.parent ← element // Self-parented initially
NewNode.rank ← 0 // Initial rank
Insert NewNode into SetTree
Operation LocateRepresentative(element):
Target ← TreeLookup(SetTree, element)
If Target.parent ≠ element then:
Target.parent ← LocateRepresentative(Target.parent) // Apply path
compression
Update Target in SetTree
Return Target.parent
Operation MergeSets(elem1, elem2):
leader1 ← LocateRepresentative(elem1)
leader2 ← LocateRepresentative(elem2)
If leader1 = leader2 then
Return // They are already in the same group
Node1 ← TreeLookup(SetTree, leader1)
Node2 ← TreeLookup(SetTree, leader2)
// Perform union by rank
If Node1.rank > Node2.rank then
Node2.parent ← leader1
Update Node2 in SetTree
Else if Node1.rank < Node2.rank then
Node1.parent ← leader2
Update Node1 in SetTree
Else:
Node2.parent ← leader1
Node1.rank ← Node1.rank + 1
Update both Node1 and Node2 in SetTree
Operation TreeLookup(tree, key):
Return the node associated with key in SetTree
// B-tree guarantees access in O(log n / log B) disk reads
Explanation of Algorithm
1. MakeSet(x):
O Adds a new node for element x to the B-tree.
O Each node is initially its own parent (Node.parent ← x).
O Place this node in the B-tree.
2.Find(x):
O BTreeSearch is used to locate the node for element x within
the B-tree.
O If the parent of the node is not itself, recursively find its
parent and perform path compression.
O Insert the new parent into the B-tree so that it will stay
efficient.
3. Union(x, y):
O Computes the roots of x and y using the Find operation.
O Uses union by rank to decide which root will be the parent of
the other.
O Updates the B-tree to set up the new parent-child
relationships, updating rank information as necessary.
4. BTreeSearch(tree, key):
O Carries out a search operation in the B-tree to locate the
node for a given key.
O Since B-tree has branching factor B, the search utilizes O(log
n / log B) disk transfers.
Time Complexity
Both of these operations (find and union) require O(log n / log
B) disk transfers and are therefore extremely efficient for large
networks where minimizing disk I/O is critical.
Chapter 23
Exercise 23.7.11
What is the longest prefix of the string "cgtacgttcgtacg" that is
also a suffix of this string?
Solution 23.7.11
In order to get the longest prefix of "cgtacgttcgtacg" that is a
suffix, we must compare the start and end of the string:S
1. Length 1: "c" ≠ "g" (no match)
2.Length 2: "cg" = "cg" (match)
3. Length 3: "cgt" ≠ "acg" (no match)
4. Length 4: "cgta" ≠ "tacg" (no match)
5. Length 5: "cgtac" = "cgtac" (match)
6. Length 6: "cgtacg" = "cgtacg" (match)
7. Length 7: "cgtacgt" ≠ "tcgtacg" (no match)
8. Length 8: "cgtacgtt" ≠ "ttcgtacg" (no match)
9. Length 9: "cgtacgttc" ≠ "gttcgtacg" (no match)
10. Length 10: "cgtacgttcg" ≠ "cgttcgtacg" (no match)
11. Length 11: "cgtacgttcgt" ≠ "gcgttcgtacg" (no match)
12. Length 12: "cgtacgttcgta" ≠ "agcgttcgtacg" (no match)
13. Length 13: "cgtacgttcgtac" ≠ "tagcgttcgtacg" (no match)
14. Length 14: "cgtacgttcgtacg" = "cgtacgttcgtacg" (match, but
it's the whole string)
The longest proper prefix that is also a suffix is "cgtacg", with
length 6.
Exercise 23.7.15
Give an example of a text T of length n and a pattern P of
length m that force the brute-force pattern matching algorithm
to have a running time that is Ω(nxm).
Solution 23.7.15
In order to construct an example for which the brute-force
pattern matching algorithm takes time Ω(nm), we need to
construct an example for which the algorithm makes the
largest possible number of comparisons. This will occur when
the pattern almost matches at every position of the text but
not at the last character at all occurrences.
Below is an example which does that:
Text T (of length n):
T = "aaaaaaaaaaaaaaaaaaaaab"
Pattern P (length m):
P = "aaaaab"
In this instance:
• n = 22 (T length)
• m = 6 (length of P)
Let us see how the brute-force algorithm works with this input:
1. The algorithm begins by comparing P and T at position 0:
"aaaaab"
"aaaaaaaaaaaaaaaaaaaaab"
It traverses 5 characters and fails on the 6th.
2. Then it moves P one position and tries again:
" aaaaab"
"aaaaaaaaaaaaaaaaaaaaab"
Again, it succeeds for 5 characters before breaking.
3. This is repeated for each position in T until the final possible
position:
" aaaaab"
"aaaaaaaaaaaaaaaaaaaaab"
At each position (other than the last) the algorithm performs m-
1 correct comparisons and 1 incorrect comparison, in all m
comparisons. The algorithm attempts to match at (n - m + 1)
positions. So, the number of comparisons is:
(n - m + 1) * mIn the example, this is:
(22 - 6 + 1) * 6 = 17 * 6 = 102 comparisons
This is Ω(nm) because:
• With increase in n, the attempts at comparison rise linearly.
• For each try, the amount of comparisons is proportional to m.
This example causes the brute-force algorithm to make its
worst-case number of comparisons, resulting in an Ω(nm)
running time
Exercise 23.7.32
One way to mask a message, M, using a version
of steganography, is to insert random characters into M at
pseudo-random locations so as to expand M into a larger string, C.
For instance, the message,
ILOVEMOM,
could be expanded into
AMIJLONDPVGEMRPIOM.
It is an example of hiding the string, M, in plain sight, since the
characters in M and C are not encrypted. As long as someone
knows where the random characters where inserted, he or she
can recover M from C. The challenge for law enforcement,
therefore, is to prove when someone is using this technique, that
is, to determine whether a string C contains a message M in this
way. Thus, describe an O(n)-time method for detecting if a
string, M, is a subsequence of a string, C, of length n.
Solution 23.7.32
To verify if a string M is a subsequence of a string C of length n in
O(n) time, we can use a two-pointer technique. This will allow us
to scan through C only once, and thus this will be an O(n) efficient
algorithm. This is the way we can do it:
1. Initialize two pointers:
• i: marking the start of string M
•j: indicating the beginning of string C
2. Traverse through C using pointer j:
• If C[j] character == M[i], then i is incremented
• Increase j after each comparison.
3. If i is the last character in M, then we've identified all
characters of M in sequence in C
4. If j traverses the end of C before i traverses the end of M, then
M is not a subsequence of C
Algorithm Psuedo code
Procedure CheckSubsequence(SmallStr, LargeStr)
Initialize:
pos1 ← 0 // Tracks characters in SmallStr
pos2 ← 0 // Tracks characters in LargeStr
While pos2 < length of LargeStr:
If pos1 equals length of SmallStr:
Return True // Entire SmallStr matched
If SmallStr[pos1] equals LargeStr[pos2]:
pos1 ← pos1 + 1 // Advance in SmallStr if characters match
pos2 ← pos2 + 1 // Always move forward in LargeStr
// Final check after traversal
If pos1 equals length of SmallStr:
Return True
Else:
Return False
Time Complexity
Time complexity of this algorithm is O(n) because:
We pass through C only once, from left to right.
We do at most one comparison per cracter in C.
The amount of work is proportional to the size of C.