Pimpri Chinchwad Education Trust’s
Pimpri Chinchwad College of Engineering and Research,
Ravet, Pune
Department of Computer Engineering
Data Structures and Algorithms
Unit 5: Indexing and Multilevel trees
Class : SE A.Y. 2024-25
Faculty Name: Madhuri Kumbhar, Dipti
Chaudhari
Unit -5
● Indexing and Multiway Trees- Indexing
● Indexing techniques-primary, secondary,dense, sparse
● Multiway search trees
● B-Tree- insertion, deletion
● B+ Tree - insertion, deletion
● Use of B+ tree in Indexing, Trie Tree
Case Study: Heap as a Priority Queue
❑ Disk Structure
❑ How data stored on disk
❑ Indexing
❑ Multilevel Indexing
❑ m-way search trees
❑ B-trees
❑ Insertion and deletion – B Tree
❑ B+-trees
❑ Insertion and deletion – B+ Tree
3
CSE 373 -- AU 2004 -- B-Trees
It designed to work on magnetic disks or other
(direct access) secondary storage devices.
Data Layout on Disk
• Track: one ring
• Sector: one pie-shaped piece.
• Block: intersection of a track and a sector.
5
INDEXING
Basic Concepts
● Indexing mechanisms are used to speed up access to desired data.
○ E.g., author catalog in library
● Search Key - attribute to set of attributes used to look up records in a
file.
● An index file consists of records (called index entries) of the form
search-key pointer
● Index files are typically much smaller than the original file
● Two basic kinds of indices:
○ Ordered indices: search keys are stored in sorted order
○ Hash indices: search keys are distributed uniformly across “buckets”
using a “hash function”.
Index Evaluation Metrics
● Access time
● Insertion time
● Deletion time
● Space overhead
Ordered Indices
● In an ordered index, index entries are stored sorted on the
search key value.
○ E.g., author catalog in library.
● Primary index: in a sequentially ordered file, the index whose
search key specifies the sequential order of the file.
○ type of clustering index
○ The search key of a primary index is usually, but not
necessarily, the primary key.
● Secondary index: an index whose search key specifies an order
different from the sequential order of the file. Also called
non-clustering index.
● Index-sequential file: ordered sequential file with a primary index.
Dense Index Files
● Dense index — Index record appears for every search-key value
in the file.
● E.g. index on ID attribute of instructor relation
Dense Index Files (Cont.)
● Dense index on dept_name, with instructor file sorted on
dept_name
Sparse Index Files
● Sparse Index: contains index records for only some search-
key values.
○ Applicable when records are sequentially ordered on search-key
● To locate a record with search-key value K :
○ Find index record with largest search-key value < K
○ Search file sequentially starting at the record to which the index record
points
Sparse Index Files (Cont.)
● Compared to dense indices:
○ Less space and less maintenance overhead for insertions and deletions.
○ Generally slower than dense index for locating records.
● Good tradeoff: sparse index with an index entry for every block in file,
corresponding to least search-key value in the block.
No. of records per block = block
size/record size
=512/128=4
for 100 records , no. of blocks
required = 100/4 = 25 blocks
No. of entries per
block= 512/16=32
100 entries= 100/32
=3.2=4
So, 4+1 blocks are
required for accessing
1 entry
4 blocks+ a for database
Multilevel Index
● If primary (clustered) index does not fit in memory,
access becomes expensive.
● Solution: treat primary index kept on disk as a
sequential file and construct a sparse index on it.
○ outer index – a sparse index of primary index
○ inner index – the primary index file
● If even outer index is too large to fit in main memory,
yet another level of index can be created, and so on.
● Indices at all levels must be updated on insertion or
deletion from the file.
Multilevel Index (Cont.)
Secondary Index
M-way Trees
● An M-way(multi-way) tree is a tree that has the following
properties:
● Each node in the tree can have at most m children.
● Nodes in the tree have at most (m-1) key fields and
pointers(references) to the children.
Properties of B-Tree
● The various properties of B trees include −
● Every node in a B Tree will hold a maximum of
m children and (m-1) keys, since the order of the
tree is m.
● Every node in a B tree, except root and leaf, can
hold at least m/2 children
● The root node must have minimum two children.
● All t the leaf nodes must be at the same level.
● Creation is bottom up.
● B tree always maintains sorted data.
Basic operation on B-tree
● B-TREE-SEARCH :-Searching in B Tree
● B-TREE-INSERT :-Inserting key in B Tree
● B-TREE-CREATE :-Creating a B Tree
● B-TREE-DELETE :- Deleting a key from B
Tree
Insertion Operation
1. If the tree is empty, allocate a root node and insert the key.
2. Update the allowed number of keys in the node.
3. Search the appropriate node for insertion.
4. If the node is full, follow the steps below.
5. Insert the elements in increasing order.
6. Now, there are elements greater than its limit. So, split at the median.
7. Push the median key upwards and make the left keys as a left child and the right
keys as a right child.
8. If the node is not full, follow the steps below.
9. Insert the node in increasing order.
Algorithm for Inserting an Element
● Start: A node x and a key k to insert.
● Find the position i in the node x where k should be inserted:
○ Locate the first key in x greater than k, or move to the end if no such
key exists.
● If x is a leaf node, directly insert k at position i in sorted order.
● If x is not a leaf node, proceed to the child node at position i:
○ Check if the child node is full (has the maximum number of keys
allowed).
○ If the child is full:
○ Split the child node into two nodes.
○ Move the middle key of the child node up to the parent
node (x).
○ Adjust the position i if k is greater than the promoted
key.
● Recursively call the B-Tree-Insert procedure on the appropriate child node to
continue the insertion.
B-TREE-INSERT ALGORITHM
B-TREE-INSERT(T,k)
1. r 🡨 root[T]
2. If ( n[r] = 2t-1)
3. then s🡨 Allocate-Node()
4. root[T] 🡨 s
5. leaf[s] 🡨 FALSE
6. n[s]🡨0
7. c1[s]🡨r
8. B-TREE-SPLIT-CHILD(s,1,r)
9. B-TREE-INSERT-NONFULL(s,k)
10. else B-TREE-INSERT-NONFULL(r,k)
Constructing a B-tree
• Suppose we start with an empty B-tree and keys arrive in the
following order:1 12 8 2 25 6 14 28 17 7 52 16 48 68
3 26 29 53 55 45
• We want to construct a B-tree of order 5
• The first four items go into the root:
1 2 8 12
• To put the fifth item in the root would violate condition 5
• Therefore, when 25 arrives, pick the middle key to make a
new root
36
Constructing a B-tree (contd.)
1 2 12 25
6, 14, 28 get added to the leaf nodes:
8
1 2 6 12 14 25 28
37
Constructing a B-tree (contd.)
Adding 17 to the right leaf node would over-fill it, so we take the
middle key, promote it (to the root) and split the leaf
8 17
1 2 6 12 14 25 28
7, 52, 16, 48 get added to the leaf nodes
8 17
1 2 6 7 12 14 16 25 28 48 52
38
Constructing a B-tree (contd.)
Adding 68 causes us to split the right most leaf, promoting 48 to the
root, and adding 3 causes us to split the left most leaf, promoting 3
to the root; 26, 29, 53, 55 then go into the leaves
3 8 17 48
1 2 6 7 12 14 16 25 26 28 29 52 53 55 68
Adding 45 causes a split of 25 26 28 29
and promoting 28 to the root then causes the root to split
39
Constructing a B-tree (contd.)
17
3 8 28 48
1 2 6 7 12 14 16 25 26 29 45 52 53 55 68
40
Constructing a B-tree
● Suppose we start with an empty B-tree and keys arrive
in the following order:1 12 8 2 25 6 14 28 17 7 52
16 48 68 3 26 29 53 55 45
● We want to construct a B-tree of degree 5
● The first four items go into the root:
1 2 8 12
● To put the fifth item in the root would violate condition 5
● Therefore, when 25 arrives, pick the middle key to make
a new root
Exercise in Inserting a B-Tree
● Insert the following keys in B-tree when m=4:
● 3, 7, 9, 23, 45, 1, 5, 14, 25, 24, 13, 11, 8, 19, 4, 31, 35, 56
● Check your approach with a neighbour and discuss any differences.
42
Deletion from a B-tree
● Deleting an element on a B-tree consists of three main events: searching
the node where the key to be deleted exists, deleting the key and
balancing the tree if required.
● While deleting a tree, a condition called underflow may occur. Underflow
occurs when a node contains less than the minimum number of keys it
should hold.
● The terms to be understood before studying deletion operation are:
1. Inorder Predecessor
The largest key on the left child of a node is called its inorder predecessor.
2. Inorder Successor
The smallest key on the right child of a node is called its inorder successor.
Deletion operation – B tree
Case 1 − If the key to be deleted is in a
leaf node and the deletion does
not violate the minimum key
property, just delete the node.
Case 1 − If the key to be deleted is in a leaf node but the deletion violates the
minimum key property, borrow a key from either its left sibling or right
sibling. In case if both siblings have exact minimum number of keys, merge
the node in either of them.
If both the immediate sibling nodes already have a minimum number of keys, then merge the
node with either the left sibling node or the right sibling node.
This merging is done through the parent node.
Deleting 30 results in the above case.
Case 3 −If the key to be deleted lies in the internal node, the following cases occur.
The internal node, which is deleted, is replaced by an inorder predecessor if the left child has
more than the minimum number of keys.
The internal node, which is deleted, is replaced by an inorder successor if the right child has
more than the minimum number of keys.
If either child has exactly a minimum number of keys then, merge the left and the right children.
Case 3 − If the key to be deleted is in an internal node violating the minimum keys
property, and both its children and sibling have minimum number of keys, merge the
children. Then merge its sibling with its parent.
Operations on B-Tree
Algorithm for Searching an Element
Deletion Complexity
● Best case Time complexity: Θ(log n)
● Average case Space complexity: Θ(n)
● Worst case Space complexity: Θ(n)
Searching Complexity on B Tree
● Worst case Time complexity: Θ(log n)
● Average case Time complexity: Θ(log n)
● Best case Time complexity: Θ(log n)
● Average case Space complexity: Θ(n)
● Worst case Space complexity: Θ(n)
Insertion Operation
1. If the tree is empty, allocate a root node and insert the key.
2. Update the allowed number of keys in the node.
3. Search the appropriate node for insertion.
4. If the node is full, follow the steps below.
5. Insert the elements in increasing order.
6. Now, there are elements greater than its limit. So, split at the median.
7. Push the median key upwards and make the left keys as a left child and the right
keys as a right child.
8. If the node is not full, follow the steps below.
9. Insert the node in increasing order.
Algorithm for Inserting an Element
Applications of B-Trees
● It is used in large databases to access data stored on the disk
● Searching for data in a data set can be achieved in
significantly less time using the B-Tree
● With the indexing feature, multilevel indexing can be
achieved.
● Most of the servers also use the B-tree approach.
● B-Trees are used in CAD systems to organize and search
geometric data.
● B-Trees are also used in other areas such as natural language
processing, computer networks, and cryptography.
Advantages of B-Trees
● B-Trees have a guaranteed time complexity of O(log n) for basic
operations like insertion, deletion, and searching, which makes them
suitable for large data sets and real-time applications.
● B-Trees are self-balancing.
● High-concurrency and high-throughput.
● Efficient storage utilization.
Disadvantages of B-Trees
● B-Trees are based on disk-based data structures and can have a high
disk usage.
● Not the best for all cases.
● For small datasets, the search time in a B-Tree might be slower
compared to a binary search tree, as each node may contain multiple
keys.
Insertion Complexity
● Best case Time complexity: Θ(log n)
● Average case Space complexity: Θ(n)
● Worst case Space complexity: Θ(n)
B+ -Trees
● B+ Tree is simply a balanced binary search tree, in which all data is
stored in the leaf nodes, while the internal nodes store just the
indices.
● Each leaf is at the same height and all leaf nodes have links to
the other leaf nodes.
● The root node always has a minimum of two children.
62
Properties of B+ Trees
1. All data is stored in the leaf nodes, while the internal nodes
store just the indices.
2. Each leaf is at the same height.
3. All leaf nodes have links to the other leaf nodes.
4. The root node has a minimum of two children.
5. Each node except root can have a maximum of m children and
a minimum of m/2 children.
6. Each node can contain a maximum of m-1 keys and a minimum
of ⌈m/2⌉ - 1 keys.
63
B+ -Trees
● Data/Record pointers stored only at the leaf nodes
○ Leaf nodes have an entry for every value of the search field, and a
data pointer to the record if search field is a key field
○ For a nonkey search field, the pointer points to a block containing
pointers to the data file records
● Internal nodes
○ Some search field values from the leaf nodes repeated to guide
search
64
Multilevel Indexing using B+ tree
All these leaf nodes are linked to each other, making up a linked list. B+ trees are an
important concept in DBMS!
66
Difference Between B Tree and B+ Tree
67
Insertion in B+ Tree
B+ Tree visualization: [Link]
1. Every element is inserted into a leaf node. So, go to the appropriate leaf
node.
2. Insert the key into the leaf node in increasing order only if there is no
overflow.
Case 1: Overflow in leaf node
3. Split the leaf node into two nodes.
4. First node contains ceil((m-1)/2) values.
5. Second node contains the remaining values.
6. Copy the smallest search key value from second node to the parent
node.(Right biased)
Case 2: Overflow in non-leaf node
7. Split the non leaf node into two nodes.
8. First node contains ceil(m/2)-1 values.
9. Move the smallest among remaining to the parent. 68
Insertion in B+ Tree
B+ Tree visualization: [Link]
1. Every element is inserted into a leaf node. So, go to the appropriate leaf
node.
2. Insert the key into the leaf node in increasing order only if there is no
overflow.
Case 1: Overflow in leaf node
3. Split the leaf node into two nodes.
4. First node contains ceil((m-1)/2) values.
5. Second node contains the remaining values.
6. Copy the smallest search key value from second node to the parent
node.(Right biased)
Case 2: Overflow in non-leaf node
7. Split the non leaf node into two nodes.
8. First node contains ceil(m/2)-1 values.
9. Move the smallest among remaining to the parent. 69
Example
Insert the following key values 6, 16, 26, 36, 46 on a B+
tree with order = 3.
order(m) : 3
Max Children : m = 3,
Min Children: ceil(m/2) = 2
Max Keys: m - 1 =2
Min Keys: ceil ⌈m/2⌉ - 1 = 1
71
First part contains ceil((3-1)/2) values i.e., only 6. The second node
contains the remaining values i.e., 16 and 26. Then also copy the
smallest search key value from the second node to the parent node
i.e., 16 to the parent node.
First node contains ceil(3/2)-1 values i.e. ’16’.
Move the smallest among remaining to the parent i.e ’26’ will be the new
parent node.
The second node contains the remaining keys i.e ’36’ and the rest of the
leaf nodes remain the same.
Time complexity: O(log n)
Auxiliary Space: O(log n)
Deletion in B+Trees
● Deletion in B+ Trees is just not deletion but it is a
combined process of Searching, Deletion, and
Balancing.
● In the last step of the Deletion Process, it is
mandatory to balance the B+ Trees, otherwise, it
fails in the property of B+ Trees.
Algorithm
● Look to locate the deleted key in the leaf nodes.
● Delete the key and its associated value if the key is discovered
in a leaf node.
● One of the following steps should be taken if the node
underflows (number of keys is less than half the maximum
allowed):
● Get a key by borrowing it from a sibling node if it
contains more keys than the required minimum.
● If the minimal number of keys is met by all of the sibling
nodes, merge the underflow node with one of its siblings
and modify the parent node as necessary.
● Remove all references to the deleted leaf node from the internal
nodes of the tree.
● Remove the old root node and update the new one if the root
Case I
The key to be deleted is present only at the leaf node not in the indexes (or internal
nodes). There are two cases for it:
1. There is more than the minimum number of keys in the node. Simply delete the
key.
2. There is an exact minimum number of keys in the node. Delete the key and borrow a
key from the immediate sibling. Add the median key of the sibling node to the parent.
Case II
The key to be deleted is present
in the internal nodes as well. Then
we have to remove them from the
internal nodes as well. There are the
following cases for this situation.
1. If there is more than the
minimum number of keys in the
node, simply delete the key
from the leaf node and delete
the key from the internal node
as well.
Fill the empty space in the
internal node with the
inorder successor.
2. If there is an exact minimum number of keys in the node, then delete the key
and borrow a key from its immediate sibling (through the parent).
Fill the empty space created in the index (internal node) with the borrowed key.
3. This case is similar to Case II(1) but here, empty space is generated above the immediate
parent node.
After deleting the key, merge the empty space with its sibling.
Fill the empty space in the grandparent node with the inorder successor.
Case III
In this case, the height of the tree gets shrinked. It is a little [Link] 55 from
the tree below leads to this condition. It can be understood in the illustrations below.
Deletion Time Complexity
Time Complexity: O(log N)
Auxiliary Space: O(N)
Advantages and Applications of B+ Tree
Advantages
1. Height of the B+ tree is always balanced and is comparatively lesser than B tree.
2. It takes equal number of disk accesses to fetch records.
3. Keys are used for indexing.
4. Because the data is only stored on the leaf nodes, search queries are faster.
5. Data stored in a B+ tree can be accessed both sequentially and directly.
Applications
6. B+ trees in DBMS plays a useful role by supporting equality and range search.
7. It also facilitates database indexing in DBMS.
8. Another advantage is multilevel indexing.
Trie Tree
A trie is a tree-like data structure whose nodes
store the letters of an alphabet. By structuring the
nodes in a particular way, words and strings can be
retrieved from the structure by traversing down a
branch path of the tree.
Tries and Web Search Engines
• The index of a search engine (collection of all searchable words) is stored
into a compressed trie
• Each leaf of the trie is associated with a word and has a list of pages (URLs)
containing that word, called occurrence list
• The trie is kept in internal memory
• The occurrence lists are kept in external memory and are ranked by
relevance
• Boolean queries for sets of words (e.g., Java and coffee) correspond to set
operations (e.g., intersection) on the occurrence lists
• Additional information retrieval techniques are used, such as
– stopword elimination (e.g., ignore “the” “a” “is”)
– stemming (e.g., identify “add” “adding” “added”)
– link analysis (recognize authoritative pages)
98
Tries and Internet Routers
• Computers on the internet (hosts) are identified by a unique 32-bit IP
(internet protocol) addres, usually written in “dotted-quad-decimal” notation
• E.g., [Link] is [Link]
• Use nslookup on Unix to find out IP addresses
• An organization uses a subset of IP addresses with the same prefix, e.g.,
Brown uses 128.148.*.*, Yale uses 130.132.*.*
• Data is sent to a host by fragmenting it into packets. Each packet carries the
IP address of its destination.
• The internet whose nodes are routers, and whose edges are communication
links.
• A router forwards packets to its neighbors using IP prefix matching rules.
E.g., a packet with IP prefix 128.148. should be forwarded to the Brown
gateway router.
• Routers use tries on the alphabet 0,1 to do prefix matching.
99