Unit 3 Storage Strategies Indices B-Trees Hashing

Uploaded by

Sk md Abdul bari

Unit 3: Storage strategies: Indices, B-trees, hashing.

Indexing in DBMS:

o Indexing is used to optimize the performance of a database by minimizing the number of disk
accesses required when a query is processed.
o The index is a type of data structure. It is used to locate and access the data in a database
table quickly.

Index structure:

An index is built on one or more columns of a table. Each index entry has two fields:

o The first field is the search key. It holds a copy of the primary key or a candidate key of
the table. The key values are stored in sorted order so that the corresponding data can be
accessed easily.
o The second field is the data reference. It holds a pointer, or a set of pointers, to the
address of the disk block where the record with that key value can be found.

Ordered indices:
The indices are usually sorted to make searching faster. Indices that are sorted are known as
ordered indices.

Example: Suppose we have an employee table with thousands of records, each 10 bytes long. The
IDs start at 1, 2, 3, and so on, and we have to find the employee with ID 543.

o With no index, the DBMS has to scan the disk blocks from the start until it reaches ID 543.
It reads the 542 preceding records, 542*10 = 5420 bytes, before it reaches the record.
o With an index, the DBMS scans index entries instead. Assuming each index entry is 2 bytes
long, it reads 542*2 = 1084 bytes before it reaches the entry, far fewer than in the previous case.
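
The arithmetic in this example can be checked with a short sketch. The 10-byte record size comes from the example; the 2-byte index entry size is implied by its 542*2 figure. The function name is illustrative only, and the count covers just the bytes that precede entry 543.

```python
RECORD_SIZE = 10   # bytes per table record (from the example)
ENTRY_SIZE = 2     # bytes per index entry (implied by the example's 542*2)

def bytes_before(target_id: int, unit_size: int) -> int:
    """Bytes scanned sequentially before reaching the entry for target_id,
    assuming IDs start at 1 and entries are stored contiguously in ID order."""
    return (target_id - 1) * unit_size

table_scan = bytes_before(543, RECORD_SIZE)   # scanning the table itself
index_scan = bytes_before(543, ENTRY_SIZE)    # scanning the index instead
print(table_scan, index_scan)
```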

Primary Index
o If the index is created on the primary key of the table, it is known as a primary index.
Primary key values are unique, so each index entry corresponds to exactly one record (a 1:1
relation).
o As primary keys are stored in sorted order, searching is quite efficient.
o The primary index can be classified into two types: dense index and sparse index.

Dense index
o The dense index contains an index record for every search key value in the data file, which
makes searching faster.
o The number of records in the index table is the same as the number of records in the main
table.
o It needs more space to store the index records themselves. Each index record holds the search
key and a pointer to the actual record on the disk.

Sparse index
o An index record appears for only some of the items in the data file. Each index record points
to a block.
o Instead of pointing to every record in the main table, the index points to records at
intervals. To find a record, locate the index entry with the largest key that is less than or
equal to the search key, then scan the block it points to sequentially.
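
The difference between the two lookups can be sketched as follows. This is a minimal illustration, not a storage engine: the data file, block size and key values are all hypothetical, and a Python dict stands in for the dense index.

```python
from bisect import bisect_right

# Hypothetical data file: sorted (key, value) records split into blocks of 3.
blocks = [
    [(10, "A"), (20, "B"), (30, "C")],
    [(40, "D"), (50, "E"), (60, "F")],
    [(70, "G"), (80, "H"), (90, "I")],
]

# Dense index: one entry per search key -> (block number, slot in block).
dense = {key: (b, s)
         for b, blk in enumerate(blocks)
         for s, (key, _) in enumerate(blk)}

# Sparse index: one entry per block, holding the first key of that block.
sparse_keys = [blk[0][0] for blk in blocks]

def dense_lookup(key):
    pos = dense.get(key)
    if pos is None:
        return None
    b, s = pos
    return blocks[b][s][1]

def sparse_lookup(key):
    # Last index entry whose key is <= the search key, then a block scan.
    b = bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    for k, value in blocks[b]:
        if k == key:
            return value
    return None
```

The dense index has nine entries and answers without scanning; the sparse index has three entries and pays for its smaller size with a short sequential scan inside one block.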

Clustering Index
o A clustering index is defined on an ordered data file. Sometimes the index is created on
non-primary-key columns, which may not be unique for each record.
o In this case, to identify records faster, we group two or more columns to get a unique value
and create an index out of them. This method is called a clustering index.
o Records that have similar characteristics are grouped together, and indexes are created for
these groups.

Example: Suppose a company has several employees in each department. With a clustering index, all
employees that belong to the same Dept_ID are placed in a single cluster, and each index pointer
points to the cluster as a whole. Here Dept_ID is a non-unique key.
The previous scheme is a little confusing because one disk block is shared by records that belong to
different clusters. Using a separate disk block for each cluster is the better technique.
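
The Dept_ID example can be sketched as below. The employee rows and names are hypothetical, and a dict keyed by Dept_ID stands in for both the clustering index and the per-cluster disk blocks.

```python
from collections import defaultdict

# Hypothetical employee rows: (emp_id, dept_id, name).
employees = [
    (1, 10, "Asha"), (2, 20, "Ravi"), (3, 10, "Meena"),
    (4, 30, "John"), (5, 20, "Sara"),
]

# One cluster per Dept_ID: all rows sharing a Dept_ID are stored together.
clusters = defaultdict(list)
for row in employees:
    clusters[row[1]].append(row)

def employees_in(dept_id):
    """Follow the index entry for dept_id to its whole cluster."""
    return clusters.get(dept_id, [])

print([name for _, _, name in employees_in(10)])
```

The index has one entry per distinct Dept_ID rather than one per employee, which is what makes it usable on a non-unique key.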

Secondary Index
In sparse indexing, as the table grows, the mapping (the index) grows too. The mapping is usually
kept in primary memory so that address lookups are fast; the actual data is then fetched from
secondary memory using the address obtained from the mapping. If the mapping grows too large,
fetching the address itself becomes slow, and the sparse index is no longer efficient. Secondary
indexing is introduced to overcome this problem.

In secondary indexing, another level of indexing is introduced to reduce the size of the mapping.
Large key ranges are chosen for the first level so that the first-level mapping stays small. Each
range is then divided into smaller ranges at the second level. The first-level mapping is stored in
primary memory so that address lookups are fast. The second-level mapping and the actual data are
stored in secondary memory (hard disk).
For example:

o To find the record with roll 111 in the diagram, the search first looks in the first-level index
for the largest entry that is smaller than or equal to 111. It gets 100 at this level.
o In the second-level index, it again looks for the largest entry that is smaller than or equal to
111 and gets 110. Using the address stored for 110, it goes to the data block and scans the
records sequentially until it reaches 111.
o This is how a search is performed in this method. Inserting, updating or deleting is done in
the same manner.
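
The two-level search for roll 111 can be sketched as follows. Only the values 100, 110 and 111 come from the example; the rest of the index contents, the block size of 10 and the function names are assumptions made for illustration.

```python
from bisect import bisect_right

# Hypothetical two-level index.
first_level = [100, 200]                              # kept in primary memory
second_level = {100: list(range(100, 200, 10)),       # kept on disk
                200: list(range(200, 300, 10))}
data_blocks = {start: list(range(start, start + 10))  # 10 rolls per block
               for starts in second_level.values() for start in starts}

def floor_entry(sorted_keys, key):
    """Largest entry <= key, or None if every entry is greater."""
    i = bisect_right(sorted_keys, key) - 1
    return sorted_keys[i] if i >= 0 else None

def find(roll):
    top = floor_entry(first_level, roll)
    if top is None:
        return None
    block_start = floor_entry(second_level[top], roll)
    if block_start is None:
        return None
    # Sequential scan inside the data block.
    for record in data_blocks[block_start]:
        if record == roll:
            return (top, block_start, record)
    return None

print(find(111))
```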

B+ Tree
o The B+ tree is a balanced multiway search tree (a node can have more than two children). It
follows a multi-level index format.
o In the B+ tree, the leaf nodes hold the actual data pointers. The B+ tree ensures that all leaf
nodes remain at the same height.
o In the B+ tree, the leaf nodes are linked using a linked list. Therefore, a B+ tree can support
random access as well as sequential access.

Structure of B+ Tree
o In the B+ tree, every leaf node is at an equal distance from the root node. The B+ tree is of
order n, where n is fixed for a given B+ tree.
o It contains internal nodes and leaf nodes.

Internal node
o Every internal node of the B+ tree except the root contains at least ⌈n/2⌉ child pointers.
o At most, an internal node contains n child pointers.

Leaf node
o A leaf node of the B+ tree contains at least ⌈n/2⌉ record pointers and ⌈n/2⌉ key values.
o At most, a leaf node contains n record pointers and n key values.
o Every leaf node of the B+ tree contains one block pointer P that points to the next leaf node.

Searching a record in B+ Tree

Suppose we have to search for 55 in the B+ tree structure below. First, we look in the intermediary
node, which directs us to the leaf node that can contain a record for 55.

In the intermediary node, we follow the branch between the keys 50 and 75. This leads to the third
leaf node, where the DBMS performs a sequential search to find 55.
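
A minimal sketch of this search is given below. The tree is hypothetical: the separator keys 50 and 75 and the searched key 55 come from the text, while the remaining keys, the class names and the two-level shape are assumptions.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None          # link to the next leaf for sequential access

class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # separator keys
        self.children = children  # len(children) == len(keys) + 1

def search(node, key):
    """Descend from node to a leaf and report whether key is stored there."""
    while isinstance(node, Internal):
        # Child i holds keys k with keys[i-1] <= k < keys[i].
        node = node.children[bisect_right(node.keys, key)]
    # Sequential search inside the leaf, as described in the text.
    return node, key in node.keys

leaves = [Leaf([5, 10, 15, 20]), Leaf([25, 30, 35, 45]),
          Leaf([50, 55, 65, 70]), Leaf([75, 80, 85, 90])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Internal([25, 50, 75], leaves)

leaf, found = search(root, 55)
print(leaf.keys, found)
```

Because each leaf also links to the next one, a range scan can start from the leaf returned here and follow `next` without going back through the intermediary node.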

B+ Tree Insertion
Suppose we want to insert a record 60 into the structure below. It belongs in the 3rd leaf node,
after 55. The tree is balanced and that leaf node is already full, so we cannot simply insert 60
there.

In this case, we have to split the leaf node so that 60 can be inserted into the tree without
affecting the fill factor, balance and order.

With 60 added, the 3rd leaf node would hold the values (50, 55, 60, 65, 70), and the key that leads
to it in the intermediate node is 50. We split the leaf node in the middle so that the balance is
not altered, grouping the values into two leaf nodes: (50, 55) and (60, 65, 70).

For these two to be leaf nodes, the intermediate node cannot branch from 50 alone. The key 60 must
be added to it, together with a pointer to the new leaf node.

This is how we insert an entry when there is an overflow. In the normal case, we simply find the
leaf node where the entry fits and place it there.
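
The leaf split can be sketched as below for a tree with a single intermediate node. The leaf capacity of 4 and all keys other than 50, 55, 60, 65 and 70 are assumptions, and the sketch does not handle overflow of the intermediate node itself.

```python
from bisect import bisect_right, insort

LEAF_CAPACITY = 4   # assumed maximum number of keys per leaf

def insert(parent_keys, leaves, key):
    """Insert key into a one-level B+ tree given as separator keys plus leaf lists."""
    i = bisect_right(parent_keys, key)   # position of the leaf that should hold key
    leaf = leaves[i]
    insort(leaf, key)                    # keep the leaf sorted
    if len(leaf) <= LEAF_CAPACITY:
        return
    # Overflow: split in the middle and copy the right half's first key upward.
    mid = len(leaf) // 2
    left, right = leaf[:mid], leaf[mid:]
    leaves[i:i + 1] = [left, right]
    parent_keys.insert(i, right[0])

parent_keys = [25, 50, 75]
leaves = [[5, 10, 15, 20], [25, 30, 35, 45], [50, 55, 65, 70], [75, 80, 85, 90]]
insert(parent_keys, leaves, 60)
print(parent_keys)
print(leaves)
```

After the call, the intermediate node holds 60 as a new separator and the old third leaf has become the two leaves (50, 55) and (60, 65, 70).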

B+ Tree Deletion
Suppose we want to delete 60 from the above example. In this case, we have to remove 60 from the
intermediate node as well as from the 4th leaf node. Removing it from the intermediate node leaves
the tree violating the rules of the B+ tree, so we need to rearrange the nodes to restore a
balanced tree.

After deleting 60 from the above B+ tree and rearranging the nodes, the tree looks as follows:

Hashing in DBMS
In a huge database structure, it is very inefficient to search through all the index values to
reach the desired data. The hashing technique calculates the location of a data record on the disk
directly, without using an index structure.

In this technique, data is stored in the data blocks whose addresses are generated by a hash
function. The memory locations where these records are stored are known as data buckets or data
blocks.

A hash function can use any column value to generate the address. Most of the time, it uses the
primary key to generate the address of the data block. The hash function can be anything from a
simple mathematical function to a complex one. We can even use the primary key itself as the
address of the data block, so that each row is stored in the data block whose address equals its
primary key.

The above diagram shows data block addresses that are the same as the primary key values. The hash
function can also be a simple mathematical function such as exponential, mod, cos or sin. Suppose
we use a mod (5) hash function to determine the address of the data block. Applied to the primary
keys, it generates 3, 3, 1, 4 and 2 respectively, and the records are stored at those data block
addresses.
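
A mod (5) hash function takes only a few lines. The primary keys below are hypothetical: the original keys were in a diagram, so these were chosen only because they reproduce the address sequence 3, 3, 1, 4, 2 quoted above.

```python
NUM_BUCKETS = 5

def bucket_address(primary_key: int) -> int:
    """mod (5) hash: the remainder selects one of five data buckets."""
    return primary_key % NUM_BUCKETS

# Hypothetical primary keys, picked to match the addresses given in the text.
keys = [98, 103, 101, 104, 102]
addresses = [bucket_address(k) for k in keys]
print(addresses)
```

Two of these keys land on address 3, which already shows why a bucket may need to hold more than one record.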

Types of Hashing:

Static Hashing
In static hashing, the resultant data bucket address is always the same for a given key. For
example, if we generate an address for EMP_ID = 103 using the hash function mod (5), it always
results in the same bucket address, 3. The bucket address does not change.
Hence, in static hashing, the number of data buckets in memory remains constant throughout.
In this example, we have five data buckets in memory to store the data.

Operations of Static Hashing

o Searching a record

When a record needs to be searched, the same hash function retrieves the address of the bucket
where the data is stored.

o Insert a Record

When a new record is inserted into the table, we generate an address for the new record from its
hash key, and the record is stored at that location.

o Delete a Record

To delete a record, we first fetch the record that is to be deleted. Then we delete the record at
that address in memory.

o Update a Record

To update a record, we first search for it using the hash function, and then the data record is
updated.
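
The four operations can be sketched with a fixed array of buckets. The class and method names are illustrative, and as a simplifying assumption each bucket is a Python list that can grow freely, so bucket overflow is not modelled here.

```python
class StaticHashFile:
    """Fixed number of buckets; each bucket is a list of (key, record) pairs."""

    def __init__(self, num_buckets=5):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # The same hash function is used by every operation.
        return self.buckets[key % len(self.buckets)]

    def search(self, key):
        for k, record in self._bucket(key):
            if k == key:
                return record
        return None

    def insert(self, key, record):
        self._bucket(key).append((key, record))

    def delete(self, key):
        bucket = self._bucket(key)
        bucket[:] = [(k, r) for k, r in bucket if k != key]

    def update(self, key, record):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, record)
                return True
        return False

emp = StaticHashFile()
emp.insert(103, "first version")
emp.update(103, "second version")
print(emp.search(103))
```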

If we want to insert a new record into the file but the bucket address generated by the hash
function is not empty, that is, data already exists at that address, the situation is known in
static hashing as bucket overflow. This is a critical situation in this method.

There are various methods to overcome this situation. Some commonly used methods are as
follows:

1. Open Hashing
When a hash function generates an address at which data is already stored, the next available
bucket is allocated to the record. This mechanism is called linear probing.
For example: suppose R3 is a new record that needs to be inserted, and the hash function generates
the address 112 for R3. But the bucket at that address is already full. So the system searches for
the next available data bucket, 113, and assigns R3 to it.
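
Linear probing for this example can be sketched as follows. The table layout (five one-record buckets at addresses 110 to 114), the records R1 and R2 and the wrap-around behaviour are assumptions; only R3 and the addresses 112 and 113 come from the text.

```python
def insert_linear_probing(table, home, record):
    """Place record in the first free slot at or after home, wrapping around.
    Returns the slot index used; raises OverflowError if every slot is taken."""
    n = len(table)
    for step in range(n):
        slot = (home + step) % n
        if table[slot] is None:
            table[slot] = record
            return slot
    raise OverflowError("no free bucket")

BASE = 110                               # slot i stands for bucket address 110 + i
table = ["R1", None, "R2", None, None]   # address 112 is already occupied
slot = insert_linear_probing(table, 112 - BASE, "R3")
print(BASE + slot)
```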

2. Closed Hashing
When a bucket is full, a new data bucket is allocated for the same hash result and is linked
after the previous one. This mechanism is known as overflow chaining.

For example: suppose R3 is a new record that needs to be inserted into the table, and the hash
function generates the address 110 for it. But this bucket is too full to store the new data. In
this case, a new bucket is added at the end of bucket 110 and is linked to it.
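
Overflow chaining can be sketched as a linked chain of buckets. The capacity of two records per bucket, the records R1 and R2 and the names used here are assumptions for illustration.

```python
BUCKET_CAPACITY = 2   # assumed number of records per bucket

class Bucket:
    def __init__(self):
        self.records = []
        self.overflow = None   # next bucket in the chain, if any

def insert_chained(bucket, record):
    """Insert into the first bucket of the chain with room, extending the chain if needed."""
    while len(bucket.records) >= BUCKET_CAPACITY:
        if bucket.overflow is None:
            bucket.overflow = Bucket()
        bucket = bucket.overflow
    bucket.records.append(record)

def chain_length(bucket):
    n = 0
    while bucket is not None:
        n += 1
        bucket = bucket.overflow
    return n

b110 = Bucket()                 # the bucket at hash address 110
for r in ["R1", "R2", "R3"]:
    insert_chained(b110, r)
print(b110.records, b110.overflow.records)
```

The home bucket keeps R1 and R2, and R3 lands in the overflow bucket linked behind it, so a search for R3 still starts at address 110.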

Dynamic Hashing
o The dynamic hashing method is used to overcome the problems of static hashing, such as bucket
overflow.
o In this method, data buckets grow or shrink as the number of records increases or decreases.
This method is also known as the extendible hashing method.
o This method makes hashing dynamic, i.e., it allows insertion or deletion without resulting in
poor performance.

How to search a key

o First, calculate the hash address of the key.
o Check how many bits are used in the directory; call this number i.
o Take the least significant i bits of the hash address. This gives an index into the directory.
o Using this index, go to the directory and find the bucket address where the record might be.
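
The bit selection in the steps above can be sketched with a mask. The function name is illustrative, and the hash address 10001 is used only as a sample input.

```python
def directory_index(hash_address: int, i: int) -> int:
    """Index formed by the i least significant bits of the hash address."""
    return hash_address & ((1 << i) - 1)

two_bits = directory_index(0b10001, 2)     # keeps the bits 01
three_bits = directory_index(0b10001, 3)   # keeps the bits 001
print(format(two_bits, "02b"), format(three_bits, "03b"))
```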

How to insert a new record

o First, follow the same procedure as for retrieval, ending up in some bucket.
o If there is still space in that bucket, place the record in it.
o If the bucket is full, split the bucket and redistribute the records.

For example:
Consider the following grouping of keys into buckets, depending on the last bits of their hash
address:

The last two bits of 2 and 4 are 00, so they go into bucket B0. The last two bits of 5 and 6 are
01, so they go into bucket B1. The last two bits of 1 and 3 are 10, so they go into bucket B2. The
last two bits of 7 are 11, so it goes into B3.

Insert key 9 with hash address 10001 into the above structure:
o Since key 9 has hash address 10001, whose last two bits are 01, it must go into bucket B1. But
bucket B1 is full, so it is split.
o The split uses the last three bits. The last three bits of 5 and 9 are 001, so they go into
bucket B1; the last three bits of 6 are 101, so it goes into bucket B5.
o Keys 2 and 4 are still in B0. B0 is pointed to by the directory entries 000 and 100, because the
last two bits of both entries are 00.
o Keys 1 and 3 are still in B2. B2 is pointed to by the entries 010 and 110, because the last two
bits of both entries are 10.
o Key 7 is still in B3. B3 is pointed to by the entries 111 and 011, because the last two bits of
both entries are 11.
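
The whole example can be replayed with a small extendible hashing sketch. Several things here are assumptions: the bucket capacity of 2, the class names, and every hash address except 10001 for key 9 (the others are hypothetical values chosen only to match the bit suffixes stated above). The sketch also assumes that the keys in a bucket never all share one hash address, since such a bucket could not be split.

```python
BUCKET_CAPACITY = 2   # assumed; the text only says B1 is full with two keys

class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth   # number of hash bits this bucket depends on
        self.keys = []

class ExtendibleHash:
    def __init__(self, hash_of, global_depth=2):
        self.hash_of = hash_of           # key -> hash address
        self.global_depth = global_depth
        self.directory = [Bucket(global_depth) for _ in range(1 << global_depth)]

    def _index(self, key):
        return self.hash_of[key] & ((1 << self.global_depth) - 1)

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket.keys) < BUCKET_CAPACITY:
            bucket.keys.append(key)
            return
        if bucket.local_depth == self.global_depth:
            # Double the directory: entries j and j + old_size share a bucket.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split the full bucket on one more bit.
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        high_bit = 1 << (bucket.local_depth - 1)
        for j, b in enumerate(self.directory):
            if b is bucket and j & high_bit:
                self.directory[j] = sibling
        pending = bucket.keys + [key]
        bucket.keys = []
        for k in pending:
            self.insert(k)   # redistribute; splits again if still overfull

hash_of = {1: 0b11010, 2: 0b00000, 3: 0b11110, 4: 0b00100,
           5: 0b01001, 6: 0b10101, 7: 0b10111, 9: 0b10001}
table = ExtendibleHash(hash_of)
for k in [1, 2, 3, 4, 5, 6, 7]:
    table.insert(k)
table.insert(9)   # B1 is full, so the directory doubles and B1 splits
for j, b in enumerate(table.directory):
    print(format(j, "03b"), b.keys)
```

Only the overflowing bucket is split; the other buckets are untouched and are simply reachable from two directory entries each.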

Advantages of dynamic hashing

o Performance does not degrade as the data in the system grows. The method simply increases the
memory used to accommodate the data.
o Memory is well utilized, as it grows and shrinks with the data. No unused memory is left lying
around.
o This method is good for dynamic databases where data grows and shrinks frequently.

Disadvantages of dynamic hashing

o If the data size increases, the number of buckets also increases. The addresses of the data are
maintained in the bucket address table, because data addresses keep changing as buckets grow and
shrink. If there is a huge increase in data, maintaining the bucket address table becomes
tedious.
o Bucket overflow can still occur in this method, but it may take longer to reach this situation
than in static hashing.
