Chap. 6 Hash-Based Indexing: Abel J.P. Gomes
6 Hash-Based Indexing
Not chaos-like, together crushed and bruised, But, as the world, harmoniously confused: Where order in variety we see. -- Alexander Pope
Bibliography: 1. R. Ramakrishnan and J. Gehrke. Database Management Systems. Addison-Wesley, 2003 (Chapter 11).
1. Objectives
What is the intuition behind hash-structured indexes? Why are they especially good for equality searches but useless for range selections?
What is Extendible Hashing? How does it handle search, insert, and delete?
What is Linear Hashing? How does it handle search, insert, and delete?
What are the similarities and differences between Extendible and Linear Hashing?
2. Introduction
The basic idea is to use a hashing function, which maps a search-key value (of a field) into a record or bucket of records. As for any index, 3 alternatives for data entries k*:
1. Actual data record (with key value k)
2. <k, rid of matching data record>
3. <k, list of rids of matching data records>
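As a rough illustration (the type names below are invented for this sketch, not taken from the text), the three alternatives could be modelled as:

```python
from dataclasses import dataclass
from typing import List, Tuple

Rid = Tuple[int, int]          # a record id: (page id, slot number)

@dataclass
class DataRecord:              # Alternative 1: the data entry is the record itself
    key: int
    payload: str

@dataclass
class KeyRid:                  # Alternative 2: <k, rid of matching data record>
    key: int
    rid: Rid

@dataclass
class KeyRidList:              # Alternative 3: <k, list of rids of matching data records>
    key: int
    rids: List[Rid]

entry = KeyRid(key=42, rid=(7, 3))   # e.g. key 42 lives on page 7, slot 3
```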
Hash-based indexes are best for equality selections. They cannot support range searches. Static and dynamic hashing techniques exist; the trade-offs are similar to ISAM vs. B+ trees.
3. Static Hashing
A bucket is a unit of storage containing one or more records (a bucket is typically a disk block). In a hash file organization we obtain the bucket of a record directly from its search-key value using a hash function. A hash function h is a function from the set of all search-key values K to the set of all bucket addresses B. The hash function is used to locate records for access, insertion, and deletion. Records with different search-key values may be mapped to the same bucket; thus the entire bucket has to be searched sequentially to locate a record.
The # of primary pages is fixed; they are allocated sequentially and never de-allocated; overflow pages are used if needed. h(key) mod B = the bucket to which the data entry with search key key belongs (B = # of buckets).
[Figure: h(key) maps a search key to one of the primary bucket pages, numbered 0 to B-1; overflow pages are chained off a primary page when it fills up.]
Buckets contain data entries. The hash function works on the search key field of record r and must distribute values over the range 0, ..., B-1. h(key) = (a*key + b) usually works well; a and b are constants, and a lot is known about how to tune h.
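A minimal sketch of this bucket computation in Python (the constants a, b and the number of buckets below are illustrative choices, not values from the text):

```python
# Static hashing: map a search-key value to a primary bucket number in 0..B-1.
A_CONST, B_CONST = 31, 17      # the constants a and b (assumed values; tuning them is a separate topic)
NUM_BUCKETS = 4                # B, the number of primary bucket pages

def bucket_of(key: int) -> int:
    """h(key) = (a*key + b), taken mod B so the result lands in 0..B-1."""
    return (A_CONST * key + B_CONST) % NUM_BUCKETS

print([bucket_of(k) for k in (10, 11, 12, 13)])   # [3, 2, 1, 0] -- keys spread across the buckets
```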
Long overflow chains can develop and degrade performance. Extendible and Linear Hashing: dynamic techniques to fix this problem.
Let us assume we have 2 data entries per bucket and 4 buckets, numbered 0 to 3. Insert a, b, c, d, e with h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0, h(e) = 1. Bucket 0 receives d and bucket 2 receives b; bucket 1 fills up with a and c, so inserting the data entry e forces the creation of an overflow page chained to bucket 1.
Now suppose the data entries f and g, both hashing to bucket 1, are inserted after e, so bucket 1's primary page holds a and c and its overflow pages hold e, f and g. Deleting e, f and c then frees enough space to eliminate the overflow pages: g is moved up into the primary page of bucket 1. A code sketch of this insert/delete behaviour follows below.
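The following Python sketch mirrors both examples: 4 buckets with 2 data entries per page, overflow pages chained as needed, and a delete that moves entries up and drops overflow pages that are no longer needed. The class and method names are invented for illustration, and the hash values are simply the ones assumed above.

```python
CAPACITY = 2  # data entries per page, as in the example

class StaticHashFile:
    """Illustrative static hash file: fixed primary buckets plus chained overflow pages."""

    def __init__(self, num_buckets: int):
        # each bucket is a list of pages; page 0 is the primary page
        self.buckets = [[[]] for _ in range(num_buckets)]

    def insert(self, key, h):
        pages = self.buckets[h]
        if len(pages[-1]) == CAPACITY:          # last page full: allocate an overflow page
            pages.append([])
        pages[-1].append(key)

    def delete(self, key, h):
        for page in self.buckets[h]:
            if key in page:
                page.remove(key)
                break
        # compact: move entries up and drop overflow pages that became unnecessary
        entries = [k for page in self.buckets[h] for k in page]
        self.buckets[h] = [entries[i:i + CAPACITY]
                           for i in range(0, len(entries), CAPACITY)] or [[]]

hf = StaticHashFile(4)
for key, h in [("a", 1), ("b", 2), ("c", 1), ("d", 0),
               ("e", 1), ("f", 1), ("g", 1)]:
    hf.insert(key, h)         # bucket 1: primary [a, c], overflow [e, f], overflow [g]
for key in ("e", "f", "c"):
    hf.delete(key, 1)         # afterwards bucket 1 is just its primary page: [a, g]
print(hf.buckets[1])          # [['a', 'g']]
```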
5. Rule of Thumb
Try to keep space utilization between 50% and 80%!
Utilization = (# keys used) / (total # keys that fit).
If < 50%, we are wasting space; if > 80%, overflows become significant (how significant depends on how good the hash function is and on the # of keys per bucket).
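In code, the rule of thumb is just a ratio check (the numbers below are made up for illustration):

```python
def utilization(keys_used: int, num_buckets: int, keys_per_bucket: int) -> float:
    """Fraction of the available data-entry slots that are actually occupied."""
    return keys_used / (num_buckets * keys_per_bucket)

u = utilization(keys_used=60, num_buckets=50, keys_per_bucket=2)   # 0.6
assert 0.5 <= u <= 0.8        # inside the recommended 50%-80% band
```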
Two dynamic hashing techniques follow: Extendible Hashing and Linear Hashing.
7. Extendible Hashing
Situation: a bucket (primary page) becomes full. Why not re-organize the file by doubling the # of buckets? Because reading and writing all the pages would be expensive! Idea: use a directory of pointers to buckets, and double the # of buckets by doubling the directory, splitting only the bucket that overflowed. The directory is much smaller than the file, so doubling it is much cheaper. Only one page of data entries is split, and no overflow page is needed. The trick lies in how the hash function is adjusted!
The directory is an array of size 4. To find the bucket for k, take the last `global depth` bits of h(k). For example: if h(k) = 5 = binary 101, it is in the bucket pointed to by directory entry 01.
[Figure: GLOBAL DEPTH = 2. The DIRECTORY has 4 slots, 00, 01, 10, 11, pointing respectively to Bucket A (4*, 12*, 32*, 16*), Bucket B (1*, 5*, 21*, 13*), Bucket C (10*) and Bucket D (15*, 7*, 19*), each with LOCAL DEPTH = 2.]
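Taking the "last global depth bits" of h(k) is just a bit mask. A tiny sketch, using the directory from the figure above (bucket names are labels only):

```python
def dir_slot(hashed_key: int, global_depth: int) -> int:
    """Directory index = the last `global_depth` bits of h(k)."""
    return hashed_key & ((1 << global_depth) - 1)

directory = ["Bucket A", "Bucket B", "Bucket C", "Bucket D"]   # slots 00, 01, 10, 11
assert dir_slot(5, global_depth=2) == 0b01     # h(k) = 5 = binary 101 -> last two bits are 01
print(directory[dir_slot(5, 2)])               # Bucket B
```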
Insert: if the bucket is full, split it (allocate a new page and re-distribute the entries). If necessary, double the directory. (As we will see, splitting a bucket does not always require doubling; we can tell by comparing the global depth with the local depth of the split bucket.) For example: inserting h(k) = 20 (binary 10100) causes doubling.
[Figure: after inserting 20* (binary 10100), GLOBAL DEPTH = 3 and the DIRECTORY has 8 slots, 000 to 111. Bucket A has split: Bucket A (LOCAL DEPTH 3) keeps 32*, 16*, and its split image Bucket A2 (LOCAL DEPTH 3) holds 4*, 12*, 20*. Buckets B (1*, 5*, 21*, 13*), C (10*) and D (15*, 7*, 19*) keep LOCAL DEPTH 2, so two directory slots point to each of them.]
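To make the split and doubling mechanics concrete, here is a small Python sketch, not the textbook's code: the class and variable names are invented, buckets hold 4 data entries, and data entries are represented directly by their hashed value (as in the figures, where 32*, 16*, ... denote data entries). Loading the "before" figure and then inserting 20* reproduces the example above: bucket A splits into A (32*, 16*) and its split image A2 (4*, 12*, 20*), and the directory doubles to 8 slots.

```python
class Bucket:
    def __init__(self, local_depth: int, capacity: int = 4):
        self.local_depth = local_depth
        self.capacity = capacity
        self.entries = []

class ExtendibleHashFile:
    def __init__(self, global_depth: int = 2, capacity: int = 4):
        self.global_depth = global_depth
        self.capacity = capacity
        # start with one distinct bucket per directory slot
        self.directory = [Bucket(global_depth, capacity) for _ in range(2 ** global_depth)]

    def _slot(self, h: int) -> int:
        return h & ((1 << self.global_depth) - 1)   # last `global depth` bits of h

    def insert(self, h: int) -> None:
        bucket = self.directory[self._slot(h)]
        if len(bucket.entries) < bucket.capacity:
            bucket.entries.append(h)
            return
        # Bucket full: split it; double the directory only if local depth == global depth.
        if bucket.local_depth == self.global_depth:
            self.directory += self.directory[:]     # doubling = copying the pointers
            self.global_depth += 1
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth, bucket.capacity)   # the "split image"
        new_bit = 1 << (bucket.local_depth - 1)
        # Directory slots that referred to the old bucket and have the new bit set
        # are re-pointed to the split image.
        for i, b in enumerate(self.directory):
            if b is bucket and (i & new_bit):
                self.directory[i] = image
        # Redistribute the old bucket's entries according to the new bit.
        old_entries, bucket.entries = bucket.entries, []
        for e in old_entries:
            (image if (e & new_bit) else bucket).entries.append(e)
        self.insert(h)    # retry the pending insert (may trigger a further split)

ehf = ExtendibleHashFile(global_depth=2, capacity=4)
for h in (4, 12, 32, 16, 1, 5, 21, 13, 10, 15, 7, 19):   # the "before" figure
    ehf.insert(h)
ehf.insert(20)                      # 20 = 10100: bucket A splits, directory doubles
print(ehf.global_depth)             # 3
print(sorted(ehf.directory[0].entries), sorted(ehf.directory[4].entries))  # [16, 32] [4, 12, 20]
```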
Summary
Hash-based indexes: best for equality searches, cannot support range searches. Static Hashing can lead to long overflow chains. Extendible Hashing avoids overflow pages by splitting a full bucket when a new data entry is to be added to it. (Duplicates may require overflow pages.)
A directory keeps track of the buckets and doubles periodically. It can get large with skewed data, and there is additional I/O if it does not fit in main memory.
END OF CHAPTER