Chapter 7 - Indexing
Chapter 7 - Indexing
Indexing
COMP3278 Introduction to
Database Management Systems
2
We are going to learn…
Basic concepts
B+ -tree
Hashing
3
Section 1
Basic
Concepts
Search key
An attribute or a set of attributes used
to look up records in a file.
Indices are typically much smaller
than the original file.
5
5
Primary v.s. secondary
Primary index - An index whose search key also
defines the sequential order of the file.
E.g., Access staff records through staffID
(primary search key).
However, the data file can be sorted in one order only.
staffID roomID faculty
How about accessing data with 10101 49 C.S.
12121 42 Finance
a different search key? 15151
22222
35
10
Music
Physics
32343 15 History
E.g., Access staff records through 33456
45565
18
20
C.S.
E.E.E.
roomID (a secondary search key, 58583
76543
3
31
Biology
Finance
need a secondary index!). 76766
83821
5
2
Finance
C.S.
98345 24 C.S.
6
2 classes of indices
Ordered Indices – Search keys are sorted in the index
Example: indexed-sequential file, B+-tree.
7
Index evaluation factors
Each indexing technique must be
evaluated on the basis of these factors
Access types – The types of access that are supported
efficiently (e.g., equality search or range search? Single
attribute search or multi-attribute search?)
Access time – The time it takes to find a particular data item,
or a set of items.
No one indexing
Insertion / deletion time technique is the best.
Rather, each technique is
Space overhead best suited to particular
database applications.
8
Section 2
+
B -tree
Slides prepared by - Dr. Chui Chun Kit, for students in COMP3278
For other uses, please email : ckchui@cs.hku.hk
Properties of +
B -tree
B+-tree index structure is one of the most widely used
index structure in DBMS.
Balanced
11
A node in +
B -tree
n=4 pointers
n-1 = 3 search-keys
1 2 5 1 5 2
Non-leaf node
Leaf nodes …
15
2. Non-leaf node
…
Non-leaf node 4
Leaf nodes 1 2 3 4 7 8 …
10101 49 C.S.
In the file, records are ordered 12121 42 Finance
15151 35 Music
according to the 1st attribute, 22222 10 Physics
32343 15 History
we would like to build a B+-tree 33456 18 C.S.
45565 20 E.E.E.
index (secondary index) to 58583 3 Biology
76543 31 Finance
speed up the searching on the 76766 5 Finance
83821 2 C.S.
2nd attribute. 98345 24 C.S.
17
Example +
B -tree
31
10 18 42
2 3 5 10 15 18 20 24 31 35 42 49
10 18 42
2 3 5 10 15 18 20 24 31 35 42 49
10101 49 C.S.
12121 42 Finance
15151 35 Music
22222 10 Physics
32343 15 History
33456 18 C.S.
45565 20 E.E.E.
58583 3 Biology
76543 31 Finance
76766 5 Finance
83821 2 C.S.
98345 24 C.S.
19
Searching
Step 1. Traverse 31
from root to
leaf.
10 18 42
2 3 5 10 15 18 20 24 31 35 42 49
Step 2. Search in the leaf node. Step 3. Follow the 10101 49 C.S.
12121 42 Finance
pointer in the leaf 15151 35 Music
Point query node to retrieve 22222 10 Physics
32343 15 History
SELECT * FROM R WHERE R.B = 3 the record. 33456 18 C.S.
45565 20 E.E.E.
58583 3 Biology
76543 31 Finance
76766 5 Finance
With this B+-tree, how many disk 83821 2 C.S.
98345 24 C.S.
block accesses to answer this query? 20
Searching
31
10 18 42
2 3 5 10 15 18 20 24 31 35 42 49
31
10 18 42
2 3 5 10 15 18 20 24 31 35 42 49
31
10 18 42
2 3 5 10 15 18 20 24 31 35 42 49
We first search for the leaf node that the key “1”
should be inserted.
31
10 18 42
2 3 5 10 15 18 20 24 31 35 42 49
31
10 18 42
1 2 10 15 18 20 24 31 35 42 49
3 5
31
33 10 18 42
1 2 10 15 18 20 24 31 35 42 49
3 5
31
3 10 18 42
10 15 18 20 24 31 35 42 49
1 2 3 5
31
3 10 18 42
10 15 18 20 24 31 35 42 49
1 2 3 5
31
3 10 18 42
10 15 31 35 42 49
1 2 3 5
18 20 24 26
Step 1. Create one more node
and distribute the entries. 30
2. Node splitting (non-leaf node)
Step 2. Update the parent (Parent node is full!)
As we cannot have 5 pointers stored in a non-leaf node, we
need to split this non-leaf node (Recursively).
31
3 10 18 42
10 15 ? 31 35 42 49
1 2 3 5
18 20 24 26
31
2. Node splitting (non-leaf node)
Splitting non-leaf node (Recursive)
Step 1. We first create a new node to accommodate the
new pointers (the 5 pointers, one for each leaf node).
31
3 10 18 42
10 15 ? 31 35 42 49
1 2 3 5
18 20 24 26
32
2. Node splitting (non-leaf node)
Splitting non-leaf node
Step 2. We distribute the pointers among the two
nodes.
31
3 10 18 42
10 15 31 35 42 49
1 2 3 5
18 20 24 26
33
2. Node splitting (non-leaf node)
Tricky part! Note that the
Splitting non-leaf node
search key that lies between
Step 3. Then consider the keys that the pointers that stay on the
are required in each slot among the left, and the pointers that
two nodes. move to the right node (i.e.
31
18) is treated differently.
3 10 18 24 42
10 15 31 35 42 49
1 2 3 5
18 20 24 26
34
2. Node splitting (non-leaf node)
“18” is moved to the parent node to separate the search-
keys among the two nodes (if the parent node is full, split
the parent node recursively)
18 31
3 10 24 42
10 15 31 35 42 49
1 2 3 5
18 20 24 26
35
Deletion
Find the record to be deleted.
Remove it from the file and from the leaf node (if
present)
If the leaf node has too few entries due to the removal:
36
1. Merging
31
3 10 18 42
10 15 18 20 24 31 35 42 49
1 2 3 5
31
3 10 18 42
10 15 18 20 24 31 35 49
1 2 3 5
Deletion may cause a node to underfull.
This node has only 1 value, which violates the
requirement that each leaf node must contain
at least (n – 1)/2 values (i.e., 2 in this case).
38
1. Merging
31
3 10 18 42
10 15 18 20 24 31 35 49
1 2 3 5
39
1. Merging
After merging, this leaf node
is empty and no longer used.
31
3 10 18 42
10 15 18 20 24 31 35 49
1 2 3 5
40
1. Merging
Step 2. Update the parents.
31
3 10 18
10 15 18 20 24 31 35 49
1 2 3 5
The parent node now contains too
few pointers. Remember we require
non-leaf node to have at least n/2
pointers.
41
1. Merging
Recursively, we try to MERGE these 2 nodes.
However, the two nodes cannot be merged as the left node is
already full (4 pointers).
31
3 10 18
10 15 18 20 24 31 35 49
1 2 3 5
31
3 10 18
10 15 18 20 24 31 35 49
1 2 3 5
43
2. Redistribution
Redistribution
Step2. Update the keys.
18
3 10 31
10 15 18 20 24 31 35 49
1 2 3 5
44
Example 1
Delete 35
18
3 10 31
10 15 18 20 24 31 35 49
1 2 3 5
45
Example 1
After deletion, this node contains 2 values (VALID).
Delete 35 Remember the keys in a node should be in sorted
order.
18
3 10 31
10 15 18 20 24 31 49
1 2 3 5
46
Example 2
Deletion of “49” causes this leaf node to contain only
one value, which is underfull.
Delete 49 First, try MERGE with its sibling node, but the sibling
node is full, so we need to do REDISTRIBUTION.
18
3 10 31
10 15 18 20 24 31 49
1 2 3 5
47
Example 2
After REDISTRIBUTION, we need
Delete 49 to update the keys.
18
3 10 31
10 15 18 20 24 31
1 2 3 5
48
Example 2
Delete 49
18
3 10 24
10 15 18 20 24 31
1 2 3 5
49
Example 3
Deletion of “18” causes this leaf node to contain only
one value, which is underfull.
Delete 18 First, try merge with its sibling node, which sibling
should be merged?
18
3 10 24
10 15 18 20 24 31
1 2 3 5
50
Example 3
After merging, this leaf node
Delete 18 is empty and no longer used.
18
3 10 24
10 15 20 24 31
1 2 3 5
51
Example 3
Now this node has only one pointer,
which is underfull (1 pointer only).
Delete 18
We try merging it with its sibling.
18
3 10
10 15 20 24 31
1 2 3 5
52
Example 3
Merging non-leaf nodes
Delete 18 Step 1. Update the pointers.
18
3 10
10 15 20 24 31
1 2 3 5
53
Example 3
Merging non-leaf nodes
Step 2. Update the keys.
Delete 18 (It is “18” as originally it is the key “18” in the
root node that separate the two pointers.)
18
3 10 18
10 15 20 24 31
1 2 3 5
54
Example 3
Note that since we merged the non-
Delete 18 leaf node, some pointers and parent
entries can be removed.
18
3 10 18
10 15 20 24 31
1 2 3 5
55
Example 3
Delete 18
3 10 18
10 15 20 24 31
1 2 3 5
56
Section 3
Hashing
59
Static hashing
Settings
Hash function : K mod 4.
Each bucket can hold 2 entries.
Bucket 0
4
Search key
Bucket 1
49
10101 49 C.S.
5 12121 4 Finance
15151 5 Music
Bucket 2 9 22222 9 Physics
60
Static hashing
Problems
Database grows with time. If the initial number of buckets
is too small, performance will degrade due to too much
overflow.
Solution
Dynamic hashing (e.g., extendable hashing)
61
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111
Finance 1010 0011 1010 0000 1100 0110 1001 1111
History 1100 0111 1110 1101 1011 1111 0011 1010
Music 0011 0101 1010 0110 1100 1001 1110 1011
Physics 1001 1000 0011 1111 1001 1100 0000 0001
10101 … C.S.
12121 … Finance
15151 … Music
22222 … Physics
32343 … History
33456 … Physics
45565 … C.S.
58583 … History
76543 … Finance
76766 … Biology
83821 … C.S.
98345 … E.E. 62
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000 Assumption
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111 Each bucket can store 2
Finance 1010 0011 1010 0000 1100 0110 1001 1111 entries.
History 1100 0111 1110 1101 1011 1111 0011 1010
Music 0011 0101 1010 0110 1100 1001 1110 1011 All records in this bucket share the
Physics 1001 1000 0011 1111 1001 1100 0000 0001
same 0 bit(s) in their hash prefix.
Bucket 1
10101 … C.S.
12121 … Finance Bucket address table
15151 … Music
22222 … Physics
32343 … History
33456 … Physics
45565 … C.S.
58583 … History
76543 … Finance
76766 … Biology
83821 … C.S.
98345 … E.E. 63
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000 Assumption
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111 Each bucket can store 2
Finance 1010 0011 1010 0000 1100 0110 1001 1111 entries.
History 1100 0111 1110 1101 1011 1111 0011 1010
Music 0011 0101 1010 0110 1100 1001 1110 1011 All records in this bucket share the
Physics 1001 1000 0011 1111 1001 1100 0000 0001
same 0 bit(s) in their hash prefix.
Insert Bucket 1
10101 … C.S.
12121 … Finance Bucket address table
15151 … Music
22222 … Physics
32343
33456
…
…
History
Physics
Since currently hash prefix = 0, which
45565
58583
…
…
C.S.
History
means the first 0 bits are used in hashing
76543
76766
…
…
Finance
Biology
(no bits), so simply store the record in the
83821
98345
…
…
C.S.
E.E.
only bucket. 64
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000 Assumption
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111 Each bucket can store 2
Finance 1010 0011 1010 0000 1100 0110 1001 1111 entries.
History 1100 0111 1110 1101 1011 1111 0011 1010
Music 0011 0101 1010 0110 1100 1001 1110 1011 All records in this bucket share the
Physics 1001 1000 0011 1111 1001 1100 0000 0001
same 0 bit(s) in their hash prefix.
Bucket 1
10101 … C.S. Insert
12121 … Finance Bucket address table
15151 … Music
22222 … Physics
32343 … History
33456 … Physics
45565 … C.S.
58583 … History
76543 … Finance
76766 … Biology
83821 … C.S.
98345 … E.E. 65
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111
Finance 1010 0011 1010 0000 1100 0110 1001 1111
History 1100 0111 1110 1101 1011 1111 0011 1010
Music 0011 0101 1010 0110 1100 1001 1110 1011 All records in this bucket share the
Physics 1001 1000 0011 1111 1001 1100 0000 0001
same 0 bit(s) in their hash prefix.
000 3
10101 … C.S. 001
12121 … Finance . 22222 … Physics
15151 … Music 010
22222 … Physics
32343 … History Insert
011 3
33456 … Physics 100
45565 … C.S. . 12121 … Finance
58583 … History 101
76543 … Finance
76766 … Biology 110 2
83821 … C.S. 111
98345 … E.E. . 10101
32343
…
…
C.S.
History
74
Bucket address table
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111
Finance 1010 0011 1010 0000 1100 0110 1001 1111
History 1100 0111 1110 1101 1011 1111 0011 1010
Music 0011 0101 1010 0110 1100 1001 1110 1011 All records in this bucket share the
Physics 1001 1000 0011 1111 1001 1100 0000 0001
same 1 bit(s) in their hash prefix.
1
Hash prefix : The first 3
bits are used in hashing. 3 . 15151 … Music
000 3
10101 … C.S. 001
12121 … Finance . 22222
33456
…
…
Physics
Physics
15151 … Music 010
22222 … Physics
32343 … History Insert
011 3
33456 … Physics 100
45565 … C.S. . 12121 … Finance
58583 … History 101
76543 … Finance
76766 … Biology 110 2
83821 … C.S. 111
98345 … E.E. . 10101
32343
…
…
C.S.
History
75
Bucket address table
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111
Finance 1010 0011 1010 0000 1100 0110 1001 1111
History 1100 0111 1110 1101 1011 1111 0011 1010
Music 0011 0101 1010 0110 1100 1001 1110 1011 All records in this bucket share the
Physics 1001 1000 0011 1111 1001 1100 0000 0001
same 1 bit(s) in their hash prefix.
1
Hash prefix : The first 3
bits are used in hashing. 3 . 15151 … Music
000 3
10101
12121
…
…
C.S.
Finance
001 . 22222 … Physics Since the hash
33456 … Physics
15151 … Music 010 prefix is 3 and 2
22222 … Physics
32343 … History 011 3
is 2, we can
33456
45565
…
…
Physics
C.S.
Insert 100 . 12121 … Finance simply add a
58583 … History 101 bucket without
76543 … Finance
110 increasing the
76766
83821
…
…
Biology
C.S. 111
2 FULL hash prefix.
98345 … E.E. . 10101
…
32343 …
C.S.
History
Bucket address table 76
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111 All records in this bucket share the
same 1 bit(s) in their hash prefix.
Finance 1010 0011 1010 0000 1100 0110 1001 1111
1
History 1100 0111 1110 1101 1011 1111 0011 1010
15151 … Music
Music 0011 0101 1010 0110 1100 1001 1110 1011 .
Physics 1001 1000 0011 1111 1001 1100 0000 0001
3
Hash prefix : The first 3
bits are used in hashing. 3 . 22222
33456
…
…
Physics
Physics
000
3
10101
12121
…
…
C.S.
Finance
001 12121 … Finance
Since the hash
.
15151 … Music 010 prefix is 3 and 2
22222 … Physics
32343 … History 011 3
is 2, we can
33456
45565
…
…
Physics
C.S.
Insert 100 . 32343 … History simply add a
58583 … History 101 bucket without
76543 … Finance
76766 … Biology 110 3
increasing the
83821
98345
…
…
C.S.
E.E.
111 . 10101 … C.S. hash prefix.
Bucket address table 77
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111 All records in this bucket share the
same 1 bit(s) in their hash prefix.
Finance 1010 0011 1010 0000 1100 0110 1001 1111
1
History 1100 0111 1110 1101 1011 1111 0011 1010
15151 … Music
Music 0011 0101 1010 0110 1100 1001 1110 1011 .
Physics 1001 1000 0011 1111 1001 1100 0000 0001
3
Hash prefix : The first 3
bits are used in hashing. 3 . 22222
33456
…
…
Physics
Physics
000
10101 … C.S. 3
001
12121 … Finance . 12121 … Finance
15151 … Music 010
22222 … Physics
32343 … History 011 3
33456 … Physics Insert 100 32343 … History
45565 … C.S. .
58583 … History 101
76543 … Finance
76766 … Biology 110 3
83821 … C.S. 111
98345 … E.E. . 10101
45565
…
…
C.S.
C.S.
78
Bucket address table
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111 All records in this bucket share the
same 1 bit(s) in their hash prefix.
Finance 1010 0011 1010 0000 1100 0110 1001 1111
1
History 1100 0111 1110 1101 1011 1111 0011 1010
15151 … Music
Music 0011 0101 1010 0110 1100 1001 1110 1011 .
Physics 1001 1000 0011 1111 1001 1100 0000 0001
3
Hash prefix : The first 3
bits are used in hashing. 3 . 22222
33456
…
…
Physics
Physics
000
10101 … C.S. 3
001
12121 … Finance . 12121 … Finance
15151 … Music 010
22222 … Physics
32343 … History 011 3
33456 … Physics 100 32343 … History
45565 … C.S. Insert . 58583 … History
58583 … History 101
76543 … Finance
76766 … Biology 110 3
83821 … C.S. 111
98345 … E.E. . 10101
45565
…
…
C.S.
C.S.
79
Bucket address table
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111 All records in this bucket share the
same 1 bit(s) in their hash prefix.
Finance 1010 0011 1010 0000 1100 0110 1001 1111
1
History 1100 0111 1110 1101 1011 1111 0011 1010
15151 … Music
Music 0011 0101 1010 0110 1100 1001 1110 1011 .
Physics 1001 1000 0011 1111 1001 1100 0000 0001
3
Hash prefix : The first 3
bits are used in hashing. 3 . 22222
33456
…
…
Physics
Physics
000
10101 … C.S. 3
001
12121 … Finance . 12121 … Finance
15151 … Music 010 76543 … Finance
22222 … Physics
32343 … History 011 3
33456 … Physics 100 32343 … History
45565 … C.S. . 58583 … History
58583 … History Insert 101
76543 … Finance
76766 … Biology 110 3
83821 … C.S. 111
98345 … E.E. . 10101
45565
…
…
C.S.
C.S.
80
Bucket address table
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111 All records in this bucket share the
same 1 bit(s) in their hash prefix.
Finance 1010 0011 1010 0000 1100 0110 1001 1111
1
History 1100 0111 1110 1101 1011 1111 0011 1010
15151 … Music
Music 0011 0101 1010 0110 1100 1001 1110 1011 . 76766 … Biology
Physics 1001 1000 0011 1111 1001 1100 0000 0001
3
Hash prefix : The first 3
bits are used in hashing. 3 . 22222
33456
…
…
Physics
Physics
000
10101 … C.S. 3
001
12121 … Finance . 12121 … Finance
15151 … Music 010 76543 … Finance
22222 … Physics
32343 … History 011 3
33456 … Physics 100 32343 … History
45565 … C.S. . 58583 … History
58583 … History 101
76543 … Finance Insert
76766 … Biology 110 3
83821 … C.S. 111
98345 … E.E. . 10101
45565
…
…
C.S.
C.S.
81
Bucket address table
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111 All records in this bucket share the
same 1 bit(s) in their hash prefix.
Finance 1010 0011 1010 0000 1100 0110 1001 1111
1
History 1100 0111 1110 1101 1011 1111 0011 1010
15151 … Music
Music 0011 0101 1010 0110 1100 1001 1110 1011 . 76766 … Biology
Note that all
Physics 1001 1000 0011 1111 1001 1100 0000 0001
3 entries in this
Hash prefix : The first 3 bucket are of the
bits are used in hashing. 3 . 22222
33456
…
…
Physics
Physics
same hash value
000
10101 … C.S. 3 “C.S.”, increasing
001
12121 … Finance . 12121 … Finance hash prefix will
15151 … Music 010 76543 … Finance
22222 … Physics not solve the
32343 … History 011
33456 … Physics
3 collision
100 . 32343 … History
45565 … C.S. 58583 … History problem!
58583 … History 101
76543 … Finance Chaining is used.
110
76766
83821
…
…
Biology
C.S.
Insert
111
3 FULL
10101 … C.S.
98345 … E.E. . 45565 82
… C.S.
Bucket address table
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111 All records in this bucket share the
same 1 bit(s) in their hash prefix.
Finance 1010 0011 1010 0000 1100 0110 1001 1111
1
History 1100 0111 1110 1101 1011 1111 0011 1010
15151 … Music
Music 0011 0101 1010 0110 1100 1001 1110 1011 . 76766 … Biology
Note that all
Physics 1001 1000 0011 1111 1001 1100 0000 0001
3 entries in this
Hash prefix : The first 3 bucket are of the
bits are used in hashing. 3 . 22222
33456
…
…
Physics
Physics
same hash value
000
10101 … C.S. 3 “C.S.”, increasing
001
12121 … Finance . 12121 … Finance hash prefix will
15151 … Music 010 76543 … Finance
22222 … Physics not solve the
32343 … History 011
33456 … Physics
3 collision
100 . 32343 … History
45565 … C.S. 58583 … History problem!
58583 … History 101
76543 … Finance Chaining is used.
76766 … Biology Insert
110 3
83821 … C.S. 111
98345 … E.E. . 10101
45565
…
…
C.S.
C.S.
83821 … C.S.
83 Bucket address table
Extendable hashing
Department h(department) – binary representation
Biology 0010 1101 1111 1011 0010 1100 0011 0000
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
E.E. 0100 0011 1010 1100 1100 0110 1101 1111 All records in this bucket share the
same 1 bit(s) in their hash prefix.
Finance 1010 0011 1010 0000 1100 0110 1001 1111
1
History 1100 0111 1110 1101 1011 1111 0011 1010
15151 … Music
Music 0011 0101 1010 0110 1100 1001 1110 1011 . 76766 … Biology FULL
Physics 1001 1000 0011 1111 1001 1100 0000 0001
3
Hash prefix : The first 3
bits are used in hashing. 3 . 22222
33456
…
…
Physics
Physics
000
10101 … C.S. 3
001
12121 … Finance . 12121 … Finance
15151 … Music 010 76543 … Finance
22222 … Physics
32343 … History 011 3
33456 … Physics 100 32343 … History
45565 … C.S. . 58583 … History
58583 … History 101
76543 … Finance
76766 … Biology 110 3
83821 … C.S. Insert 111
98345 … E.E. . 10101
45565
…
…
C.S.
C.S.
83821 … C.S.
84 Bucket address table
Extendable hashing
Department h(department) – binary representation All records in this bucket share the
same 2 bit(s) in their hash prefix.
Biology 0010 1101 1111 1011 0010 1100 0011 0000
2
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
15151 … Music
E.E. 0100 0011 1010 1100 1100 0110 1101 1111
. 76766 … Biology
Finance 1010 0011 1010 0000 1100 0110 1001 1111
2
History 1100 0111 1110 1101 1011 1111 0011 1010
Music 0011 0101 1010 0110 1100 1001 1110 1011 .
Physics 1001 1000 0011 1111 1001 1100 0000 0001
3
Hash prefix : The first 3
bits are used in hashing. 3 . 22222
33456
…
…
Physics
Physics
000
10101 … C.S. 3
001
12121 … Finance . 12121 … Finance
15151 … Music 010 76543 … Finance
22222 … Physics
32343 … History 011 3
33456 … Physics 100 32343 … History
45565 … C.S. . 58583 … History
58583 … History 101
76543 … Finance
76766 … Biology 110 3
83821 … C.S. 111
98345 … E.E. . 10101
45565
…
…
C.S.
C.S.
83821 … C.S.
85 Bucket address table
Extendable hashing
Department h(department) – binary representation All records in this bucket share the
same 2 bit(s) in their hash prefix.
Biology 0010 1101 1111 1011 0010 1100 0011 0000
2
C.S. 1111 0001 0010 0100 1001 0011 0110 1101
15151 … Music
E.E. 0100 0011 1010 1100 1100 0110 1101 1111
. 76766 … Biology
Finance 1010 0011 1010 0000 1100 0110 1001 1111
2
History 1100 0111 1110 1101 1011 1111 0011 1010
98345 … E.E.
Music 0011 0101 1010 0110 1100 1001 1110 1011 .
Physics 1001 1000 0011 1111 1001 1100 0000 0001
3
Hash prefix : The first 3
bits are used in hashing. 3 . 22222
33456
…
…
Physics
Physics
000
10101 … C.S. 3
001
12121 … Finance . 12121 … Finance
15151 … Music 010 76543 … Finance
22222 … Physics
32343 … History 011 3
33456 … Physics 100 32343 … History
45565 … C.S. . 58583 … History
58583 … History 101
76543 … Finance
76766 … Biology 110 3
83821 … C.S. 111
98345 … E.E. . 10101
45565
…
…
C.S.
C.S.
83821 … C.S.
86 Bucket address table
Extendable hashing
The bucket address table has a hash prefix value n.
The first-n bits of the hash values are used to locate
the address table entry for the pointer pointing to the
bucket containing records with that hash value.
3
Hash prefix : The first n bits are used in hashing.
000
001
010
2
011
100 00
1
101 01
0
110 10 0
111 11 1
87
Extendable hashing
All records in this bucket share the
same 2 bit(s) in their hash prefix.
Each bucket j contains:
2
The records
An integer ij
88
Splitting a bucket
If the hash prefix = ij
Only 1 pointer in address table is pointing to bucket j.
1 .
0 0
10101 … C.S. 0
12121 … Finance
1 1
Bucket 1 10101 … C.S.
Bucket address table . 12121 … Finance
Bucket address table
89
Splitting a bucket
If the hash prefix > ij (i.e., the indicator in the
bucket address table is larger than the indicator
in bucket)
Create a new bucket z, and set ij and iz to old ij + 1.
There must be a number of address table entries pointing
to bucket j originally, change the lower half of such
entries so that they point to bucket z.
They are not used for comparison operators such as < that
find a range of values.
91
Section 4
Index &
SQL
END
COMP3278 Introduction to
Database Management Systems