Lecture 23 Hash Code Map
Lecture 23 Hash Code Map
Algorithms
• Text Books:
• T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein,
Introduction to Algorithms, MIT Press, 3/e, 2009.
• Y. Langsam, M. J. Augenstein, and A. M. Tenenbaum, Data
Structures Using C and C++, Prentice Hall, 2/e, 2000.
• S. Sahni, Data Structures, Algorithms, and Applications in C++,
Silicon Press, 2/e, 2005.
• E. Horowitz, S. Sahni, S. A. Freed, Fundamentals of Data
Structures in C, Computer Science Press, New York, 2/e, 2008.
1
Analysis of Chained Hashing
• We assume time to compute hash function h(k) is Θ(1)
– Often, we do arithmetic function to compute h(k)
• A good hash function is the one that distributes keys
evenly among all slots
• Assume that we have such a hash function called
simple uniform hashing
• Load factor (α) of table: Ratio of number of elements
(n) in the table to the size of the hash table (m)
n
m
2
Analysis of Chained Hashing
• Unsuccessful search:
– Time spent is proportional to the length of linked list
– Average length of the linked list:
• If the hashing function is uniform hashing function, then we
expect that each linked list is of the size α
– Expected number of elements to be compared is α
– Total expected search time: O(1+ α)
• Successful search:
– Assume hashing function is uniform hashing function
– Number of elements examined for searching an ith
element is 1 more than the number of elements appear
before ith element (i.e i-1)
– The expected length of the list is (i-1)/m
1 n i 1
– Expected number of elements examined is: 1
n i 1 m
– Average time: O(1+ α)
3
Practical Considerations
• Hash function: Typically, an arithmetic operation
h(k) = k mod m
– k: key in integer form
– m: size of the hash table
• Size of the hash table:
– Problem dependent
– Typically, the number of records (elements) in the database
• Issue: When we do not have prior knowledge of the number
records (elements) to be inserted for example Bank Account
– Solution: Start with small hash table
– When the number of elements (records) to be inserted is very
large, the number of element in the linked list becomes large
– Now, move entire set of elements into a larger hash table
4
Characteristics of Good Hash Function
1. A hash function should be computed quickly
2. It should distribute the key uniformly over the hash
table
– It should distribute keys evenly among all slots
3. Hash functions should minimize collisions
4. A good hash function should handle non-integer keys
in efficient way
5. A hash function should map equal keys to same
location (indices)
Birthday Paradox: m=365, n=30 or 60 or 100
There is high probability that at least two people will
have same birth date.
5
Hash Function Idea- Techniques to choose hash
function
• Truncation Method:
• Folding Method
• Modular Method
6
Hash Function Idea- Techniques to choose hash
function
• Truncation Method:
7
Hash Function Idea- Techniques to choose hash
function
Mid square Method
8
Hash Function Idea- Techniques to choose hash
function
Folding Method
9
10
11
Hash Function Idea- Techniques to choose hash
function
•Modular Method
12
Hash Function-idea
• Hash function generally perform two maps:
– Hash-code map
– Converting key to an integer value
Key —› integer
– Compression map
– Mapping an integer of any arbitrary range onto the range of
hash table size (m)
Integer —› [0, 1, 2, …, m-1]
13
Hash-Code Maps
• Converts the key to integer
• Integer cast:
– Any numeric key with 32- or less-bit pattern is interpreted as
an integer
• Component sum:
– If a numeric key is more than 32 bits (real numbers), then it is
divided into chunks of 32 bits and added to get an integer of
32 bit
– This can be extended to string type of keys
• Adding the ASCII coded of each character in a string
• It leads to more collisions
14
Hash-Code Maps
• Converts the key to integer
• Integer using radix-128 representation:
– Express strings in the radix-128 representation
– Consider ASCII code for each character and express them in
radix-128 integer representation
a0 128N-1 + a1 128N-2 + a2 128N-3 + … + aN-3 1282 + aN-2 1281 + aN-1 1280
– Here, N = number of characters in a string
– a0 … aN-1 are the ASCII code for 0th to (N-1)th character respectively
– The integer might be of large range
– The number of collisions will be less
15
Hash-Code Maps
• Polynomial accumulation:
– String is seen as a polynomial expression of the form
a0 + a1 x + a2 x2 + … + aN-1 xN-1
• where a0, a1,…, aN-1, are ASCII code of each character in a string of
length N
– Evaluate the polynomial expression at a certain value of x to
get the integer number
– The obtained integer might be of large range
– Experimental researches have shown that at the value of x =
33 or 37 or 39 or 41 gives at most 6 collisions on a vocabulary
of 50000 English words
16
Compression Maps
• Mapping a given integer onto the small range of hash
table index
• Division method:
– In division method for creating hash functions, we map a
key into one of the m slots by taking the reminder of
integer key k divided by m
h(k) = k mod m
– Values for m : Depends on data
• m should not be a power of 2 (i.e., m 2)p
• When m 2 ,p then h(k) is just p lowest order bits of k
17
18
Compression Maps
• Multiplication method:
– Operates in two steps
• Multiply the key k by a constant A in the range 0 < A < 1
and extract faction part of kA
• Multiply this value my m and round down to nearest integer
h(k ) m (kA mod 1)
– Choice of m is not critical
– Although it works well any value of A, Knuth showed that
it works best with following value of A
51
A = 0.6180339887
2
• Fibonacci hashing
19
1.Given the following input (4322, 1334, 1471, 9679, 1989, 6171,
6173, 4199) and the hash function x mod 10, which of the
following statements are true?
(a) 9679, 1989, 4199 hash to the same value
(b) 1471, 6171 hash to the same value
(c) All elements hash to the same value
(d) Each element hashes to a different value
21
Double Hashing(Another example)
22
Double Hashing
• Double hashing used 2 hash functions
• First hash function: h1(k)
– It gives the position in the hash table where we first
check
• Second hash function: h2(k)
– It determine the offset to check for the next location in
the hash table for the key k when the location given by
the h1(k) is already occupied
– This offset is different for different keys
23
Double Hashing: Insert
• Example: Student record
for a COURSE-A
• m = 10
• h1(k) = k mod 10
• h2(k) = 5 - (k mod 5)
• Keys to be inserted:
CS23BT009, CS23BT015,
CS23BT022, CS23BT025,
EE23BT025, MC23BT005,
IS23BT012, MC23BT027,
MC23BT034, MC23BT039
– Keys in integer form
(obtained from last 2 digits
of the key and a branch
indicator digit (1=CS,
2=EE, 3=MC, 4=IS)):
j – 109, 115, 122, 125, 225,
0 1 2 3 4 5 6 7 8 9 305, 412, 327, 334, 339
T 109
24
Double Hashing: Insert
• Example: Student record
for a COURSE-A
• m = 10
• h1(k) = k mod 10
• h2(k) = 5 - (k mod 5)
• Keys to be inserted:
CS23BT009, CS23BT015,
CS23BT022, CS23BT025,
EE23BT025, MC23BT005,
IS23BT012, MC23BT027,
MC23BT034, MC23BT039
– Keys in integer form
(obtained from last 2 digits
of the key and a branch
indicator digit (1=CS,
2=EE, 3=MC, 4=IS)):
j – 109, 115, 122, 125, 225,
0 1 2 3 4 5 6 7 8 9 305, 412, 327, 334, 339
T 115 109
25
Double Hashing: Insert
• Example: Student record
for a COURSE-A
• m = 10
• h1(k) = k mod 10
• h2(k) = 5 - (k mod 5)
• Keys to be inserted:
CS23BT009, CS23BT015,
CS23BT022, CS23BT025,
EE23BT025, MC23BT005,
IS23BT012, MC23BT027,
MC23BT034, MC23BT039
– Keys in integer form
(obtained from last 2 digits
of the key and a branch
indicator digit (1=CS,
2=EE, 3=MC, 4=IS)):
j – 109, 115, 122, 125, 225,
0 1 2 3 4 5 6 7 8 9 305, 412, 327, 334, 339
T 122 115 109
26
Double Hashing: Insert
• Example: Student record
for a COURSE-A
• m = 10
• h1(k) = k mod 10
• h2(k) = 5 - (k mod 5)
• Keys to be inserted:
CS23BT009, CS23BT015,
CS23BT022, CS23BT025,
EE23BT025, MC23BT005,
IS23BT012, MC23BT027,
MC23BT034, MC23BT039
– Keys in integer form
(obtained from last 2 digits
of the key and a branch
indicator digit (1=CS,
125 125
2=EE, 3=MC, 4=IS)):
j j – 109, 115, 122, 125, 225,
0 1 2 3 4 5 6 7 8 9 305, 412, 327, 334, 339
T 125 122 115 109
27
Double Hashing: Insert
• Example: Student record
for a COURSE-A
• m = 10
• h1(k) = k mod 10
• h2(k) = 5 - (k mod 5)
• Keys to be inserted:
CS23BT009, CS23BT015,
CS23BT022, CS23BT025,
EE23BT025, MC23BT005,
IS23BT012, MC23BT027,
MC23BT034, MC23BT039
– Keys in integer form
(obtained from last 2 digits
of the key and a branch
indicator digit (1=CS,
225 225
2=EE, 3=MC, 4=IS)):
j j – 109, 115, 122, 125, 225,
0 1 2 3 4 5 6 7 8 9 305, 412, 327, 334, 339
T 125 122 115 109
• If m is not prime offset divides the table and there is chance that offset
cycles back 28
Double Hashing: Insert
• Example: Student record
for a COURSE-A
• m = 10
• h1(k) = k mod 10
• h2(k) = 5 - (k mod 5)
• Keys to be inserted:
CS23BT009, CS23BT015,
CS23BT022, CS23BT025,
EE23BT025, MC23BT005,
IS23BT012, MC23BT027,
MC23BT034, MC23BT039
– Keys in integer form
(obtained from last 2 digits
of the key and a branch
indicator digit (1=CS,
225 225
2=EE, 3=MC, 4=IS)):
j j – 109, 115, 122, 125, 225,
0 1 2 3 4 5 6 7 8 9 305, 412, 327, 334, 339
T 125 122 115 109
• The m should be prime - offset never divides the table – It ensures that
hashing looks for every position of the table 29
Double Hashing: Insert
• Example: Student record for
a COURSE-A
• m = 11
• h1(k) = k mod 11
• h2(k) = 5 - (k mod 5)
• Keys to be inserted:
CS23BT009, CS23BT015,
CS23BT022, CS23BT025,
EE23BT025, MC23BT005,
IS23BT012, MC23BT027,
MC23BT034, MC23BT039
– Keys in integer form
(obtained from last 2 digits
of the key and a branch
indicator digit (1=CS, 2=EE,
j
3=MC, 4=IS)):
– 109, 115, 122, 125, 225,
305, 412, 327, 334, 339
0 1 2 3 4 5 6 7 8 9 10
T 109
30
Double Hashing: Insert
• Example: Student record for
a COURSE-A
• m = 11
• h1(k) = k mod 11
• h2(k) = 5 - (k mod 5)
• Keys to be inserted:
CS23BT009, CS23BT015,
CS23BT022, CS23BT025,
EE23BT025, MC23BT005,
IS23BT012, MC23BT027,
MC23BT034, MC23BT039
– Keys in integer form
(obtained from last 2 digits
of the key and a branch
indicator digit (1=CS, 2=EE,
j 3=MC, 4=IS)):
– 109, 115, 122, 125, 225,
305, 412, 327, 334, 339
0 1 2 3 4 5 6 7 8 9 10
T 115 109
31
Double Hashing: Insert
• Example: Student record for
a COURSE-A
• m = 11
• h1(k) = k mod 11
• h2(k) = 5 - (k mod 5)
• Keys to be inserted:
CS23BT009, CS23BT015,
CS23BT022, CS23BT025,
EE23BT025, MC23BT005,
IS23BT012, MC23BT027,
MC23BT034, MC23BT039
– Keys in integer form
(obtained from last 2 digits
of the key and a branch
indicator digit (1=CS, 2=EE,
3=MC, 4=IS)):
– 109, 115, 122, 125, 225,
j
305, 412, 327, 334, 339
0 1 2 3 4 5 6 7 8 9 10
T 122 115 109
32
Double Hashing: Insert
• Example: Student record for
a COURSE-A
• m = 11
• h1(k) = k mod 11
• h2(k) = 5 - (k mod 5)
• Keys to be inserted:
CS23BT009, CS23BT015,
CS23BT022, CS23BT025,
EE23BT025, MC23BT005,
IS23BT012, MC23BT027,
MC23BT034, MC23BT039
– Keys in integer form
(obtained from last 2 digits
of the key and a branch
indicator digit (1=CS, 2=EE,
3=MC, 4=IS)):
– 109, 115, 122, 125, 225,
j 305, 412, 327, 334, 339
0 1 2 3 4 5 6 7 8 9 10
T 122 125 115 109
33
Double Hashing: Insert
• Example: Student record for
a COURSE-A
• m = 11
• h1(k) = k mod 11
• h2(k) = 5 - (k mod 5)
• Keys to be inserted:
CS23BT009, CS23BT015,
CS23BT022, CS23BT025,
EE23BT025, MC23BT005,
IS23BT012, MC23BT027,
MC23BT034, MC23BT039
– Keys in integer form
(obtained from last 2 digits
of the key and a branch
indicator digit (1=CS, 2=EE,
225 225 225 225 3=MC, 4=IS)):
– 109, 115, 122, 125, 225,
j j j j 305, 412, 327, 334, 339
0 1 2 3 4 5 6 7 8 9 10
T 122 125 115 225 109
34
Double Hashing: Insert
• Example: Student record for
a COURSE-A
• m = 11
• h1(k) = k mod 11
• h2(k) = 5 - (k mod 5)
• Keys to be inserted:
CS23BT009, CS23BT015,
CS23BT022, CS23BT025,
EE23BT025, MC23BT005,
IS23BT012, MC23BT027,
MC23BT034, MC23BT039
– Keys in integer form
(obtained from last 2 digits
of the key and a branch
indicator digit (1=CS, 2=EE,
3=MC, 4=IS)):
– 109, 115, 122, 125, 225,
j
305, 412, 327, 334, 339
0 1 2 3 4 5 6 7 8 9 10
T 122 125 115 305 225 109
35
Double Hashing: Insert
• Example: Student record for
a COURSE-A
• m = 11
• h1(k) = k mod 11
• h2(k) = 5 - (k mod 5)
• Keys to be inserted:
CS23BT009, CS23BT015,
CS23BT022, CS23BT025,
EE23BT025, MC23BT005,
IS23BT012, MC23BT027,
MC23BT034, MC23BT039
– Keys in integer form
(obtained from last 2 digits
of the key and a branch
indicator digit (1=CS, 2=EE,
412 412 3=MC, 4=IS)):
– 109, 115, 122, 125, 225,
j j 305, 412, 327, 334, 339
0 1 2 3 4 5 6 7 8 9 10
T 122 125 115 412 305 225 109
36
Double Hashing: Insert
• Example: Student record for
a COURSE-A
• m = 11
• h1(k) = k mod 11
• h2(k) = 5 - (k mod 5)
• Keys to be inserted:
CS23BT009, CS23BT015,
CS23BT022, CS23BT025,
EE23BT025, MC23BT005,
IS23BT012, MC23BT027,
MC23BT034, MC23BT039
– Keys in integer form
(obtained from last 2 digits
of the key and a branch
indicator digit (1=CS, 2=EE,
327 327 3=MC, 4=IS)):
– 109, 115, 122, 125, 225,
j j 305, 412, 327, 334, 339
0 1 2 3 4 5 6 7 8 9 10
T 327 122 125 115 412 305 225 109
37
Double Hashing: Insert
• Example: Student record for
a COURSE-A
• m = 11
• h1(k) = k mod 11
• h2(k) = 5 - (k mod 5)
• Keys to be inserted:
CS23BT009, CS23BT015,
CS23BT022, CS23BT025,
EE23BT025, MC23BT005,
IS23BT012, MC23BT027,
MC23BT034, MC23BT039
– Keys in integer form
(obtained from last 2 digits
of the key and a branch
indicator digit (1=CS, 2=EE,
334 334 334 3=MC, 4=IS)):
– 109, 115, 122, 125, 225,
j j j 305, 412, 327, 334, 339
0 1 2 3 4 5 6 7 8 9 10
T 327 122 125 115 334 412 305 225 109
38
Double Hashing: Insert
• Example: Student record for
a COURSE-A
• m = 11
• h1(k) = k mod 11
• h2(k) = 5 - (k mod 5)
• Keys to be inserted:
CS23BT009, CS23BT015,
CS23BT022, CS23BT025,
EE23BT025, MC23BT005,
IS23BT012, MC23BT027,
MC23BT034, MC23BT039
– Keys in integer form
(obtained from last 2 digits
of the key and a branch
indicator digit (1=CS, 2=EE,
339 339 339 339 339 3=MC, 4=IS)):
– 109, 115, 122, 125, 225,
j j j j j 305, 412, 327, 334, 339
0 1 2 3 4 5 6 7 8 9 10
T 327 122 339 125 115 334 412 305 225 109
39
Double Hashing: Summary
• m should be prime
• Offset is different for different keys
– Offset is always 1 for linear probing
• It does not form clusters as in linear probing
• It distributes the keys more uniformly than linear
probing
• Searching, insert and delete operations are performed
like linear probing
– When an element is deleted, that slot is replaced with a
marker
40
Readings
Read Chapter 11 in Corman book
41