2.2 Hashing

The document provides an overview of hashing techniques, including hash tables, hash functions, and collision handling methods such as Cuckoo Hashing and dynamic hashing. It explains various hashing methods like division, multiplication, universal hashing, folding, and mid-square methods, along with collision resolution techniques like open addressing and chaining. Additionally, it discusses dynamic hashing, which allows for the growth and shrinkage of the hash table as records are added or removed, improving efficiency in data storage and retrieval.


Hashing:

Hashing techniques, Hash table, Hash functions.
Collision handling and collision resolution techniques, Cuckoo Hashing.

Dynamic Hashing:

Motivation for Dynamic Hashing, Dynamic Hashing using Directories,
Directoryless Dynamic Hashing.

Bloom Filters:

Bloom Filter Design.
Hashing
• Hashing is the process of mapping keys and values into a hash table
using a hash function. It is done for faster access to elements; the
efficiency of the mapping depends on the efficiency of the hash
function used.
• Mapping is the way of linking two things together, referred to as
“Key” and “Value”.
• Key – an identifier that uniquely identifies the data (entity).
• Value – the actual entity (and its details) which we want to store.
Hashing
• Hashing is a way to get an integer value from any key. The key may
or may not be an integer, but after hashing is performed, we obtain
an integer value for any key.
• Hashing is required because the key given in the input cannot be
used directly as the memory location at which to place the
key-value pair.
• Hashing in data structures is an important method designed to solve
the problem of efficiently finding and storing data in an array.
Hash Table
• Hashing in data structures uses hash tables to store the
key-value pairs.
• If we want to store some data (i.e. a value) identified by a unique
key, we can use a hash table.
• E.g. to store 3 students with roll numbers 302, 312 and 342
directly, we would need an array of size at least 343, so that these
3 students could be placed at the 302nd, 312th and 342nd index of
that table.
• So instead of directly storing the keys in the hash table, hashing
comes into play: a hash function converts these big keys to an
integer value < the size of the array, and this integer value is
called the hash index.
Techniques to Calculate Hashing
• The hash function is just a mathematical
formula that can give a small Integer
value(within the array size) for some big keys.
1. Division Method
2. Multiplication Method
3. Universal Hashing
4. Folding Method
5. Mid Square Method
Division Method
• This is the easiest of all the methods to understand.
• Consider a scenario where we have only 10 cells in the array and
the key to insert is 15, i.e. > 10. We need a method that takes 15
and returns an index < 10.
The modulus operator can do this pretty easily.
• Division method: HashFunction = Key % size
Here % is the modulus operator, which returns the remainder when
Key is divided by size.
• E.g. if Key = 15, then 15 % 10 = 5, and 5 < 10, so we can insert
this data with Key = 15 at the 5th index of the array of size 10.
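The division method can be sketched in a couple of lines of Python (the function name is my own):

```python
def division_hash(key, size):
    """Division method: the remainder of key / size is the index."""
    return key % size

# Key 15 in a table of size 10 lands at index 15 % 10 = 5
idx = division_hash(15, 10)
```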
Multiplication Method
• Similar to the division method, it uses a function of the form:
h(k) = floor( n( kA mod 1 ) )
• Here, k is the key, n is the table size, and A can be any constant
value between 0 and 1.
k is multiplied by A, and the fractional part of the product is
extracted by taking it mod 1. This fraction is then multiplied by
n and floored to get the hash value.
Multiplication Method
Example:
• k = 123, n = 100, A = 0.618033 (has to be between 0 and 1)
h(123) = floor( 100 (123 × 0.618033 mod 1) )
= floor( 100 (76.018059 mod 1) )
= floor( 100 × 0.018059 ) = floor(1.8059) = 1
• So the record with Key = 123 can be placed at index 1.
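The same computation in Python (function name is my own):

```python
import math

def multiplication_hash(key, size, A=0.618033):
    """Multiplication method: h(k) = floor(size * ((k * A) mod 1))."""
    fractional = (key * A) % 1      # keep only the fractional part of k*A
    return math.floor(size * fractional)

# 123 * 0.618033 = 76.018059; fractional part 0.018059; * 100 -> floor 1
idx = multiplication_hash(123, 100)
```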
Universal Hashing
• Universal hashing means that we don’t use a fixed hash function;
instead there is a family of hash functions from which we can
randomly pick one.
• These hash functions must obey the basic principle of universal
hashing: any two distinct keys collide under a randomly chosen
function with probability at most 1/m (m being the table size), so
the expected number of collisions for a particular key x is < 1
when the table is larger than the number of stored keys.
• Once we have such a family of hash functions, any one of them can
be chosen at runtime.
This brings randomness to the hash function, so no fixed set of
keys can consistently cause many collisions, and it gives the most
evenly spread distribution amongst the hashing techniques here.
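The slides don’t name a specific family; a commonly used one (an assumption here) is h(k) = ((a·k + b) mod p) mod m with p prime and a, b drawn at random. A minimal sketch:

```python
import random

def make_universal_hash(size, p=10**9 + 7):
    """Draw one function at random from the family
    h_{a,b}(k) = ((a*k + b) mod p) mod size, with p a large prime."""
    a = random.randrange(1, p)   # a must be nonzero
    b = random.randrange(0, p)
    return lambda k: ((a * k + b) % p) % size

h = make_universal_hash(10)      # pick one member of the family at runtime
```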
Folding Method
• The folding method folds the given key in a way that produces an index that fits in
the given size of the array.
• If the array size = 100, only 2-digit indices fit inside it, so we fold the given key
into parts of 2 digits. E.g. if Key = 20574, we fold it into parts of 2 digits:
20, 57, and 4. To get an index < 100 from these parts, we sum them up:
20 + 57 + 4 = 81. So 81 is the index where Key = 20574 can be placed.
• There can be cases where even after folding we get a number > size,
e.g. Key = 56571: breaking it down into parts of 2 digits gives 56 + 57 + 1 = 114. We
can’t place this data at index 114 inside an array of size 100, so we can either apply
the algorithm again or use the division method (mostly used in such scenarios).
• Similarly, if the array size = 1000, then 3-digit indices fit, so we can
break Key = 20574 as 205 + 74 = 279.
• So the data with Key = 20574 can be placed at the 279th index in an array of size 1000.
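The folding steps above can be sketched in Python (function name is my own; the overflow case falls back to the division method, as the text suggests):

```python
def folding_hash(key, size):
    """Folding method: chop the key's digits into parts as wide as the
    largest index (one digit fewer than size) and sum the parts,
    falling back to the division method if the sum overflows."""
    width = len(str(size)) - 1          # size 100 -> 2-digit parts
    digits = str(key)
    parts = [int(digits[i:i + width]) for i in range(0, len(digits), width)]
    total = sum(parts)                  # e.g. 20 + 57 + 4 = 81
    return total if total < size else total % size
```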
Mid Square Method
• In this method, we square the given number and
pick the N middle digits of the squared number to find the index.
• E.g. if the array size = 100, meaning we can fit only 2-digit
indices, let’s take N = 2 here (N can be chosen freely).
• Hash(931): 931^2 = 866761. Picking the N (= 2) middle digits gives 67,
so 67 is the potential index at which to place 931.
• If the index obtained from the mid-square method is > the size of the
array, then we can again fall back to the division method.
• Let’s take N = 2 and array size = 11.
Hash(93): 93^2 = 8649. Picking the N (= 2) middle digits gives 64.
Now since 64 > 11, we apply the division method to
get 64 % 11 = 9.
So 93 can be placed at the 9th index in the array of size 11.
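A minimal Python sketch of the mid-square method (function name is my own):

```python
def mid_square_hash(key, size, n=2):
    """Mid-square method: square the key, take its n middle digits,
    and fall back to the division method if that overflows the table."""
    squared = str(key * key)
    start = (len(squared) - n) // 2     # position of the middle n digits
    index = int(squared[start:start + n])
    return index if index < size else index % size
```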
Collision
• The same index can be obtained by running the Hash Function on
different keys.
• For E.g.
• If using the Division method: Let’s say the array size=10
Then Hash(15) = 15%10 = 5
And Hash(25) = 25%10 = 5
So this hash function will return 5 as the index for both of these
keys.
• This is called a collision: the resultant hashes of 2 or more
keys wrongly map to the same place in the hash table.
A collision creates a problem because each index in a hash table
is supposed to store only one value.
Collision resolution techniques
• There are two fundamental approaches that can be used
to tackle a hash collision:
1. Open Addressing
– Rehashing
– Linear Probing
– Quadratic probing
2. Closed Addressing
– Chaining
Open Addressing
• Open addressing means we try to
resolve the collision by finding an alternative
index at which to place the key.
– Rehashing
– Linear Probing
– Quadratic probing
Rehashing
• This method invokes 2 hash functions to get rid of collisions. These 2 hash
functions can either be the same or different and the secondary hash
function is applied continuously until an unused/vacant index is found.
• E.g. Let’s take the Division Method as Primary and mid square as a
secondary method.
Array size = 10.Key=15
Hash1(15) = 15%10 = 5, So 15 is placed at the 5th Index.
Key=25
Hash1(25) = 25%10 = 5,Since 5 is already used, it’s a collision.
Hash2(25) = 625. Picking 1 middle element we get 2. So 25 can be
placed at the 2nd Index.
• So as we can see, by using 2 different hash functions, we are able to place
both 15 and 25 at the 5th and 2nd index respectively.
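A one-step sketch of this rehashing scheme in Python (function name is my own; a full implementation would keep applying the secondary function until a vacant index is found, as the text says):

```python
def rehash_insert(table, key):
    """Rehashing sketch: division method first; on a collision, one
    application of the mid-square method with a single middle digit."""
    size = len(table)
    idx = key % size                     # primary hash: division method
    if table[idx] is None:
        table[idx] = key
        return idx
    squared = str(key * key)             # secondary hash: mid-square
    idx = int(squared[len(squared) // 2]) % size
    table[idx] = key
    return idx

table = [None] * 10
rehash_insert(table, 15)   # 15 % 10 = 5
rehash_insert(table, 25)   # 5 taken; middle digit of 625 is 2
```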
Linear Probing
• When a collision occurs, the linear probing technique finds the closest free cell
by iterating over the cells one at a time. The search proceeds sequentially,
starting from the position where the collision occurred, until an empty cell
is found.
Example: array of size 10, simple hash function hash(K) = (2K + 1) % size
• Input=2, hash(2) = (2*2+1) % 10 = 5 % 10 = 5
• Input=7, hash(7) = (2*7+1) % 10 = 15 % 10 = 5
• So calculated index of 7 is 5 which is already occupied by another key, i.e. 2. When
linear probing is applied, the nearest empty cell to the index 5 is 6; therefore,
the Key=7 will be added at the index 6.
• Input=12, hash(12) = (2*12+1) % 10 = 25 % 10 = 5
• As the 5th Index is again occupied with Key=2, we can’t place 12 at 5, so we will
again need to linearly probe from 5.
• Similarly, the 6th index is also occupied with Key=7, we look for the next Free
Index.
The 7th index is free, so we can place Key=12 at the 7th Index.
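The walk above can be sketched in Python, using the same hash function (the function name is my own):

```python
def linear_probe_insert(table, key):
    """Linear probing with the example's hash(K) = (2K + 1) % size."""
    size = len(table)
    idx = (2 * key + 1) % size
    while table[idx] is not None:   # step to the next cell until one is free
        idx = (idx + 1) % size
    table[idx] = key
    return idx

table = [None] * 10
linear_probe_insert(table, 2)    # index 5
linear_probe_insert(table, 7)    # 5 is taken, goes to 6
linear_probe_insert(table, 12)   # 5 and 6 are taken, goes to 7
```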
Quadratic probing
• Quadratic probing is similar to linear probing, but instead of
moving one step at a time to find the empty cell to place
this Data at, here the next slot is computed by adding the
successive value of an arbitrary polynomial in the original
hashed index. To put it simply, it means we can move in
a Quadratic fashion.
• E.g. from 5 we can go to 5 + 1² = 6.
Then from 6, the next slot to check is 6 + 2² = 10.
And the next from 10 would be 10 + 3² = 19.
So as we can see, we are moving 5 → 6 → 10 → 19
versus 5 → 6 → 7 → 8 in the case of linear probing.
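The cumulative-square variant described above can be sketched as follows (function name is my own; other formulations add i² to the original index instead):

```python
def quadratic_probe_insert(table, key):
    """Quadratic probing, cumulative-square variant: probe idx, then
    idx + 1^2, then that + 2^2, and so on (5 -> 6 -> 10 -> 19)."""
    size = len(table)
    idx = key % size
    step = 1
    while table[idx % size] is not None:
        idx += step * step          # jump 1, 4, 9, ... cells further
        step += 1
    idx %= size
    table[idx] = key
    return idx

table = [None] * 23
quadratic_probe_insert(table, 5)    # index 5
quadratic_probe_insert(table, 28)   # 28 % 23 = 5 is taken, goes to 6
quadratic_probe_insert(table, 51)   # probes 5 -> 6 -> 10
```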
Closed Addressing
• Contrary to open addressing, where we
found an alternative index in case of a collision,
here we place the item at its original index, using the
chaining technique.
– Chaining
Chaining
• The chaining method builds a linked list of items whose keys
hash to the same value. This method requires an extra link field
at each table position.
So if we compare it against rehashing, it uses only 1 hash function,
but the way the data is stored is different.
Example- Array size = 10. Key=15, Hash(15) = 15%10 = 5
So 15 is placed at the 5th Index.
Key=25, Hash(25) = 25%10 = 5, Since 5 is already used, it’s a
collision.
• But using the Chaining mechanism, we can place both 15 and 25 at
the same 5th Index, by chaining 15 to 25.
• So the 5th Index of Array will contain a Linked List of 15 → 25.
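The 15/25 example above can be sketched in Python, using lists as the chains (function name is my own):

```python
def chain_insert(table, key):
    """Chaining: every slot holds a list of all keys that hash there."""
    idx = key % len(table)      # division method as the single hash function
    table[idx].append(key)
    return idx

table = [[] for _ in range(10)]
chain_insert(table, 15)
chain_insert(table, 25)   # collides at index 5, but just extends the chain
```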
Cuckoo Hashing
• Cuckoo Hashing is a technique for implementing a
hash table. As opposed to most other hash tables,
it achieves constant time worst-case complexity
for lookups.
• Collisions are handled by evicting existing keys and
moving them from one array to the other. This
resembles the way a cuckoo chick pushes out an
egg from the nest to make room for itself, hence
the name Cuckoo Hashing.
Cuckoo Hashing
• Representation-It is implemented using two arrays of equal size and two
hash functions.
• Insertion- a new element is always inserted in the first hash table. Should
a collision occur, the existing element is kicked out and inserted in the
second hash table. Should that in turn cause a collision, the second
existing element will be kicked out and inserted in the first hash table, and
so on. This continues until an empty bucket is found.
• Lookup- if a key exists, it will be stored in its original bucket, either in the
first array or the second one. In other words, at most two lookups are
needed to figure out if the key exists or not, thus the worst-case
complexity is O(1).
• Removal- removal is done simply by clearing the bucket storing the key.
Again, worst-case complexity is O(1).
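The eviction loop described above can be sketched in Python (function name and the `max_kicks` cutoff are my own; a production version would rehash with new functions when the cutoff is hit):

```python
def cuckoo_insert(t1, t2, key, h1, h2, max_kicks=50):
    """Insert into table 1; on a collision, evict the occupant into the
    other table, repeating until a slot is free. Giving up after
    max_kicks signals a cycle and the need to rehash."""
    for _ in range(max_kicks):
        idx = h1(key)
        key, t1[idx] = t1[idx], key   # place key, pick up the old occupant
        if key is None:
            return True
        idx = h2(key)
        key, t2[idx] = t2[idx], key
        if key is None:
            return True
    return False                       # cycle detected

h1 = lambda k: k % 11
h2 = lambda k: (k // 11) % 11
t1, t2 = [None] * 11, [None] * 11
for k in (20, 50, 53, 75, 100, 67):   # first part of the illustration below
    cuckoo_insert(t1, t2, k, h1, h2)
```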
Cuckoo Hashing
• Illustration-
Input: {20, 50, 53, 75, 100, 67, 105, 3, 36, 39}
Hash Functions: h1(key) = key%11
h2(key) = (key/11)%11
Cuckoo Hashing
• Let’s start by inserting 20 at its possible
position in the first table determined by
h1(20):

• Next: 50
Cuckoo Hashing
• Next: 53. h1(53) = 9, but 20 is already there at 9.
We place 53 in table 1 and move 20 to table 2 at h2(20) = 1.

• Next: 75. h1(75) = 9, but 53 is already there at 9.

We place 75 in table 1 and move 53 to table 2 at h2(53) = 4.
Cuckoo Hashing
• Next: 100. h1(100) = 1.

• Next: 67. h1(67) = 1, but 100 is already there at 1.
We place 67 in table 1 and move 100 to table 2 at h2(100) = 9.
Cuckoo Hashing
• Next: 105. h1(105) = 6. But 50 is already there
at 6. We place 105 in table 1 & 50 in table 2 at
h2(50) = 4. Now 53 has been displaced. h1(53)
= 9. 75 displaced: h2(75) = 6

• Next: 3. h1(3) = 3.
Cuckoo Hashing
• Next: 36. h1(36) = 3, but 3 is already there; 3 moves to table 2 at h2(3) = 0.

• Next: 39. h1(39) = 6 evicts 105 → h2(105) = 9 evicts 100 →
h1(100) = 1 evicts 67 → h2(67) = 6 evicts 75 → h1(75) = 9 evicts 53 →
h2(53) = 4 evicts 50 → h1(50) = 6 evicts 39 → h2(39) = 3, which is
free, so the chain of evictions finally terminates.
Dynamic hashing
• Dynamic hashing is a method of hashing in which the data structure grows and
shrinks dynamically as records are added or removed.
• In traditional static hashing, the hash function maps keys to a fixed number of
buckets or slots. However, this approach can lead to problems such as overflow
and poor distribution of keys, especially when the number of keys is unpredictable
or changes over time.
• Dynamic hashing, also known as extendible hashing, addresses these issues by
allowing the hash table to expand or contract as needed.
• The key to dynamic hashing is the use of a directory that points to buckets. Each
bucket can hold a certain number of records. When a bucket becomes full upon an
insertion, it is split into two, and the directory is updated to reflect this change.
• The hash function in dynamic hashing is designed to produce a binary string of
digits. The system initially considers only the first few digits but can consider more
digits as the table grows, effectively increasing the number of available buckets.
Dynamic hashing
• For example, if the system starts with one bucket (represented by 0) and this
bucket becomes full, it is split into two buckets represented by 0 and 1. If the
bucket represented by 1 becomes full, it is split into buckets represented by 10 and
11, and so on. This way, the hash table can grow incrementally without needing to
rehash all existing keys, which can be a costly operation.
• Similarly, when a deletion causes a bucket to become empty, it can be merged with
another bucket, and the directory can be updated to reflect this change. This
allows the hash table to shrink when necessary, saving memory.
• Dynamic hashing provides a flexible and efficient method for managing hash tables
with a changing number of records. It avoids the problems of overflow and poor
key distribution that can occur with static hashing, and it eliminates the need for
costly rehashing operations.
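The directory-based scheme described above can be sketched as follows. This is a minimal extendible-hashing sketch with class names of my own; it indexes the directory by the key's low-order bits, whereas the text's example appends bits to bucket labels, but the splitting and directory-doubling logic is the same idea.

```python
class Bucket:
    def __init__(self, depth, capacity):
        self.depth = depth        # local depth: bits this bucket distinguishes
        self.capacity = capacity
        self.keys = []

class ExtendibleHash:
    def __init__(self, capacity=2):
        self.global_depth = 1
        self.dir = [Bucket(1, capacity), Bucket(1, capacity)]

    def _index(self, key):
        # use the low-order global_depth bits of the key as directory index
        return key & ((1 << self.global_depth) - 1)

    def insert(self, key):
        b = self.dir[self._index(key)]
        if len(b.keys) < b.capacity:
            b.keys.append(key)
            return
        # full bucket: double the directory only if this bucket already
        # uses as many bits as the whole table
        if b.depth == self.global_depth:
            self.dir = self.dir + self.dir
            self.global_depth += 1
        # split the bucket on its next bit and repoint directory entries
        b.depth += 1
        new = Bucket(b.depth, b.capacity)
        mask = 1 << (b.depth - 1)
        for i, ptr in enumerate(self.dir):
            if ptr is b and (i & mask):
                self.dir[i] = new
        pending, b.keys = b.keys + [key], []
        for k in pending:         # redistribute old keys plus the new one
            self.insert(k)

h = ExtendibleHash(capacity=2)
for k in (0, 1, 2, 3, 4, 5):
    h.insert(k)                   # the even-keys bucket splits when 4 arrives
```

Note that only the one overflowing bucket is split and rehashed; all other buckets and their keys are untouched, which is exactly why dynamic hashing avoids full-table rehashes.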
