2.2 Hashing

The document provides an overview of hashing techniques, including hash tables, hash functions, and collision handling methods such as Cuckoo Hashing and dynamic hashing. It explains various hashing methods like division, multiplication, universal hashing, folding, and mid-square methods, along with collision resolution techniques like open addressing and chaining. Additionally, it discusses dynamic hashing, which allows for the growth and shrinkage of the hash table as records are added or removed, improving efficiency in data storage and retrieval.


Hashing:

Hashing techniques, Hash table, Hash functions.
Collision handling and collision resolution techniques, Cuckoo Hashing.

Dynamic Hashing:

Motivation for Dynamic Hashing, Dynamic Hashing using Directories,
Directoryless Dynamic Hashing.

Bloom Filters:

Bloom Filter Design.
Hashing
• Hashing is the process of mapping keys and values into a hash table
using a hash function. It is done for faster access to elements; the
efficiency of the mapping depends on the efficiency of the hash
function used.
• Mapping is the way of linking two things together, referred to as
“Key” and “Value”.
• Key – an identifier that uniquely identifies the data (entity).
• Value – the actual entity (and its details) which we want to store.
Hashing
• Hashing is a way to get an integer value from any key. The key may
or may not be an integer, but after hashing is performed, we obtain
an integer value for any key.
• Hashing is required because the key given in the input cannot be
used directly as the memory location at which to place the
key-value pair.
• Hashing in data structures is an important method designed to solve
the problem of efficiently finding and storing data in an array.
Hash Table
• Hashing in data structures uses hash tables to store the
key-value pairs.
• If we want to store some data (i.e. a value) identified by a unique
key, we can use a hash table.
• E.g. to store 3 students with roll numbers 302, 312 and 342
directly, we would need an array of size at least 343, so that these
3 students could be placed at the 302nd, 312th and 342nd index of
that table.
• So instead of directly storing the keys in the hash table, hashing
comes into play: a hash function converts these big keys to an
integer value < the size of the array, and this integer value is
called the hash index.
Techniques to Calculate Hashing
• The hash function is just a mathematical
formula that can give a small Integer
value(within the array size) for some big keys.
1. Division Method
2. Multiplication Method
3. Universal Hashing
4. Folding Method
5. Mid Square Method
Division Method
• This is the easiest of all the methods to understand.
• Consider a scenario where we have only 10 cells in the array and
the key to insert is 15, i.e. > 10. We need a method that takes 15
and returns an index < 10.
The modulus operator can do this pretty easily.
• Division method: HashFunction = Key % size
Here % is the modulus operator, which returns the remainder when
Key is divided by size.
• E.g. if Key = 15, then 15 % 10 = 5, and 5 < 10, so we can insert
this data with Key = 15 at the 5th index of the array of size 10.
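The division method can be sketched in a couple of lines of Python (the function name is my own):

```python
def division_hash(key, size):
    """Division method: the remainder of key / size is the index."""
    return key % size

# Key 15 in a table of size 10 lands at index 15 % 10 = 5
idx = division_hash(15, 10)
```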
Multiplication Method
• Similar to the division method, it uses a function of the form:
h(k) = floor( n( kA mod 1 ) )
• Here, k is the key, n is the table size, and A can be any constant
value between 0 and 1.
k is multiplied by A, and the fractional part of the product is
extracted by taking it mod 1. This fraction is then multiplied by
n and floored to get the hash value.
Multiplication Method
Example:
• k = 123, n = 100, A = 0.618033 (has to be between 0 and 1)
h(123) = floor( 100 (123 × 0.618033 mod 1) )
= floor( 100 (76.018059 mod 1) )
= floor( 100 × 0.018059 ) = floor(1.8059) = 1
• So the record with Key = 123 can be placed at index 1.
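The same computation in Python (function name is my own):

```python
import math

def multiplication_hash(key, size, A=0.618033):
    """Multiplication method: h(k) = floor(size * ((k * A) mod 1))."""
    fractional = (key * A) % 1      # keep only the fractional part of k*A
    return math.floor(size * fractional)

# 123 * 0.618033 = 76.018059; fractional part 0.018059; * 100 -> floor 1
idx = multiplication_hash(123, 100)
```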
Universal Hashing
• Universal hashing means that we don’t use a fixed hash function;
instead there is a family of hash functions from which we can
randomly pick one.
• These hash functions must obey the basic principle of universal
hashing: any two distinct keys collide under a randomly chosen
function with probability at most 1/m (m being the table size), so
the expected number of collisions for a particular key x is < 1
when the table is larger than the number of stored keys.
• Once we have such a family of hash functions, any one of them can
be chosen at runtime.
This brings randomness to the hash function, so no fixed set of
keys can consistently cause many collisions, and it gives the most
evenly spread distribution amongst the hashing techniques here.
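The slides don’t name a specific family; a commonly used one (an assumption here) is h(k) = ((a·k + b) mod p) mod m with p prime and a, b drawn at random. A minimal sketch:

```python
import random

def make_universal_hash(size, p=10**9 + 7):
    """Draw one function at random from the family
    h_{a,b}(k) = ((a*k + b) mod p) mod size, with p a large prime."""
    a = random.randrange(1, p)   # a must be nonzero
    b = random.randrange(0, p)
    return lambda k: ((a * k + b) % p) % size

h = make_universal_hash(10)      # pick one member of the family at runtime
```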
Folding Method
• The folding method folds the given key in a way that produces an index that fits in
the given size of the array.
• If the array size = 100, only 2-digit indices fit inside it, so we fold the given key
into parts of 2 digits. E.g. if Key = 20574, we fold it into parts of 2 digits:
20, 57, and 4. To get an index < 100 from these parts, we sum them up:
20 + 57 + 4 = 81. So 81 is the index where Key = 20574 can be placed.
• There can be cases where even after folding we get a number > size,
e.g. Key = 56571: breaking it down into parts of 2 digits gives 56 + 57 + 1 = 114. We
can’t place this data at index 114 inside an array of size 100, so we can either apply
the algorithm again or use the division method (mostly used in such scenarios).
• Similarly, if the array size = 1000, then 3-digit indices fit, so we can
break Key = 20574 as 205 + 74 = 279.
• So the data with Key = 20574 can be placed at the 279th index in an array of size 1000.
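The folding steps above can be sketched in Python (function name is my own; the overflow case falls back to the division method, as the text suggests):

```python
def folding_hash(key, size):
    """Folding method: chop the key's digits into parts as wide as the
    largest index (one digit fewer than size) and sum the parts,
    falling back to the division method if the sum overflows."""
    width = len(str(size)) - 1          # size 100 -> 2-digit parts
    digits = str(key)
    parts = [int(digits[i:i + width]) for i in range(0, len(digits), width)]
    total = sum(parts)                  # e.g. 20 + 57 + 4 = 81
    return total if total < size else total % size
```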
Mid Square Method
• In this method, we square the given number and
pick the N middle digits of the squared number to find the index.
• E.g. if the array size = 100, meaning we can fit only 2-digit
indices, let’s take N = 2 here (N can be chosen freely).
• Hash(931): 931^2 = 866761. Picking the N (= 2) middle digits gives 67,
so 67 is the potential index at which to place 931.
• If the index obtained from the mid-square method is > the size of the
array, then we can again fall back to the division method.
• Let’s take N = 2 and array size = 11.
Hash(93): 93^2 = 8649. Picking the N (= 2) middle digits gives 64.
Now since 64 > 11, we apply the division method to
get 64 % 11 = 9.
So 93 can be placed at the 9th index in the array of size 11.
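A minimal Python sketch of the mid-square method (function name is my own):

```python
def mid_square_hash(key, size, n=2):
    """Mid-square method: square the key, take its n middle digits,
    and fall back to the division method if that overflows the table."""
    squared = str(key * key)
    start = (len(squared) - n) // 2     # position of the middle n digits
    index = int(squared[start:start + n])
    return index if index < size else index % size
```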
Collision
• The same index can be obtained by running the Hash Function on
different keys.
• For E.g.
• If using the Division method: Let’s say the array size=10
Then Hash(15) = 15%10 = 5
And Hash(25) = 25%10 = 5
So this hash function will return 5 as the index for both of these
keys.
• This is called a collision: the resultant hashes of 2 or more
keys wrongly map to the same place in the hash table.
A collision creates a problem because each index in a hash table
is supposed to store only one value.
Collision resolution techniques
• There are two fundamental approaches that can be used
to tackle a hash collision:
1. Open Addressing
– Rehashing
– Linear Probing
– Quadratic probing
2. Closed Addressing
– Chaining
Open Addressing
• Open addressing means we try to
resolve the collision by finding an alternative
index at which to place the key.
– Rehashing
– Linear Probing
– Quadratic probing
Rehashing
• This method invokes 2 hash functions to get rid of collisions. These 2 hash
functions can either be the same or different and the secondary hash
function is applied continuously until an unused/vacant index is found.
• E.g. Let’s take the Division Method as Primary and mid square as a
secondary method.
Array size = 10.Key=15
Hash1(15) = 15%10 = 5, So 15 is placed at the 5th Index.
Key=25
Hash1(25) = 25%10 = 5,Since 5 is already used, it’s a collision.
Hash2(25) = 625. Picking 1 middle element we get 2. So 25 can be
placed at the 2nd Index.
• So as we can see, by using 2 different hash functions, we are able to place
both 15 and 25 at the 5th and 2nd index respectively.
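A one-step sketch of this rehashing scheme in Python (function name is my own; a full implementation would keep applying the secondary function until a vacant index is found, as the text says):

```python
def rehash_insert(table, key):
    """Rehashing sketch: division method first; on a collision, one
    application of the mid-square method with a single middle digit."""
    size = len(table)
    idx = key % size                     # primary hash: division method
    if table[idx] is None:
        table[idx] = key
        return idx
    squared = str(key * key)             # secondary hash: mid-square
    idx = int(squared[len(squared) // 2]) % size
    table[idx] = key
    return idx

table = [None] * 10
rehash_insert(table, 15)   # 15 % 10 = 5
rehash_insert(table, 25)   # 5 taken; middle digit of 625 is 2
```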
Linear Probing
• When a collision occurs, the linear probing technique finds the closest free cell
by iterating over the cells one at a time. The search proceeds sequentially,
starting from the position where the collision occurred, until an empty cell
is found.
Example: array of size 10, simple hash function hash(K) = (2K + 1) % size
• Input=2, hash(2) = (2*2+1) % 10 = 5 % 10 = 5
• Input=7, hash(7) = (2*7+1) % 10 = 15 % 10 = 5
• So calculated index of 7 is 5 which is already occupied by another key, i.e. 2. When
linear probing is applied, the nearest empty cell to the index 5 is 6; therefore,
the Key=7 will be added at the index 6.
• Input=12, hash(12) = (2*12+1) % 10 = 25 % 10 = 5
• As the 5th Index is again occupied with Key=2, we can’t place 12 at 5, so we will
again need to linearly probe from 5.
• Similarly, the 6th index is also occupied with Key=7, we look for the next Free
Index.
The 7th index is free, so we can place Key=12 at the 7th Index.
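The walk above can be sketched in Python, using the same hash function (the function name is my own):

```python
def linear_probe_insert(table, key):
    """Linear probing with the example's hash(K) = (2K + 1) % size."""
    size = len(table)
    idx = (2 * key + 1) % size
    while table[idx] is not None:   # step to the next cell until one is free
        idx = (idx + 1) % size
    table[idx] = key
    return idx

table = [None] * 10
linear_probe_insert(table, 2)    # index 5
linear_probe_insert(table, 7)    # 5 is taken, goes to 6
linear_probe_insert(table, 12)   # 5 and 6 are taken, goes to 7
```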
Quadratic probing
• Quadratic probing is similar to linear probing, but instead of
moving one step at a time to find the empty cell to place
this Data at, here the next slot is computed by adding the
successive value of an arbitrary polynomial in the original
hashed index. To put it simply, it means we can move in
a Quadratic fashion.
• E.g. from 5 we can go to 5 + 1² = 6.
Then from 6, the next slot to check is 6 + 2² = 10.
And the next from 10 would be 10 + 3² = 19.
So as we can see, we are moving 5 → 6 → 10 → 19
versus 5 → 6 → 7 → 8 in the case of linear probing.
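The cumulative-square variant described above can be sketched as follows (function name is my own; other formulations add i² to the original index instead):

```python
def quadratic_probe_insert(table, key):
    """Quadratic probing, cumulative-square variant: probe idx, then
    idx + 1^2, then that + 2^2, and so on (5 -> 6 -> 10 -> 19)."""
    size = len(table)
    idx = key % size
    step = 1
    while table[idx % size] is not None:
        idx += step * step          # jump 1, 4, 9, ... cells further
        step += 1
    idx %= size
    table[idx] = key
    return idx

table = [None] * 23
quadratic_probe_insert(table, 5)    # index 5
quadratic_probe_insert(table, 28)   # 28 % 23 = 5 is taken, goes to 6
quadratic_probe_insert(table, 51)   # probes 5 -> 6 -> 10
```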
Closed Addressing
• Contrary to open addressing, where we
found an alternative index in case of a collision,
here we place the item at its original index, using the
chaining technique.
– Chaining
Chaining
• The chaining method builds a linked list of items whose keys
hash to the same value. This method requires an extra link field
at each table position.
So if we compare it against rehashing, it uses only 1 hash function,
but the way the data is stored is different.
Example- Array size = 10. Key=15, Hash(15) = 15%10 = 5
So 15 is placed at the 5th Index.
Key=25, Hash(25) = 25%10 = 5, Since 5 is already used, it’s a
collision.
• But using the Chaining mechanism, we can place both 15 and 25 at
the same 5th Index, by chaining 15 to 25.
• So the 5th Index of Array will contain a Linked List of 15 → 25.
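The 15/25 example above can be sketched in Python, using lists as the chains (function name is my own):

```python
def chain_insert(table, key):
    """Chaining: every slot holds a list of all keys that hash there."""
    idx = key % len(table)      # division method as the single hash function
    table[idx].append(key)
    return idx

table = [[] for _ in range(10)]
chain_insert(table, 15)
chain_insert(table, 25)   # collides at index 5, but just extends the chain
```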
Cuckoo Hashing
• Cuckoo Hashing is a technique for implementing a
hash table. As opposed to most other hash tables,
it achieves constant time worst-case complexity
for lookups.
• Collisions are handled by evicting existing keys and
moving them from one array to the other. This
resembles the way a cuckoo chick pushes out an
egg from the nest to make room for itself, hence
the name Cuckoo Hashing.
Cuckoo Hashing
• Representation-It is implemented using two arrays of equal size and two
hash functions.
• Insertion- a new element is always inserted in the first hash table. Should
a collision occur, the existing element is kicked out and inserted in the
second hash table. Should that in turn cause a collision, the second
existing element will be kicked out and inserted in the first hash table, and
so on. This continues until an empty bucket is found.
• Lookup- if a key exists, it will be stored in its original bucket, either in the
first array or the second one. In other words, at most two lookups are
needed to figure out if the key exists or not, thus the worst-case
complexity is O(1).
• Removal- removal is done simply by clearing the bucket storing the key.
Again, worst-case complexity is O(1).
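The eviction loop described above can be sketched in Python (function name and the `max_kicks` cutoff are my own; a production version would rehash with new functions when the cutoff is hit):

```python
def cuckoo_insert(t1, t2, key, h1, h2, max_kicks=50):
    """Insert into table 1; on a collision, evict the occupant into the
    other table, repeating until a slot is free. Giving up after
    max_kicks signals a cycle and the need to rehash."""
    for _ in range(max_kicks):
        idx = h1(key)
        key, t1[idx] = t1[idx], key   # place key, pick up the old occupant
        if key is None:
            return True
        idx = h2(key)
        key, t2[idx] = t2[idx], key
        if key is None:
            return True
    return False                       # cycle detected

h1 = lambda k: k % 11
h2 = lambda k: (k // 11) % 11
t1, t2 = [None] * 11, [None] * 11
for k in (20, 50, 53, 75, 100, 67):   # first part of the illustration below
    cuckoo_insert(t1, t2, k, h1, h2)
```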
Cuckoo Hashing
• Illustration-
Input: {20, 50, 53, 75, 100, 67, 105, 3, 36, 39}
Hash Functions: h1(key) = key%11
h2(key) = (key/11)%11
Cuckoo Hashing
• Let’s start by inserting 20 at its possible
position in the first table determined by
h1(20):

• Next: 50
Cuckoo Hashing
• Next: 53. h1(53) = 9, but 20 is already there at 9.
We place 53 in table 1 and move 20 to table 2 at h2(20) = 1.

• Next: 75. h1(75) = 9, but 53 is already there at 9.

We place 75 in table 1 and move 53 to table 2 at h2(53) = 4.
Cuckoo Hashing
• Next: 100. h1(100) = 1.

• Next: 67. h1(67) = 1, but 100 is already there at 1.
We place 67 in table 1 and move 100 to table 2 at h2(100) = 9.
Cuckoo Hashing
• Next: 105. h1(105) = 6. But 50 is already there
at 6. We place 105 in table 1 & 50 in table 2 at
h2(50) = 4. Now 53 has been displaced. h1(53)
= 9. 75 displaced: h2(75) = 6

• Next: 3. h1(3) = 3.
Cuckoo Hashing
• Next: 36. h1(36) = 3, but 3 is already there; 3 moves to table 2 at h2(3) = 0.

• Next: 39. h1(39) = 6 evicts 105 → h2(105) = 9 evicts 100 →
h1(100) = 1 evicts 67 → h2(67) = 6 evicts 75 → h1(75) = 9 evicts 53 →
h2(53) = 4 evicts 50 → h1(50) = 6 evicts 39 → h2(39) = 3, which is
free, so the chain of evictions finally terminates.
Dynamic hashing
• Dynamic hashing is a method of hashing in which the data structure grows and
shrinks dynamically as records are added or removed.
• In traditional static hashing, the hash function maps keys to a fixed number of
buckets or slots. However, this approach can lead to problems such as overflow
and poor distribution of keys, especially when the number of keys is unpredictable
or changes over time.
• Dynamic hashing, also known as extendible hashing, addresses these issues by
allowing the hash table to expand or contract as needed.
• The key to dynamic hashing is the use of a directory that points to buckets. Each
bucket can hold a certain number of records. When a bucket becomes full upon an
insertion, it is split into two, and the directory is updated to reflect this change.
• The hash function in dynamic hashing is designed to produce a binary string of
digits. The system initially considers only the first few digits but can consider more
digits as the table grows, effectively increasing the number of available buckets.
Dynamic hashing
• For example, if the system starts with one bucket (represented by 0) and this
bucket becomes full, it is split into two buckets represented by 0 and 1. If the
bucket represented by 1 becomes full, it is split into buckets represented by 10 and
11, and so on. This way, the hash table can grow incrementally without needing to
rehash all existing keys, which can be a costly operation.
• Similarly, when a deletion causes a bucket to become empty, it can be merged with
another bucket, and the directory can be updated to reflect this change. This
allows the hash table to shrink when necessary, saving memory.
• Dynamic hashing provides a flexible and efficient method for managing hash tables
with a changing number of records. It avoids the problems of overflow and poor
key distribution that can occur with static hashing, and it eliminates the need for
costly rehashing operations.
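The directory-based scheme described above can be sketched as follows. This is a minimal extendible-hashing sketch with class names of my own; it indexes the directory by the key's low-order bits, whereas the text's example appends bits to bucket labels, but the splitting and directory-doubling logic is the same idea.

```python
class Bucket:
    def __init__(self, depth, capacity):
        self.depth = depth        # local depth: bits this bucket distinguishes
        self.capacity = capacity
        self.keys = []

class ExtendibleHash:
    def __init__(self, capacity=2):
        self.global_depth = 1
        self.dir = [Bucket(1, capacity), Bucket(1, capacity)]

    def _index(self, key):
        # use the low-order global_depth bits of the key as directory index
        return key & ((1 << self.global_depth) - 1)

    def insert(self, key):
        b = self.dir[self._index(key)]
        if len(b.keys) < b.capacity:
            b.keys.append(key)
            return
        # full bucket: double the directory only if this bucket already
        # uses as many bits as the whole table
        if b.depth == self.global_depth:
            self.dir = self.dir + self.dir
            self.global_depth += 1
        # split the bucket on its next bit and repoint directory entries
        b.depth += 1
        new = Bucket(b.depth, b.capacity)
        mask = 1 << (b.depth - 1)
        for i, ptr in enumerate(self.dir):
            if ptr is b and (i & mask):
                self.dir[i] = new
        pending, b.keys = b.keys + [key], []
        for k in pending:         # redistribute old keys plus the new one
            self.insert(k)

h = ExtendibleHash(capacity=2)
for k in (0, 1, 2, 3, 4, 5):
    h.insert(k)                   # the even-keys bucket splits when 4 arrives
```

Note that only the one overflowing bucket is split and rehashed; all other buckets and their keys are untouched, which is exactly why dynamic hashing avoids full-table rehashes.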
