Unit 1 Dsa Hashing 2024 1
Unit 1 Dsa Hashing 2024 1
Prepared By
Prof. Anand N. Gharu
(Assistant Professor)
Computer Dept.
CLASS : SE COMPUTER 2019 15 January 2024
SUBJECT : DSA (SEM-II)Note: The material to prepare this presentation has been taken from internet and are generated only
1
.UNIT :I .
for students reference and not for commercial use
SYLLABUS
SYLLABUS
Hash Table- Concepts-hash table, hash function, basic operations,
bucket, collision, probe, synonym, overflow, open hashing, closed
hashing, perfect hash function, load density, full table, load factor,
rehashing, issues in hashing, hash functions- properties of good
hash function, division, multiplication, extraction, mid-square,
folding and universal, Collision resolution strategies- open
addressing and chaining, Hash table overflow- open addressing and
chaining, extendible hashing, closed addressing and separate
chaining.
Skip List- representation, searching and operations- insertion,
removal
INTRODUCTION
UNIT-I
HASHING
INTRODUCTION
1. Hashing is finding an address where the data is
to be stored as well as located using a key with
the help of the algorithmic function.
Hash(key) Address
Advantage-
1. Hashing is extremely efficient.
2. The time taken by it to perform the search does not depend upon
the total number of elements.
3. It completes the search with constant time complexity O(1).
Hashing
• Hashing Mechanism -
1. An array data structure called as Hash table is used to store the data
items.
2. Based on the hash key value, data items are inserted into the hash table.
1. Databases
2. Associative arrays
3. Sets
4. Memory cache
CHARACTRISTICS OF HASH FUNCTION
a) Easy to compute : It should be easy to compute and must not become an
algorithm in itself.
similar strings.
HASH FUNCTION
• A function that maps a key into the range [0 to Max − 1], the
result of which is used as an index (or address) to hash table for
storing and retrieving record
• In Hashing technique, the hash table and hash function are used.
Using the hash function, we can calculate the address at which the
value can be stored.
HASHING/HASH FUNCTION
• The main idea behind the hashing is to create the (key/value)
pairs. If the key is given, then the algorithm computes the index at
which the value would be stored. It can be written as:
• Index = hash(key)
TYPES OF HASH FUNCTION
a) Division Method
b) Multiplication Method
c) Extraction Method
e) Folding
f) Universal Method
TYPES OF HASH FUNCTION
There are three ways of calculating the hash function:
1. Division method
2. Folding method
3. Mid square method
In the division method, the hash function can be defined as:
h(ki) = ki % m;
where m is the size of the hash table.
For example, if the key value is 6 and the size of the hash table is 10.
When we apply the hash function to key 6 then the index would be:
h(6) = 6%10 = 6
The index is 6 at which the value is stored.
TYPES OF HASH FUNCTION
1. Division Method:
This is the most simple and easiest method to generate a hash value. The
hash function divides the value k by M and then uses the remainder
obtained.
Example:
k = 12345
Formula: M = 95
h(12345) = 12345 mod 95
h(K) = k mod M = 90
k = 1276
Here,
M = 11
h(1276) = 1276 mod 11
k is the key value, and
= 0
M is the size of the hash table.
TYPES OF HASH FUNCTION
2. The mid square method is a very good hashing method. It
involves two steps to compute the hash value-
Square the value of the key k i.e. k2
Extract the middle r digits as the hash value.
Formula:
Example:
Suppose the hash table has 100 memory locations.
So r = 2 because two digits are required to map the
h(K) = h(k x k) key to the memory location.
k = 60
Here, k x k = 60 x 60
= 3600 (mid 60 for k=60)
k is the key value. h(60) = 60
The hash value obtained is 60
TYPES OF HASH FUNCTION
3. Digit Folding Method : This method involves two steps:
Divide the key-value k into a number of parts i.e. k1, k2, k3,….,kn, where each
part has the same number of digits except for the last part that can have lesser
digits than the other parts.
Add the individual parts. The hash value is obtained by ignoring the last carry if
any.
Formula:
Here,
s is obtained by adding the parts of the key k
TYPES OF HASH FUNCTION
3. Digit Folding Method :
Example:
k = 12345
k1 = 12, k2 = 34, k3 = 5
s = k1 + k2 + k3
= 12 + 34 + 5
= 51
h(K) = 51
TYPES OF HASH FUNCTION
4. Multiplication method :
This method involves the following steps:
Here,
M is the size of the hash table.
k is the key value.
A is a constant value.
Where "k A mod 1" means the fractional part of k A, that is, k A -⌊k
A⌋.
TYPES OF HASH FUNCTION
4. Multiplication method :
Example:
k = 12345
A = 0.357840
M = 100
h(26) = 26%10 = 6
Therefore, two values are stored at the same index, i.e., 6, and this leads to the
collision problem. To resolve these collisions, we have some techniques known
as collision techniques.
1. This technique creates a linked list to the slot for which collision
occurs.
2. The new key is then inserted in the linked list.
3. These linked lists to the slots appear like chains.
4. That is why, this technique is called as separate chaining
Collision Resolution Techniques
For Searching-
• In worst case, all the keys might map to the same bucket of the
hash table.
• In such a case, all the keys will be present in a single linked list.
• Sequential search will have to be performed on the linked list to
perform the search.
• So, time taken for searching in worst case is O(n).
For Deletion-
• In worst case, the key might have to be searched first and then
deleted.
• In worst case, time taken for searching is O(n).
• So, time taken for deletion in worst case is O(n).
Collision Resolution Techniques
Advantages:
1) Simple to implement.
2) Hash table never fills up, we can always add more elements to the
chain.
3) Less sensitive to the hash function or load factors.
4) It is mostly used when it is unknown how many and how
frequently keys may be inserted or deleted. .
Collision Resolution Techniques
Disadvantages:
1) Cache performance of chaining is not good as keys are stored
using a linked list. Open addressing provides better cache
performance as everything is stored in the same table.
2) Wastage of Space (Some Parts of hash table are never used)
3) If the chain becomes long, then search time can become O(n) in
the worst case.
4) Uses extra space for links
Collision Resolution Techniques
Using the hash function ‘key mod 7’, insert the following
sequence of keys in the hash table-
50, 700, 76, 85, 92, 73 and 101.
Use separate chaining technique for collision resolution.
Step-01:
1. Unlike separate chaining, all the keys are stored inside the hash
table.
2. No key is stored outside the hash table.
1. Linear Probing
2. Quadratic Probing
3. Double Hashing
Open Addressing
Operations In open addressing,
1. Insert Operation-
• Hash function is used to compute the hash value for a key to be inserted.
• Hash value is then used as an index to store the key in the hash table.
In case of collision,
• Probing is performed until an empty bucket is found.
• Once an empty bucket is found, the key is inserted.
• Probing is performed in accordance with the technique used for open
addressing.
Open Addressing
Operations In open addressing,
Search Operation-
Search Operation-
Advantage-
1. It is easy to compute.
Disadvantage-
1. The main problem with linear probing is clustering.
2. Many consecutive elements form groups.
3. Then, it takes time to search an element or to find an empty bucket.
Open Addressing Techniques
1. Linear Probing -
Let us consider a simple hash function as “key mod 7” and a sequence of
keys as 50, 700, 76, 85, 92, 73, 101.
CLOSED HASHING
Linear Probing
Linear probing is one of the forms of open addressing. As we know
that each cell in the hash table contains a key-value pair, so when the
collision occurs by mapping a new key to the cell already
occupied by another key, then linear probing technique searches
for the closest free locations and adds a new key to that empty
cell. In this case, searching is performed sequentially, starting from
the position where the collision occurs till the empty cell is not
found.
CLOSED HASHING
Let's understand the linear probing through an example.
Consider the above example for the linear probing:
A = 3, 2, 9, 6, 11, 13, 7, 12 where m = 10, and h(k) = 2k+3
The key values 3, 2, 9, 6 are stored at the indexes 9, 7, 1, 5 respectively.
The calculated index value of 11 is 5 which is already occupied by another
key value, i.e., 6. When linear probing is applied, the nearest empty cell to
the index 5 is 6; therefore, the value 11 will be added at the index 6.
The next key value is 13. The index value associated with this key value is
9 when hash function is applied. The cell is already filled at index 9. When
linear probing is applied, the nearest empty cell to the index 9 is 0;
therefore, the value 13 will be added at the index 0
LINEAR PROBING
Let us consider a simple hash function as “key mod 7” and a
sequence of keys as 50, 700, 76, 85, 92, 73, 101.
LINEAR PROBING
Let us consider a simple hash function as “key mod 30” and a
sequence of keys as 3, 1, 63, 5, 11, 15, 18, 16, 46.
Chaining
( with /without
replacement)
Chaining Without Replacement
In collision handling method chaining is a concept which introduces
an additional field with data i.e. chain. A separate chain table is
maintained for colliding data. When collision occurs, we store the
We can put some other quadratic equations also using some constants
The value of i = 0, 1, . . ., m – 1. So we start from i = 0, and increase this
until we get one free space. So initially when i = 0, then the h(x, i) is same
as h´(x).
CLOSED HASHING
CLOSED HASHING
Let us consider a simple hash function as “key mod 7”
and sequence of keys as 50, 700, 76, 85, 92, 73, 101
h1(n)=n%20
h2(n)=n%13
In such situations, we have to transfer entries from old table to the new
table by re computing their positions using hash functions
Rehashing
Rehashing
As the name suggests, rehashing means hashing again.
Basically, when the load factor increases to more than its pre-
defined value (default value of load factor is 0.75), the
complexity increases. So to overcome this, the size of the
array is increased (doubled) and all the values are hashed
again and stored in the new double sized array to maintain a
low load factor and low complexity.
Rehashing
Why rehashing?
with the data. There will not be any unused memory lying.
3. This method is good for the dynamic database where data grows
also increased.
2. In this case, the bucket overflow situation will also occur. But it
might take little time to reach this situation than static hashing.
Skip List
1. A skip list is a probabilistic data structure.
very quickly.
elements.
Skip List
Skip list structure
list, and the top layers of the skip list are like an "express line"
Email : gharu.anand@gmail.com