[go: up one dir, main page]

0% found this document useful (0 votes)
14 views33 pages

Module 5

Uploaded by

veena.s
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views33 pages

Module 5

Uploaded by

veena.s
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 33

Hashing

• Hashing is a technique that is used to uniquely identify a specific


object from a group of similar objects.
• Hashing is one of the searching techniques that uses a constant time.
The time complexity in hashing is O(1)
• In hashing, large keys are converted into small keys by using hash
functions. The values are then stored in a data structure called hash
table.
• The idea of hashing is to distribute entries (key/value pairs) uniformly
across an array. Each element is assigned a key (converted key).
• By using that key you can access the element in O(1) time.
• Using the key, the algorithm (hash function) computes an index that
suggests where an entry can be found or inserted.
• a hash table is a generalized idea of an array where key
does not have to be an integer. We can have a name as
a key, or for that matter any object as the key. The trick
is to find a hash function to compute an index so that
an object can be stored at a specific location in a table
such that it can easily be found.
HOW HASHING IS DONE
1.An KEY is converted into an integer by using a hash function. This
INTEGER can be used as an index to store the original KEY, which falls
into the hash table.
2. The KEY is stored in the hash table where it can be quickly retrieved
using hashed key
Types in hashing
1) Static Hashing
2) Dynamic Hashing
Static Hashing
• In static hashing, the dictionary pairs are stored in a fixed size table ht
called a hash table.
• The hash table ht is stored in sequential memory locations that are
partitioned into b buckets, ht[0],... , ht [b - 1]. Each bucket is capable
of holding s dictionary pairs.
Hash Function
• Hash function is a function which is applied on a key by which it
produces an integer, which can be used as an address of hash table.
• Hence one can use the same hash function for accessing the data
from the hash table.
• In this the integer returned by the hash function is called hash key.
Division Method
• It is the most simple method of hashing keys (k).
• It assumes keys are non negative integers
• This method divides k by M and then uses the remainder obtained.
h(x) = x mod M
Where M is generally table size
• The division method is quite good for just about any value of M and since it
requires only a single division operation, the method works very fast.
• extra care should be taken to select a suitable value for M.
• it is best to choose M to be a prime number because making M a prime number
increases the likelihood that the keys are mapped with a uniformity in the output
range of values.
Problem-1:
Keys =36,18,72,43,6
Table size or M= 8
Problem 2:
Keys =36,18,72,43,6,10,5,15
Table Size/M=8
• A potential drawback of the division method is that while using this
method, consecutive keys map to consecutive hash values.(collision)
mid-square method
• The mid-square method is a good hash function which works
in two steps:
Step 1: Square the value of the key. That is, find k 2.
Step 2: Extract the middle r digits of the result obtained in Step
1
In the mid-square method, the same r digits must be chosen
from all the keys. Therefore, the hash function can be given as:

• h(k) = s where s is obtained by selecting r digits from k2.


• Example Calculate the hash value for keys 1234 and 5642
using the mid-square method. The hash table has 100
memory locations
•This means that only two digits are needed to map the key to
a location in the hash table, so r= 2.
•When k = 1234, k2 = 1522756, h (1234) = 27
•When k = 5642, k2 = 31832164, h (5642) = 21
•Observe that the 3rd and 4th digits starting from the right are
chosen.
Folding Method
•The folding method works in the following two steps:
Step 1: Divide the key value into a number of parts. That is,
divide k into parts k1, k2, ..., kn, where each part has the
same number of digits except the last part which may have
lesser digits than the other parts.
Step 2: Add the individual parts. That is, obtain the sum of
k1 + k2 + ... + kn. The hash value is produced by ignoring the
last carry, if any. Note that the number of digits in each part
of the key will vary depending upon the size of the hash
table.
Ex:Given a hash table of 100 locations, calculate the hash
value using folding method for keys 5678, 321, and 34567

key 5678 321 34567

56 and 78 32 and 1 34, 56 and 7


Parts

Sum 134 33 97

Hash value 34 (ignore the last carry) 33


97
Digit Analysis
• is used with static files.
• A static file is one in which all the identifiers are known in advance.
• Using this method, we first transform the identifiers into numbers using
some radix, r.
• We then examine the digits of each identifier, deleting those digits that
have the most skewed distributions.
• We continue deleting digits until the number of remaining digits is small
enough to give an address in the range of the hash table.
• The digits used to calculate the hash address must be the same for all
identifiers
Collision Resolution Strategies
• Collisions occur when an element to be inserted hashes
to the same value as an already inserted element.
• Collision is resolved by
1.Open Addressing/Closed Hashing
2.Chaining/open hashing
Open Addressing/Closed Hashing
• open addressing or closed hashing computes new positions using a probe
sequence and the next record is stored in that position
• The process of examining memory locations in the hash table is called probing.
• Open addressing technique can be implemented using
1. linear probing
2. quadratic probing
3. double hashing.
Linear Probing
• When using a linear probe to resolve collision, the key will be stored in the next
available slot in the table, assuming that the table is not already full.
• This is implemented via a linear search for an empty slot, from the point of
collision.
• If the physical end of table is reached during the linear search, the search will
wrap around to the beginning of the table and continue from there.
• If an empty slot is not found before reaching the point of collision, the table is
full.

• If h is the point of collision, probe through h+1,h+2,h+3. h+i. till an empty


slot is found
h(k)=k%table_size
Probe Sequence ::(h(k)+i)%Table size
Challenges in Linear Probing :
• Primary Clustering: One of the problems with linear
probing is Primary clustering, many consecutive
elements form groups and it starts taking time to find a
free slot or to search for an element.
• M=10
• Keys= 70,93,82,41,50,60
Quadratic Probing
• A variation of the linear probing
• Instead of using a constant “skip” value, if the first hash
value that has resulted in collision is h, the successive values
which are probed are h+1, h+4, h+9, h+16, and so on.
• In other words, quadratic probing uses a skip consisting of
successive perfect squares.
Probe sequence :h,h+12,h=22,h=32 h+i2
H(k)=(h+i2)%Tablesize
chaining
• When a collision occurs, elements with the same hash key will
be chained together. A chain is simply a linked list of all the elements
with the same hash key.
• instead of storing the data directly inside the structure, have a linked
list structure at each hash element.
• The hash table slots will no longer hold a table element. They will now
hold the address of a table element.
• Suppose we have a list of key values
• A = 3, 2, 9, 6, 11, 13, 7, 12 where m = 10, and h(k) = 2k+3
Drawbacks of static hashing

Table size is fixed and hence cannot accommodate


data growth.
Collisions increases as data size grows.
• Avoid the above conditions by doubling the hash table size.
• This increase in hash table size is taken up, when the number of
collisions increase beyond a certain threshold. The threshold limit is
decided by the load factor.
Load factor

• The load factor α of a hash table with n key elements is given by


α= n / hash table size

• The probability of a collision increases as the load factor increases.


We cannot just double the size of the table and copy the elements
from the original table to the new table ,since when the table size is
doubled from N to 2N+1, the hash function changes. It requires
reinserting each element of the old table into the new table (using the
modified hash function).This is called Rehashing.
Dynamic Hashing
• To ensure good performance, it is necessary to increase the size of a
hash table whenever its loading density exceeds a prescribed
threshold
• two forms of Dynamic hashing- one uses a directory and the other
does not.
1)Dynamic Hashing Using Directories
2)Directory Less Dynamic Hashing
Dynamic Hashing Using
Directories
• Dynamic Hashing
• The dynamic hashing method is used to overcome the problems of
static hashing like bucket overflow.
• In this method, data or decreases. This method is also known as
Extendable hashing method.
• This method makes hashing dynamic, i.e., it allows insertion or
deletion without resulting in poor performance. H

You might also like