We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 33
Hashing
• Hashing is a technique that is used to uniquely identify a specific
object from a group of similar objects. • Hashing is one of the searching techniques that uses a constant time. The time complexity in hashing is O(1) • In hashing, large keys are converted into small keys by using hash functions. The values are then stored in a data structure called hash table. • The idea of hashing is to distribute entries (key/value pairs) uniformly across an array. Each element is assigned a key (converted key). • By using that key you can access the element in O(1) time. • Using the key, the algorithm (hash function) computes an index that suggests where an entry can be found or inserted. • a hash table is a generalized idea of an array where key does not have to be an integer. We can have a name as a key, or for that matter any object as the key. The trick is to find a hash function to compute an index so that an object can be stored at a specific location in a table such that it can easily be found. HOW HASHING IS DONE 1.An KEY is converted into an integer by using a hash function. This INTEGER can be used as an index to store the original KEY, which falls into the hash table. 2. The KEY is stored in the hash table where it can be quickly retrieved using hashed key Types in hashing 1) Static Hashing 2) Dynamic Hashing Static Hashing • In static hashing, the dictionary pairs are stored in a fixed size table ht called a hash table. • The hash table ht is stored in sequential memory locations that are partitioned into b buckets, ht[0],... , ht [b - 1]. Each bucket is capable of holding s dictionary pairs. Hash Function • Hash function is a function which is applied on a key by which it produces an integer, which can be used as an address of hash table. • Hence one can use the same hash function for accessing the data from the hash table. • In this the integer returned by the hash function is called hash key. Division Method • It is the most simple method of hashing keys (k). • It assumes keys are non negative integers • This method divides k by M and then uses the remainder obtained. h(x) = x mod M Where M is generally table size • The division method is quite good for just about any value of M and since it requires only a single division operation, the method works very fast. • extra care should be taken to select a suitable value for M. • it is best to choose M to be a prime number because making M a prime number increases the likelihood that the keys are mapped with a uniformity in the output range of values. Problem-1: Keys =36,18,72,43,6 Table size or M= 8 Problem 2: Keys =36,18,72,43,6,10,5,15 Table Size/M=8 • A potential drawback of the division method is that while using this method, consecutive keys map to consecutive hash values.(collision) mid-square method • The mid-square method is a good hash function which works in two steps: Step 1: Square the value of the key. That is, find k 2. Step 2: Extract the middle r digits of the result obtained in Step 1 In the mid-square method, the same r digits must be chosen from all the keys. Therefore, the hash function can be given as:
• h(k) = s where s is obtained by selecting r digits from k2.
• Example Calculate the hash value for keys 1234 and 5642 using the mid-square method. The hash table has 100 memory locations •This means that only two digits are needed to map the key to a location in the hash table, so r= 2. •When k = 1234, k2 = 1522756, h (1234) = 27 •When k = 5642, k2 = 31832164, h (5642) = 21 •Observe that the 3rd and 4th digits starting from the right are chosen. Folding Method •The folding method works in the following two steps: Step 1: Divide the key value into a number of parts. That is, divide k into parts k1, k2, ..., kn, where each part has the same number of digits except the last part which may have lesser digits than the other parts. Step 2: Add the individual parts. That is, obtain the sum of k1 + k2 + ... + kn. The hash value is produced by ignoring the last carry, if any. Note that the number of digits in each part of the key will vary depending upon the size of the hash table. Ex:Given a hash table of 100 locations, calculate the hash value using folding method for keys 5678, 321, and 34567
key 5678 321 34567
56 and 78 32 and 1 34, 56 and 7
Parts
Sum 134 33 97
Hash value 34 (ignore the last carry) 33
97 Digit Analysis • is used with static files. • A static file is one in which all the identifiers are known in advance. • Using this method, we first transform the identifiers into numbers using some radix, r. • We then examine the digits of each identifier, deleting those digits that have the most skewed distributions. • We continue deleting digits until the number of remaining digits is small enough to give an address in the range of the hash table. • The digits used to calculate the hash address must be the same for all identifiers Collision Resolution Strategies • Collisions occur when an element to be inserted hashes to the same value as an already inserted element. • Collision is resolved by 1.Open Addressing/Closed Hashing 2.Chaining/open hashing Open Addressing/Closed Hashing • open addressing or closed hashing computes new positions using a probe sequence and the next record is stored in that position • The process of examining memory locations in the hash table is called probing. • Open addressing technique can be implemented using 1. linear probing 2. quadratic probing 3. double hashing. Linear Probing • When using a linear probe to resolve collision, the key will be stored in the next available slot in the table, assuming that the table is not already full. • This is implemented via a linear search for an empty slot, from the point of collision. • If the physical end of table is reached during the linear search, the search will wrap around to the beginning of the table and continue from there. • If an empty slot is not found before reaching the point of collision, the table is full.
• If h is the point of collision, probe through h+1,h+2,h+3. h+i. till an empty
slot is found h(k)=k%table_size Probe Sequence ::(h(k)+i)%Table size Challenges in Linear Probing : • Primary Clustering: One of the problems with linear probing is Primary clustering, many consecutive elements form groups and it starts taking time to find a free slot or to search for an element. • M=10 • Keys= 70,93,82,41,50,60 Quadratic Probing • A variation of the linear probing • Instead of using a constant “skip” value, if the first hash value that has resulted in collision is h, the successive values which are probed are h+1, h+4, h+9, h+16, and so on. • In other words, quadratic probing uses a skip consisting of successive perfect squares. Probe sequence :h,h+12,h=22,h=32 h+i2 H(k)=(h+i2)%Tablesize chaining • When a collision occurs, elements with the same hash key will be chained together. A chain is simply a linked list of all the elements with the same hash key. • instead of storing the data directly inside the structure, have a linked list structure at each hash element. • The hash table slots will no longer hold a table element. They will now hold the address of a table element. • Suppose we have a list of key values • A = 3, 2, 9, 6, 11, 13, 7, 12 where m = 10, and h(k) = 2k+3 Drawbacks of static hashing
Table size is fixed and hence cannot accommodate
data growth. Collisions increases as data size grows. • Avoid the above conditions by doubling the hash table size. • This increase in hash table size is taken up, when the number of collisions increase beyond a certain threshold. The threshold limit is decided by the load factor. Load factor
• The load factor α of a hash table with n key elements is given by
α= n / hash table size
• The probability of a collision increases as the load factor increases.
We cannot just double the size of the table and copy the elements from the original table to the new table ,since when the table size is doubled from N to 2N+1, the hash function changes. It requires reinserting each element of the old table into the new table (using the modified hash function).This is called Rehashing. Dynamic Hashing • To ensure good performance, it is necessary to increase the size of a hash table whenever its loading density exceeds a prescribed threshold • two forms of Dynamic hashing- one uses a directory and the other does not. 1)Dynamic Hashing Using Directories 2)Directory Less Dynamic Hashing Dynamic Hashing Using Directories • Dynamic Hashing • The dynamic hashing method is used to overcome the problems of static hashing like bucket overflow. • In this method, data or decreases. This method is also known as Extendable hashing method. • This method makes hashing dynamic, i.e., it allows insertion or deletion without resulting in poor performance. H