#1 Hashing
#1 Hashing
8.i
Use digit extraction method(1st ,3rd,and 5th )for hashing the following values
224562,140145,14467,137456,214576,199645,214562,162145,234534 in an array
of 19 elements. Use linear probe method to resolve any collisions.
8.ii
Use Fold shift method for hashing the following values 224562, 140145, 14467,
137456, 214576, 199645, 214562, 162145, 234534 in an array of 19 elements.
Use linear probe method to resolve any collisions.
8.iii
Use Fold boundary method for hashing the following values
224562,140145,14467,137456,214576,199645,214562,162145,234534 in an array
of 19 elements. Use linear probe method to resolve any collisions.
8.iv
Use direct hashing method to insert the keys 99, 33, 23, 44, 56, 43, 19 in an array
of 10 elements. Use linear probe method to resolve any collisions.
Note: In this question the array size is mentioned as 10. But according to
the direct hashing (hashkey is the element itself). So, these given numbers
cannot be accommodated in the array size of 10, so the array size is
changed to 100 and is solved using the same.
8.v
Use subtraction hashing method to insert the keys 99, 33, 23, 44, 56, 43, 19 in an
array of 10 elements. Use linear probe method to resolve any collisions.
Note: in this question the array size is mentioned as 10. But according to
the subtraction technique (list size-key) cannot be accommodated in the
array size of 10, so it is to be solved using modulo division instead of
subtraction.
8.vi
Use modulo division hashing method to insert the keys 55, 65, 20, 12, 66, 26, 90 in
an array of 13 elements. Use linear probe method to resolve any collisions.
Solution
class hash
{
private:
public:
hash();
hash(int);
};
hash :: hash()
{
int i;
size=SIZE;
psize=PHYSIZE;
hashtable=new long int[PHYSIZE];
for(i=0;i<PHYSIZE;i++)
{
hashtable[i]=0;
}
collisions=0;
filled=0;
size=s;
psize=s+1;
collisions=0;
filled=0;
for(i=0;i<psize;i++)
{
hashtable[i]=0;
}
}
void hash :: printdensity()
{
cout<<"\nNo.of Collisions:"<<collisions<<endl;
cout<<"\nFilled:"<<filled<<endl;
cout<<"\nDensity:"<<((filled*100)/size)<<" %";
}
void hash :: printtable()
{
for(int i=0;i<psize;i++)
{
cout<<"\n"<<i<<"."<<hashtable[i];
}
}
void hash :: store(long int num,int hashkey)
{
while(hashtable[hashkey]!=0 && hashkey<=size)
{
hashkey++;
collisions=collisions+1;
}
filled++;
if(hashkey>size)
{
hashkey=0; //wrap around
}
//hashing functions
void hash :: modulodiv(long int n)
{
int key;
key =(n%size)+1;
store(n,key);
}
while(temp!=0)
{
rem=temp%100;
div=div+rem;
temp=temp/100;
key=(div%size)+1;
store(n,key);
}
ltoa(num,no,10);
strncat(r1,no+1,1);
strncat(r1,no,1);
strncat(r2,no+2,2);
strncat(r3,no+5,1);
key=(addn%size)+1;
store(num,key);
}
void hash::digitextraction(long int n)
{
long int t=n;
int p1=t%100;
t=t/100;
int p2=t%100;
t=t/100;
int p3=t;
p1=p1/10;
p2=p2/10;
p3=p3/10;
int tot=(p3*100)+(p2*10)+p1;
store(n,nkey);
}
int main()
{
clrscr();
int choice=0;
int n,i,s;
while(choice<6)
{
cout<<"\n Choose a Hashing Technique:";
cout<<"\n 1.Modulo Division"
<<"\n 2.Fold shift"
<<"\n 3.Fold boundary"
<<"\n 4.Direct hashing"
<<"\n 5.Digit extraction"
<<"\n 6.Exit"
<<"\n Enter your option:";
cin>>choice;
h1.printdensity();
h1.printtable();
hash h2(10);
cout<<"\nfor practical 8.v";
long int userkeys[]={99, 33, 23, 44, 56, 43, 19 };
for(int x=0;x<7;x++)
{
h2.modulodiv(userkeys[x]);
}
h2.printdensity();
h2.printtable();
break;
}
case 2:
{
hash h1(19);
long int userkeys[]={224562, 140145, 14467, 137456,
14576, 199645, 214562, 162145, 234534 };
for(int x=0;x<9;x++)
{
h1.foldshift(userkeys[x]);
}
h1.printdensity();
h1.printtable();
break;
}
case 3:
{
hash h1(19);
long int userkeys[]={224562, 140145, 14467, 137456,
214576, 199645, 214562, 162145, 234534 };
hash h1(100);
long int userkeys[]={99, 33, 23, 44, 56, 43, 19};
for(int x=0;x<7;x++)
{
h1.direct(userkeys[x]);
}
h1.printdensity();
h1.printtable();
break;
}
case 5:
{
hash h1(19);
long int userkeys[]={224562, 140145, 14467, 137456,
214576, 199645, 214562, 162145, 234534 };
for(int x=0;x<n;x++)
{
h1.digitextraction(userkeys[x]);
}
h1.printdensity();
h1.printtable();
break;
}
case 6:
{
break;
}
getch();
return 0;
}
Ans.: Hashing is the process of mapping large amount of data item to a smaller
table with the help of hashing function. The implementation of hash table is called
hashing.
In a hashed search, the key, through an algorithmic function, determines the
location of the data. Because we are searching an array, we use a hashing
algorithm to transform the key into the index that contains the data we need to
locate. Another way to describe hashing is a key-to-address transformation in which
the keys map to addresses in a list.
Hashing Algorithim
1 set looper to 0
2 set addr to 0
Hash key
3 for each character in key
1 if ( character not space )
1 add character to address
2 rotate addr 12 bits right
2 end if
4 end loop
5 if ( addr < 0 )
1 addr = absolute ( addr )
6 end if
7 addr = addr modulo maxAddr
end hash
If the actual data that we insert into our list contain two or more synonyms, we can
have collisions.
A Collision occurs when a hashing algorithm produces an address for an insertion
key and that address is already occupied. The address produced by the hashing
algorithm is known as the home address The memory that contains all of the home
address is known as the prime area. When two keys collide at a home address, we
must resolve the collision by placing one of the keys and its data in another
location.
Ans.: When we need to locate an element in a hashed list, we must use the same
algorithm that we used to insert it into the list. Consequently, we first hash the key
and check the home address to determine whether it contains the desired element.
If it does, the search is complete. If not, we must use the collision resolution
algorithm to determine the next location and continue until we find the element or
determine that it is not in the list. Each calculation of an address and test for
success is known as a probe.
As a rule of thumb, a hashed list should not be allowed to become more than 75%
full.
Load factor
The Load factor of a hashed list is the number of elements in the list
divided by the number of physical elements allocated for the list, expressed as a
percentage. Traditionally, Load factor is assigned the symbol alpha (α). The formula
in which k represents the number of filled elements in the list and n represents the
total number of elements allocated to the list is:
Key Offset:
Key offset is a double hashing method that produces different collisions paths for
different keys. Key offset calculates the new address as a function of the old
address and the key. One of the simplest versions simply adds the quotient of the
key divided by the list size to the address to determine the next collision resolution
address, as shown in the formula below:
Offset = [ key / listsize ]
address = ( ( offset + old address ) modulo listsize )
Example:
When the key is 166702 and the list size is 307, using the modulo-division hashing
method generates an address of 1.
e. Clustering.
Ans.: As data are added to a list and collisions are resolved, some hashing
algorithims tend to cause data to group within the list. This tendency of data to
build up unevenly across a hashed list is known as clustering. Computer scientists
have identified two distinct types of cluster. The first, primary clustering, occurs
when data cluster around a home address. Primary clustering is easy to identify.
In Figure (a), we see that K, P, and Y cluster around K’s home address. In this
example the collision resolution is based on the home address. Q, on the other
hand, hashes to its own address remotely located from K’s home address.
Secondary clustering occurs when data become grouped along a collision path
throughout a list. This type of clustering is not easy to identify. In secondary
clustering the data are widely distributed across the whole list, so the list appears
to be well distributed. If the data all lie along a well- traveled collision path,
However, the time to locate a requested element of data can increase.
Ans.: When we hash a new key to an address, we may create a collision. There
are several methods for handling collisions, each of them independent of the
hashing algorithm. That is each hashing method can be used with each of the
collision resolution methods:
Open addressing:
The first collision resolution method, open addressing, resolves collisions in the
prime area- That is, the area that contains all of the home address. This technique
is opposed to linked list resolution, in which the collisions are resolved placing the
data in a separate overflow area. We discuss four different methods: Linear probe,
quadratic probe, double-hashing and key offset.
In Linear probe, which is the simplest, when data cannot be stored in the home
address we resolve the collision by adding 1 to the current address. As an
alternative to a simple linear probe, we can add 1, subtract 2, add 3, Subtract 4,
and so forth until we locate an empty element.
For example,given a collision at location 341, we would try 342, 340, 343, 339,
and so forth until we located an empty.
Linear probes have two advantages.
First, they are quite simple to implement. Second, data tend to remain near there
home address.
In the Quadratic probe, the increment is the collision probe number squared. Thus
for the first probe we add 12 , for the second collision probe we add 22 , for the
third collision probe we add 32 , and so forth until we either find an empty element
or we exhaust the possible elements.
A potential disadvantage of the quadratic probe is the time required to square the
Probe number. The quadratic probe has one limitaion: it is not possible to generate
a new address For every element in the list.
Key Offset:
Key offset is a double hashing method that produces different collisions paths for
different keys. Key offset calculates the new address as a function of the old
address and the key. One of the simplest versions simply adds the quotient of the
key divided by the list size to the address to determine the next collision resolution
address, as shown in the formula below:
Offset = [ key / listsize ]
address = ( ( offset + old address ) modulo listsize )
Example:
When the key is 166702 and the list size is 307, using the modulo-division hashing
method generates an address of 1.
Bucket hashing
a) May 2012 Q2 a)
What is hashing? Explain the terms synonym, collision and home address.
(Refer to 8.vii above)
Using digit extraction (1st , 3rd and 5th ) method for hashing and linear probing
method for collision resolution; store the keys given below in an array of 19
elements. How many collisions occurred? Determine the density of the list.
224562, 137456, 214562, 140145, 214576, 162145.
Sol :
Collisions : 1
Density : 6 / 19 ~ 32 %
What is hashing?
(Refer to 8.vii above)
Using the digit-extraction method (1, 3, 5) & linear probing, store the keys shown
below in an array with 19 elements. How many collisions occurred? What is the
density of the list after the keys have been inserted?
25668, 14512, 24578, 22501, 27867, 13651, 15934, 21890, 71211.
Sol :
Collisions : 2
Density : 9 / 19 ~ 47 %
Using mid-square method and key offset, store the keys shown below in array of
size 13: 55, 65, 20, 12, 66, 26, 90.
Sol :
Hashing: Mid-Square
- 55
552 = 3025 -> 02 % 13 + 1 = 3 1--
- 65
652 = 4225 -> 22 % 13 + 1 = 10 2 - - 20
- 20
202 = 0400 -> 40 % 13 + 1 = 2 3 - - 55
- 12
122 = 0144 -> 14 % 13 + 1 = 2 (1) 4 - - 12 (1)
[12/13]=0 + 2 = 2 % 13 + 1 =3 5--
[12/13]=0 + 3 = 3 % 13 + 1 =4 6 - - 26 (3)
- 66 662 = 4356 -> 35 % 13 + 1 = 10 (2) 7--
[66/13]=5 +10 = 15 % 13 +1 =3 8--
[66/13]=5 + 3 = 8 % 13 + 1 = 9 9 - - 66 (2)
- 26 262 = 0676 = 67 % 13 + 1 = 3 (3) 10 - - 65
[26/13] = 2 + 3 = 5 % 13 +1 = 6 11 - -90
- 90 902 = 8100 -> 10 % 13 + 1 = 11 12 - -
13 - -
Collisions : 5
Density : 7 / 13 ~ 53 %
What is hashing? Define the terms collision, probe and the load factor.
(Refer to 8.vii above)
Sol :
Hashing: Division(Modulo-division)
- 99 99 % 10 + 1 = 10 1 - - 43 (2)
- 33 33 % 10 + 1 = 4 2 - - 19 (3)
- 23 23 % 10 + 1 = 4 (1) 3--
4 + 12 = 5 % 10+1= 6 4 - - 33
- 44 44 % 10 + 1 = 5 5 - - 44
- 56 56 % 10 + 1 = 7 6 - - 23 (1)
- 43 43 % 10 + 1 = 4 (2) 7 - - 56
4 + 12 = 5 % 10 +1= 6 8--
6 + 22 = 10 % 10 +1=1 9--
- 19 19 % 10 + 1 = 10 (3) 10 - - 99
10 + 12 = 11 %10 +1=2
Collisions : 4
Density : 7 / 10 ~ 70 %
Using Modulo-Division method and linear probing, store the keys shown below in
the array with 19 elements. How many collisions occurred? What is the density of
the list after all keys have been inserted?
224562, 137456, 214562, 140145, 214576, 162145, 144467, 199645, 234534.
Sol :
-224562 % 19 + 1 = 2 1-
-137456 % 19 + 1 = 11 2 - 224562
-214562 % 19 + 1 = 15 3- 140145 (1)
-140145 % 19 + 1 = 2 -> 3 (1) 4-
-214576 % 19 + 1 = 10 5-
-162145 % 19 + 1 = 19 6-
-144467 % 19 + 1 = 11 -> 12 (2) 7-
-199645 % 19 + 1 = 13 8-
-234534 % 19 + 1 = 18 9-
10 - 214576
11 - 137456
12 - 144467 (2)
13 - 199645
14 -
15 - 214562
16 -
17 -
18 - 234534
19 - 162145
Collisions : 2
Density : 9 / 19 ~ 47%
Using midsquare hashing method store the keys given below in an array of size
1000.
Because of the key length square only the first three digits of the key. Use
pseudorandom number generator for rehashing is collisions occur (Take a = 3 and c
= -1 as factors).
(Note: Since the array size mentioned is 1000 which cannot be represented. So the
size assumed is 19)
Sol :
Collisions : = 5
Density : = 9 / 19 ~ 47 %
Using the rotation method for hashing and linear probe method for resolving
collision store the keys shown below in an array with 19 elements. First rotate the
rightmost 2 digits to the left and then use the digit extraction (1st , 3rd and 5th
digits)254761, 167953, 244664, 160849, 254872, 182741, 174767, 129449,
234534. How many collisions occurred? What is the density of the list after all keys
have been inserted?
Sol :
Collisions : = 3
Density : = 9 / 19 ~ 47 %
Using the digit-extraction method (first, third and fifth digits) & quadratic probing,
store the keys shown below in an array with 19 elements. How many collisions
occurred? What is the density of the list after all keys have been inserted? 224562,
137456, 214562, 140145, 214576, 162145, 144467, 199645, 234534.
Sol :
Collisions :2
Density :9 / 19 ~ 47%