[go: up one dir, main page]

0% found this document useful (0 votes)
13 views29 pages

Week 3

Uploaded by

hiraazhar2030
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
13 views29 pages

Week 3

Uploaded by

hiraazhar2030
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 29

Data Structure & Algorithms

Hashed List
Searches
Outline

• What is hashing
• What are hashed list searches
• Hashing methods
• Direct
• Subtraction
• Modulo division
• Digit extraction
• Mid square
• Folding
• Rotation
• Pseudorandom
• Collision Resolution
• Collision Resolution Methods
Hashed List Searches

• List searches require a lot of tests before the data is found

• Ideal case: We know where our data is and we may access it directly

• Goal of Hashed Search: Find data in one test


What is Hashing?

• Generating address from a key


• Key is a record’s property on which a record is stored in memory (called
prime area)
• A hash search is a search in which the key is modified (using an
algorithm) into a memory address (called home address)
• The conversion process of converting a key to a memory address is
called Hashing
Hashing Terms

• Usually there are more keys than can be accommodated in memory


• More than one keys that hash to the same location are called synonyms
• A hash collision occurs when an key is inserted into an occupied memory
location
• Calculation of a memory address and its test for collision is called a probe
Concept of Hashing
Hash Collision
Hashing Methods
Direct Method

• The key is the address without any algorithmic modification (it starts from 0 or
1 and increments by 1)
• Example:
class Sales {
public:
Sales(const int day=0,
const int amount=0):day(day),amount(amount){}
private:
int day, amount;
};
Sale sale(0,100);
Sale dailySales[10];
dailySales[sale.day] += sale.amount;
Subtraction Method

• If keys do not start from 0 or 1, you get an index by subtracting a constant


value from the key

class Employee {
public:
Employee():id(count++) {}
const int getID() const { return id; }
void setName(const std::string& n) { name = n; }

private:
int id;
string name;
static int count;
};

int Employee::count = 1000;


Subtraction Example

Employee employees[100];
Employee e;
employees[e.getID()-1000].setName(“Yasir”);
Modulo Division Method

• Also known as division remainder or modulo-division method

• Divides the key by list/array size and uses the remainder as index

• Works with any size but prime numbers give less collisions
Modulo Division Example
Modulo Division Example

• Assume array size of 307


• Lets consider the following keys 121267,045128 and 379452

Input Hash Calculation Index/Address

121267 121267 % 307 2

045128 045128 % 307 306

379452 379452 % 307 0


Digit Extraction Method

• Extract certain digits and then use the obtained number as address
• Assuming that we have an array size of 1000 (indices from 0 to 999) we
can use three digits
• Example: (extracting 1st , 3rd and 4th digit)
• 145674 => 156
• 344565 => 345
• 132554 => 125
Midsquare Method

• The key or a portion of it is squared and the address is selected from the
mid of the squared number
• Example:
• Assuming we have a three digit key and three digit address

Input Square Index/Address

121 14641 464

365 133225 322

749 561001 100


Folding Method
• Two methods
• Fold shift
• Fold boundary
• Fold shift
• Key is divided into parts whose size matches the required
address size
• The left and right parts are shifted and added to the middle part
• Fold boundary
• The key is divided into parts as in fold shift
• The digits of left and right parts are reversed and then added to
the middle part
• In both folding methods, overflowing digits are discarded
Folding Example
Rotation Method

• Generally used in conjunction with other methods


• Useful when keys are assigned serially as in employee ids and part
numbers etc.
• Such numbers usually end up with identical numbers that differ by 1 digit
only
• When these numbers are used in hashing, they tend to hash to the same
address
• To solve this, the last digits are rotated to the front of the key
• Usually performed by bitwise shift (<<, >>) and bitwise and (&) operator
Rotation Example
Collision Resolution

• Apart from direct and subtraction methods, all other hashing methods
suffer from collisions
• Some hashed list terms:
𝑘
• Load factor (): Percentage of the number of elements in list divided by the total list
size ∝= ∗ 100
𝑛

• Clustering: Clustering is a build up of data unevenly across the hashed list due to
collision resolution.
• Primary Clustering: When clustering is at home address
• Secondary Clustering: When clustering is at collision path
Collision Resolution Methods
Open Addressing

• Resolves collision in the prime area (where all home address are stored)
• When collision occurs, prime area is searched for empty cells
• Basic two types
• Linear probe
• Quadratic probe
Linear Probe

• In case of collision, add 1 to the current address/index


• If the new address is occupied add 1 again to find an empty address
• An alternate scheme adds 1, if collision is found, it subtracts 2, if another
collision is found, it adds 3 and so on
• Generated address must be within the address range
• Advantages
• Simple to implement
• Data remains near home address
• Disadvantages
• Create primary clusters
• Make search algorithm complex due to addition and deletion
Quadratic Probe

• Instead of adding 1, we add the collision probe number squared


• For 1 collision probe, add 12=1, for 2 collision probe, add 22=4, for 3 collision
probe, add 32=9 and so on
• Continued until we find an empty address or we don’t have enough elements
• Advantages
• Easy to implement
• Avoids primary and secondary clustering
• Disadvantages
• More time required for squaring the probe number which can be avoided by just
doing a scalar multiplication
• It is not possible to generate a new address for every element
Linked List Resolution

• General disadvantage of open addressing is that each collision resolution


increases chances of future collision
• In linked list resolution, a separate area (called overflow area) is used for
collisions
• All synonyms are chained into a linked list and are then attached to the prime area
using a pointer
• In case of collision, one element is stored in prime area and it is chained to its
corresponding linked list in the overflow area through a linked list pointer
Bucket Hashing (overview)

• Buckets can contain multiple data occurrences


• Advantages
• Can store multiple data items during collision
• Disadvantages
• Takes significantly more space as a lot of buckets remain empty or partially empty
• The memory is fragmented considerably due to partial empty buckets
• It cannot completely resolve collisions

You might also like