[go: up one dir, main page]

0% found this document useful (0 votes)
21 views8 pages

Hashing

The document discusses hashing, priority queues, and efficient binary search trees. It explains the process of hashing, types of hash functions, collision resolution techniques, and dynamic hashing methods. Additionally, it defines priority queues, their types, and their implementation using heaps.

Uploaded by

findthevalueofxy
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
21 views8 pages

Hashing

The document discusses hashing, priority queues, and efficient binary search trees. It explains the process of hashing, types of hash functions, collision resolution techniques, and dynamic hashing methods. Additionally, it defines priority queues, their types, and their implementation using heaps.

Uploaded by

findthevalueofxy
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 8

Module 5

Hashing, Priority Queues & Efficient Binary Search Trees

HASHING: Hashing is the process of transforming any key or a string into another fixed
size value usually of smaller size using a hash function.
HASH TABLE REPRESENTATION
In hashing the dictionary pairs are stored in a table, ht, called the hash table. The hash table is
partitioned into b buckets, ht[0],…,ht[b-1]. The address or location of a pair is determined by
a hash function, h, which maps keys into buckets.
 Hash table is a data structure used for storing and retrieving data very quickly. Insertion
of data in the hash table is based on the key value. Hence every entry in the hash table
is associated with some key.
 Using the hash key the required piece of data can be searched in the hash table by few
or more key comparisons.
HASH FUNCTION
 Hash function is a function which is used to store the data into the hash table. Hence
one can use the same hash function to retrieve the data from the hash table. The integer
returned by the hash function is called hash key.
Static Hashing: Refers to the hashing process where the size of the hash table is fixed.

TYPES OF HASH FUNCTIONS:


There are several types of uniform hash functions:
1. Mid-square:
 The middle of square hash function is frequently used in symbol table applications.
The function is computed by squaring the identifier and then using an appropriate
number of bits from the middle of the square to obtain the bucket address.
 Since the middle bits of the square usually depend upon all the characters in an
identifier, there is a high probability that different identifiers will produce different
hash addresses, even when some of the characters are the same. The size of the hash
table should be a power of 2 when this scheme is used.

2. Division:
 In this method, a simple hash function is obtained by using the modulus (%)
operator.
 In this scheme, we divide the identifier x by some number M and use the
remainder as the hash address for x.
 The hash function is:

f(x)= x % M
 This gives bucket addresses that range from 0 to M - 1, where M = the table
size.
3. Folding:
 In this method, the identifier ‘X’ is partitioned into several parts. All parts,
except for the last one have the same length.
 We then add the parts together to obtain the hash address for X.

4. Digit Analysis:
 The last method we will examine, digit analysis, is used with static files. A
static file is one in which all the identifiers are known in advance.
 We first transform the identifiers into numbers using some radix, r. Then
examine the digits of each identifier. Some digits having most skewed
distributions are deleted.
 This deleting of digits is continued until the number of remaining digits is small
enough to give an address in the range of the hash table. Then these digits are
used to calculate the hash address.
COLLISSION
The situation in which the hash function returns the same hash key (home bucket) for
more than one record is called collision and two same hash keys returned for different records
is called synonym.
COLLISION RESOLUTION TECHNIQUES
If collision occurs then it should be handled by applying some techniques. Such a
technique is called collision resolution technique.
There are two methods for detecting collisions and overflows in a static hash table; each
method using a different data structure to represent the hash table. Linear Probing and
Chaining.
1. Open addressing (linear probing)
2. Chaining
3. Rehashing
1. Linear Probing:
 When linear open addressing is used , the hash table is represented as a one dimensional
array with indices that range from 0 to the desired table size - 1.
 The component type of the array is a struct that contains at least a key field. The C
declarations creating the hash table ht with one slot per bucket are:

 Before inserting any elements into this table, the table must be initialized to represent
the situation where all slots are empty. This allows to detect overflows and collisions.
 If the slot at the hash address is empty, we simply place the new element into this slot.
However, if the new element is hashed into a full bucket, we must find another bucket
for it. The simplest solution places the new element in the closest unfilled bucket.

Problem with linear probing:


 One major problem with linear probing is primary clustering. Primary clustering is a
process in which a block of data is formed in the hash table when collision is
resolved.
 These clusters tend to merge as more identifiers are entered into the table, thus leading
to bigger clusters. We can partially curtail the growth of these clusters and hence reduce
the average number of probes by using quadratic probing.
( Refer Lab Prog 12: for implementation).
2. Chaining:
In collision handling method chaining is a concept which introduces an additional field with
data i.e. chain. A separate chain table is maintained for colliding data. When collision occurs
then a linked list(chain) is maintained at the home bucket.
For eg;
Consider the keys to be placed in their home buckets are 131, 3, 4, 21, 61, 7, 97, 8, 9
Hash function as H(key) = key % D, Where D is the size of table.
3. Rehashing: The rehashing method is to use a series of hash functions h1,h2,…,hm, when the
collision occurs.

DYNAMIC HASHING:
• The dynamic hashing method is used to overcome the problems of static hashing like
bucket overflow.
• In this method, data buckets grow or shrink as the records increases or decreases. This
method is also known as Extendable hashing method.
• One of the most important classes of software is the database management system or
DBMS.
• In a DBMS the user enters a query using some language (possibly SQL) and the system
translates it and retrieves the resulting data. Fast access time is essential since a DBMS
is typically used to hold large sets of information.
• Another key characteristic of a DBMS is that the amount of information can vary a
great deal over time.
• Dynamic hashing retains the fast retrieval time of conventional hashing, while
extending the technique so that it can accommodate dynamically increasing and
decreasing file size without penalty.
• We assume that a file, F, is a collection of records, R. Each record has a key field, K,
by which it is identified. Records are stored in buckets, or pages as they are called in
dynamic hashing, whose capacity is p.
i. Dynamic Hashing Using Directories
 Consider an example where an identifier consists of two characters and each
character is represented by 3 bits.

 The identifiers are to be placed into a table that has four pages. Each page can hold no
more than two identifiers, and the pages are indexed by the 2 bit sequence 00, 01, 10, 11,
respectively.
 Depth refers to the number of bits used as the index.
ii. Directoryless Dynamic Hashing:
 Assuming that there exists a contiguous address space which is large enough
to hold all the records, we can eliminate the directory.
 In effect, this leaves it to the operating system to break the address space
into pages, and to manage moving them into and out of memory. This
scheme is referred to as directoryless hashing or linear hashing.
Priority Queues
Definition:
 A priority queue is a collection of zero or more elements. Each element has a
priority or value.
 Unlike the queues, which are FIFO structures, the order of deleting from a priority
queue is determined by the element priority.
 Elements are removed/deleted either in increasing or decreasing order of priority
rather than in the order in which they arrived in the queue.

There are two types of priority queues:


 Min priority queue: Collection of elements in which the items can be inserted
arbitrarily, but only smallest element can be removed.
 Max priority queue: Collection of elements in which insertion of items can be in any
order but only largest element can be removed.
 In priority queue, the elements are arranged in any order and out of which only the
smallest or largest element allowed to delete each time.
 The implementation of priority queue can be done using arrays or linked list. The data
structure heap is used to implement the priority queue effectively.
Leftist Trees.
Optimal BSTS
Notes on the above topics will be shared soon.

You might also like