[go: up one dir, main page]

0% found this document useful (0 votes)
2 views4 pages

Dshash

Hashing is a data structure technique that allows efficient storage and retrieval of data using a hash function to map keys to indices in a hash table, enabling average O(1) time complexity for search, insert, and delete operations. Key components include the key, hash function, and hash table, while hash collisions occur when multiple keys map to the same index, resolved through techniques like direct chaining and open addressing. Applications of hashing include cryptography, password verification, and various data structures in programming languages.

Uploaded by

bprajna64
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2 views4 pages

Dshash

Hashing is a data structure technique that allows efficient storage and retrieval of data using a hash function to map keys to indices in a hash table, enabling average O(1) time complexity for search, insert, and delete operations. Key components include the key, hash function, and hash table, while hash collisions occur when multiple keys map to the same index, resolved through techniques like direct chaining and open addressing. Applications of hashing include cryptography, password verification, and various data structures in programming languages.

Uploaded by

bprajna64
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 4

INTRODUCTION TO HASHING

Syllabus: Hashing - Perfect hashing functions. Hash table, Hash Functions, Operations, Hash collision,
Application.
Hashing
 Hashing is a technique used in data structures that efficiently stores and retrieves data in a way that
allows for quick access.
 Hashing involves mapping data to a specific index in a hash table (an array of items) using a hash
function. It enables fast retrieval of information based on its key.
 The great thing about hashing is, we can achieve all three operations (search, insert and delete) in
O(1) time on average.
 Hashing is mainly used to implement a set of distinct items (only keys) and dictionaries (key value
pairs).

Components of Hashing
There are majorly three components of hashing:
 Key: A Key can be anything string or integer which is fed as input in the hash function the
technique that determines an index or location for storage of an item in a data structure.
 Hash Function: Receives the input key and returns the index of an element in an array called a hash
table. The index is known as the hash index.
 Hash Table: Hash table is typically an array of lists. It stores values corresponding to the keys. Hash
stores the data in an associative manner in an array where each data value has its own unique index.

Hash Table
A Hash table is defined as a data structure used to insert, look up, and remove key-value pairs quickly. It
operates on the hashing concept, where each key is translated by a hash function into a distinct index in an
array. The index functions as a storage location for the matching value. In simple words, it maps the keys
with the value.
Hash Function
 Hash functions are a fundamental concept in computer science and play a crucial role in various
applications such as data storage, retrieval, and cryptography.
 A hash function creates a mapping from an input key to an index in hash table.
 Properties,
a. Deterministic: A hash function must consistently produce the same output for the same input.

Subject: SEPP Notes/20CS43P pg. 1


b. Fixed Output Size: The output of a hash function should have a fixed size, regardless of the size
of the input.
c. Efficiency: The hash function should be able to process input quickly.
d. Uniformity: The hash function should distribute the hash values uniformly across the output
space to avoid clustering.
e. Pre-image Resistance: It should be computationally infeasible to reverse the hash function, i.e.,
to find the original input given a hash value.
f. Collision Resistance: It should be difficult to find two different inputs that produce the same
hash value.
g. Avalanche Effect: A small change in the input should produce a significantly different hash
value.
Hash Collision
Hash functions arc used to map each key to a different address space, but practically it is not possible to
create such a hash function and the problem is called Collision.
Collision is the condition where two records are stored in the same location.
Collision Resolution Techniques
The process of finding an alternate location is called collision resolution. Even though hash tables have
collision problems, they are more efficient in many cases compared to all other data structures, like search
trees.
There are a number of collision resolution techniques, and the most popular are direct chaining and open
addressing.
1) Direct Chaining: An array of linked list application
o Separate chaining
2) Open Addressing: Array-based implementation
o Linear probing (linear search)
o Quadratic probing (nonlinear search)
o Double hashing (use two hash functions)
1. Direct Chaining
Separate Chaining
Collision resolution by chaining combines linked representation with hash table.
When two or more records hash to the same location, these records are constituted into a singly-linked list
called a chain.
Open Addressing
In open addressing all keys are stored in the hash table itself. This approach is also known as closed
hashing. This procedure is based on probing. A collision is resolved by probing.
Linear Probing
The interval between probes is fixed at 1. In linear probing, we search the hash table sequentially. starting
from the original hash location. If a location is occupied, we check the next location. We wrap around from
the last table location to the first table location if necessary. The function for rehashing is the following:
rehash(key) = (n + 1) % table size
One of the problems with line r probing is that table items tend to cluster together in the hash table. This
means that the table contains groups of consecutively occupied locations that are called clustering.
Clusters can get close to one another, and merge into a larger cluster. Thus, the one part of the table might
be quite dense, even though another part has relatively few items. Clustering causes long probe searches

Subject: SEPP Notes/20CS43P pg. 2


and therefore decreases the overall efficiency. The next location to be probed is determined by step-size,
where other step-sizes (more than one) arc possible. The step-size should be relatively prime to the table
size, i.e. their greatest common divisor should be equal to 1. If we choose the table size to be a prime
number, then any step-size is relatively prime to the table size. Clustering cannot be avoided by larger step-
sizes.
Quadratic Probing
The interval between probes increases proportionally to the hash value (the interval thus increasing
linearly, and the indices are described by a quadratic function). The problem of Clustering can be
eliminated if we use the quadratic probing method. In quadratic probing, we start from the original hash
location i. If a location is occupied, we check the locations i+12, i+22, i+32 , i+42 ..... We wrap around
from the last table location to the first table location if necessary. The function for rehashing is
the following: rehash(key) = (n + k2) % table size
Comparisons between Open Addressing methods

Applications
Hashing provides constant time search, insert and delete operations on average. This is why hashing is one
of the most used data structure, example problems are, distinct elements, counting frequencies of items,
finding duplicates, etc.
There are many other applications of hashing, including modern day cryptography hash functions. Some of
these applications are listed below:
 Message Digest : This is an application of cryptographic Hash Functions
 Password Verification : Cryptographic hash functions are very commonly used in password
verification.
 Data Structures(Programming Languages) : Various programming languages have hash table-based
Data Structures. The basic idea is to create a key-value pair where key is supposed to be a unique
value, whereas value can be same for different keys.
 Compiler Operation : To differentiate between the keywords of a programming language(if, else,
for, return etc.) and other identifiers and to successfully compile the program, the compiler stores
all these keywords in a set which is implemented using a hash table.
 Rabin-Karp Algorithm : This is basically a string-searching algorithm which uses hashing to find
any one set of patterns in a string.

Subject: SEPP Notes/20CS43P pg. 3


 Linking File name and path together : In order to store the correspondence between file_name and
file_path the system uses a map(file_name, file_path)which is implemented using a hash table.
 Game Boards: In a game like Tic-Tac-Toe or chess the position of the game may be stored using
hash table.

QUESTIONS
1. Define hashing, hash table, hash collision, hash function
2. List the properties of hash function.
3. Explain components of hashing.
4. List & explain applications of hashing.
5. Code
6. Compare open addressing methods.

Subject: SEPP Notes/20CS43P pg. 4

You might also like