Unit - 4 Sorting

Sorting is the process of arranging data in a specific order, which optimizes data searching and enhances readability. Common sorting algorithms include Bubble Sort, Selection Sort, Insertion Sort, Quick Sort, and Merge Sort, each with varying complexities and applications. The choice of sorting algorithm depends on factors like data size, memory usage, and the specific requirements of the task.


Sorting

 Sorting refers to arranging data in a particular format. A sorting algorithm specifies the way to arrange data in a particular order. The most common orders are numerical or lexicographical order.

 The importance of sorting lies in the fact that data searching can be optimized to a very high level if data is stored in a sorted manner. Sorting is also used to represent data in more readable formats. Following are some examples of sorting in real-life scenarios −

 Telephone Directory − The telephone directory stores the telephone numbers of people sorted by their names, so that the names can be searched easily.

 Dictionary − The dictionary stores words in alphabetical order so that searching for any word becomes easy.
Sorting is the process of arranging the elements of an array so that they can be placed either in ascending or descending order. For example, consider an array A = {A1, A2, A3, A4, ..., An}; the array is said to be in ascending order if the elements of A are arranged like A1 ≤ A2 ≤ A3 ≤ A4 ≤ A5 ≤ ... ≤ An.
Consider an array:
int A[10] = { 5, 4, 10, 2, 30, 45, 34, 14, 18, 9 };
The array sorted in ascending order will be given as:
A[10] = { 2, 4, 5, 9, 10, 14, 18, 30, 34, 45 }
Sorting Category
 In-place Sorting and Not-in-place Sorting
 Some sorting algorithms require only a small, constant amount of extra space for comparison and temporary storage of a few data elements. These algorithms do not require any significant extra space, and sorting is said to happen in-place, i.e., within the array itself. This is called in-place sorting.
 Bubble sort is an example of in-place sorting.

 In some sorting algorithms, the program requires space which is more than or equal to the elements being sorted. Sorting which uses equal or more space is called not-in-place sorting.
 Merge sort is an example of not-in-place sorting.
There are two different categories in sorting:
Internal sorting: If the input data is such that it can be accommodated in the main memory all at once, it is called internal sorting.
External sorting: If the input data is such that it cannot be accommodated in the memory entirely at once, it needs to be stored on a hard disk, floppy disk, or any other storage device. This is called external sorting.
Bubble sort
Bubble sort is a simple sorting algorithm. This sorting algorithm is a comparison-based algorithm in which each pair of adjacent elements is compared and the elements are swapped if they are not in order.
This algorithm is not suitable for large data sets, as its average and worst case complexity are of O(n²), where n is the number of items.
Algorithm
begin BubbleSort(arr)
   repeat (n-1) times
      for each pair of adjacent array elements
         if arr[i] > arr[i+1]
            swap(arr[i], arr[i+1])
         end if
      end for
   end repeat
   return arr
end BubbleSort
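The pseudocode above translates into C almost line for line. Below is a minimal sketch (the function name bubbleSort is illustrative; the driver reuses the sample array from the earlier example):

#include <stdio.h>

/* Repeatedly compare adjacent elements and swap them if they are out of order. */
void bubbleSort(int arr[], int n)
{
    for (int pass = 0; pass < n - 1; pass++) {
        for (int i = 0; i < n - 1 - pass; i++) {
            if (arr[i] > arr[i + 1]) {
                int temp = arr[i];      /* swap arr[i] and arr[i+1] */
                arr[i] = arr[i + 1];
                arr[i + 1] = temp;
            }
        }
    }
}

int main(void)
{
    int a[] = { 5, 4, 10, 2, 30, 45, 34, 14, 18, 9 };
    int n = sizeof(a) / sizeof(a[0]);
    bubbleSort(a, n);
    for (int i = 0; i < n; i++)
        printf("%d ", a[i]);            /* prints 2 4 5 9 10 14 18 30 34 45 */
    return 0;
}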
Let the elements of the array be as shown in the slide figures.
First Pass: adjacent elements are compared and swapped if out of order (for example, 32 and 35 are compared).
Second Pass: the same process is followed for the second iteration.
Third Pass: the same process is followed for the third iteration.
Bubble Sort compares the adjacent elements.

Cycle       Number of Comparisons
1st         (n-1)
2nd         (n-2)
3rd         (n-3)
.......     ......
last        1

Hence the total number of comparisons is
(n-1) + (n-2) + (n-3) + ..... + 1 = n(n-1)/2
Bubble sort complexity
 Best Case Complexity - It occurs when there is no sorting required, i.e. the array is already sorted. The best-case time complexity of bubble sort is O(n).
 Average Case Complexity - It occurs when the array elements are in jumbled order, that is, not properly ascending and not properly descending. The average case time complexity of bubble sort is O(n²).
 Worst Case Complexity - It occurs when the array elements are required to be sorted in reverse order. That means suppose you have to sort the array elements in ascending order, but its elements are in descending order. The worst-case time complexity of bubble sort is O(n²).

Case            Time Complexity
Best Case       O(n)
Average Case    O(n²)
Worst Case      O(n²)
Space Complexity

Space Complexity    O(1)
Stable              YES

The space complexity of bubble sort is O(1), because only one extra variable is required for swapping.
Applications of Bubble sort

Bubble sort is a sorting algorithm that is used to sort elements into ascending order.
It uses very little extra storage space.
Bubble sort can be beneficial for sorting a small set of unsorted elements into a specific order.
It can be used to sort students on the basis of their height in a line.
It can be used to create a stack by piling up elements on the basis of their weight.
Selection Sorting
 In selection sort, the smallest value among the unsorted elements of the array is selected in every pass and inserted into its appropriate position in the array.
 Selection sort is a simple sorting algorithm. This sorting algorithm is an in-place comparison-based algorithm in which the list is divided into two parts, the sorted part at the left end and the unsorted part at the right end.
 Initially, the sorted part is empty and the unsorted part is the entire list.

 The smallest element is selected from the unsorted array and swapped with the leftmost element, and that element becomes a part of the sorted array. This process continues, moving the unsorted array boundary one element to the right.

 This algorithm is not suitable for large data sets, as its average and worst case complexities are of O(n²), where n is the number of items.
Selection sort is generally used when -
A small array is to be sorted
Swapping cost doesn't matter
It is compulsory to check all elements
 SELECTION SORT(arr, n)

Step 1: Repeat Steps 2 and 3 for i = 0 to n-1
Step 2: CALL SMALLEST(arr, i, n, pos)
Step 3: SWAP arr[i] with arr[pos]
        [END OF LOOP]
Step 4: EXIT

 SMALLEST(arr, i, n, pos)

Step 1: [INITIALIZE] SET SMALL = arr[i]
Step 2: [INITIALIZE] SET pos = i
Step 3: Repeat for j = i+1 to n-1
            if (SMALL > arr[j])
                SET SMALL = arr[j]
                SET pos = j
            [END OF if]
        [END OF LOOP]
Step 4: RETURN pos
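A C translation of this pseudocode might look like the following minimal sketch (the function names selectionSort and smallest mirror the pseudocode; the driver array is illustrative):

#include <stdio.h>

/* Return the index of the smallest element in arr[i..n-1] (SMALLEST in the pseudocode). */
int smallest(int arr[], int i, int n)
{
    int small = arr[i];
    int pos = i;
    for (int j = i + 1; j < n; j++) {
        if (small > arr[j]) {
            small = arr[j];
            pos = j;
        }
    }
    return pos;
}

void selectionSort(int arr[], int n)
{
    for (int i = 0; i < n - 1; i++) {
        int pos = smallest(arr, i, n);   /* pick the minimum of the unsorted part */
        int temp = arr[i];               /* swap it with the leftmost unsorted element */
        arr[i] = arr[pos];
        arr[pos] = temp;
    }
}

int main(void)
{
    int a[] = { 64, 25, 12, 22, 11 };
    int n = sizeof(a) / sizeof(a[0]);
    selectionSort(a, n);
    for (int i = 0; i < n; i++)
        printf("%d ", a[i]);             /* prints 11 12 22 25 64 */
    return 0;
}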
After two iterations, the two least values are positioned at the beginning in sorted order.
The same process is applied to the rest of the items in the array (a pictorial depiction of the entire sorting process was shown on the slides).
 Complexity = O(n²)
 We can analyze the complexity by simply observing the number of loops. There are 2 nested loops, so the complexity is n*n = n².
 Time Complexities:
 Worst Case Complexity: O(n²)
If we want to sort in ascending order and the array is in descending order, then the worst case occurs.
 Best Case Complexity: O(n²)
It occurs when the array is already sorted.
 Average Case Complexity: O(n²)
It occurs when the elements of the array are in jumbled order (neither ascending nor descending).
Case                Time Complexity
Best Case           O(n²)
Average Case        O(n²)
Worst Case          O(n²)
Space Complexity    O(1)
Stable              NO

The space complexity is O(1) because only an extra variable temp is used for swapping. (Note that the standard swap-based selection sort is not a stable sort.)
Selection Sort Applications

The selection sort is used when:
a small list is to be sorted
the cost of swapping does not matter
checking of all the elements is compulsory
the cost of writing to memory matters, as in flash memory (the number of writes/swaps is O(n) compared to O(n²) for bubble sort)
Insertion Sort
 This is an in-place comparison-based sorting algorithm. Here, a sub-list is maintained which is always sorted. For example, the lower part of an array is maintained to be sorted.
 An element which is to be 'insert'ed into this sorted sub-list has to find its appropriate place and then it has to be inserted there. Hence the name, insertion sort.

 The array is searched sequentially and unsorted items are moved and inserted into the sorted sub-list (in the same array). This algorithm is not suitable for large data sets, as its average and worst case complexity are of O(n²), where n is the number of items.
Algorithm
Step 1 − If it is the first element, it is already sorted. return 1;
Step 2 − Pick the next element
Step 3 − Compare with all elements in the sorted sub-list
Step 4 − Shift all the elements in the sorted sub-list that are greater than the value to be sorted
Step 5 − Insert the value
Step 6 − Repeat until the list is sorted
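These steps can be expressed in C as the following minimal sketch (the function name insertionSort and the sample array are illustrative; key is the usual name for the value being inserted):

#include <stdio.h>

/* Insert each element into its correct position within the sorted left part of the array. */
void insertionSort(int arr[], int n)
{
    for (int i = 1; i < n; i++) {
        int key = arr[i];              /* element to be inserted */
        int j = i - 1;
        /* shift elements of the sorted sub-list that are greater than key */
        while (j >= 0 && arr[j] > key) {
            arr[j + 1] = arr[j];
            j--;
        }
        arr[j + 1] = key;              /* insert the value */
    }
}

int main(void)
{
    int a[] = { 12, 11, 13, 5, 6 };
    int n = sizeof(a) / sizeof(a[0]);
    insertionSort(a, n);
    for (int i = 0; i < n; i++)
        printf("%d ", a[i]);           /* prints 5 6 11 12 13 */
    return 0;
}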
 Time Complexity
 Worst Case Complexity: O(n²)
Suppose an array is in ascending order and you want to sort it in descending order. In this case, the worst case complexity occurs.
Each element has to be compared with each of the other elements, so for every nth element, (n-1) comparisons are made.
Thus, the total number of comparisons = n*(n-1) ≈ n²
 Best Case Complexity: O(n)
When the array is already sorted, the outer loop runs n times whereas the inner loop does not run at all. So, there are only n comparisons. Thus, the complexity is linear.
 Average Case Complexity: O(n²)
It occurs when the elements of an array are in jumbled order (neither ascending nor descending).
Space Complexity
Space complexity is O(1) because only one extra variable key is used.
Insertion Sort Applications
The insertion sort is used when:
 the array has a small number of elements
 there are only a few elements left to be sorted
Quick Sort
 Quick sort is a fast sorting algorithm used to sort a list of elements. The quick sort algorithm was invented by C. A. R. Hoare.
 The quick sort algorithm separates the list of elements into two parts and then sorts each part recursively. That means it uses the divide and conquer strategy.
 In quick sort, the partition of the list is performed based on an element called the pivot. Here the pivot element is one of the elements in the list.

 The list is divided into two partitions such that "all elements to the left of the pivot are smaller than the pivot and all elements to the right of the pivot are greater than or equal to the pivot".
Quick sort is also known as partition-exchange sort, based on the rule of Divide and Conquer.
It is a highly efficient sorting algorithm.
Quick sort is among the fastest comparison-based sorting algorithms in practice.
It is very fast and requires little additional space: on average only O(log n) extra space is needed for the recursion stack.
Quick sort picks an element as pivot and partitions the array around the picked pivot.
 Step by Step Process
 In Quick sort algorithm, partitioning of the list is
performed using following steps...
• Step 1 - Consider the first element of the list
as pivot (i.e., Element at first position in the list).
• Step 2 - Define two variables i and j. Set i and j to first
and last elements of the list respectively.
• Step 3 - Increment i until list[i] > pivot then stop.
• Step 4 - Decrement j until list[j] < pivot then stop.
• Step 5 - If i < j then exchange list[i] and list[j].
• Step 6 - Repeat steps 3,4 & 5 until i > j.
• Step 7 - Exchange the pivot element with list[j]
element.
Ways of choosing the pivot:
1. First element as pivot
2. Last element as pivot
3. Random element as pivot
4. Median as pivot
Algorithm for Quick Sort

Step 1: Choose the highest index value as pivot.
Step 2: Take two variables to point left and right of the list excluding pivot.
Step 3: Left points to the low index.
Step 4: Right points to the high index.
Step 5: While value at left < (less than) pivot, move right.
Step 6: While value at right > (greater than) pivot, move left.
Step 7: If neither Step 5 nor Step 6 matches, swap left and right.
Step 8: If left >= (greater than or equal to) right, the point where they met is the new pivot.
Implementation of Quick Sort Algorithm using C Programming Language

#include<stdio.h>
#include<conio.h>

void quickSort(int [10],int,int);

void main(){
   int list[20],size,i;

   printf("Enter size of the list: ");
   scanf("%d",&size);
   printf("Enter %d integer values: ",size);
   for(i = 0; i < size; i++)
      scanf("%d",&list[i]);
   quickSort(list,0,size-1);
   printf("List after sorting is: ");
   for(i = 0; i < size; i++)
      printf(" %d",list[i]);
   getch();
}

void quickSort(int list[10],int first,int last){
   int pivot,i,j,temp;
   if(first < last){
      pivot = first;
      i = first;
      j = last;
      while(i < j){
         while(list[i] <= list[pivot] && i < last)
            i++;
         while(list[j] > list[pivot])
            j--;
         if(i < j){
            temp = list[i];
            list[i] = list[j];
            list[j] = temp;
         }
      }
      temp = list[pivot];
      list[pivot] = list[j];
      list[j] = temp;
      quickSort(list,first,j-1);
      quickSort(list,j+1,last);
   }
}
Complexity of the Quick Sort Algorithm

In the worst case, sorting a list with 'n' elements requires ((n-1)+(n-2)+(n-3)+......+1) = n(n-1)/2 comparisons. With the first element chosen as pivot, this worst case occurs when the list is already sorted (or sorted in reverse), because every partition is completely unbalanced.

Worst Case : O(n²)
Best Case : O(n log n)
Average Case : O(n log n)
The diagram on the slides represents how to find the pivot value in an array. As we can see, the pivot value divides the list into two parts (partitions), and then each part is processed recursively by quick sort. Quick sort is a recursive function; we call the partition function again on each part.
Application of Quick sort
Quicksort is used wherever a stable sort is not needed.
Variants of quicksort are used to separate the k smallest or largest elements.
Quicksort's divide-and-conquer structure enables the use of parallelization.
Merge Sort in Data Structure
 Merge sort is a sorting algorithm based on the Divide and Conquer strategy. It works by recursively dividing the array into two equal halves, sorting them, and then combining them. It takes O(n log n) time in the worst case.
 The merge sort algorithm is an implementation of the divide and conquer technique. Thus, it gets completed in three steps:
 1. Divide: In this step, the array/list divides itself recursively into sub-arrays until the base case is reached.
 2. Recursively solve: Here, the sub-arrays are sorted using recursion.
 3. Combine: This step makes use of the merge() function to combine the sub-arrays into the final sorted array.
 Merge sort is similar to the quick sort algorithm in that it uses the divide and conquer approach to sort the elements.
 It is one of the most popular and efficient sorting algorithms.
 It divides the given list into two equal halves, calls itself for the two halves and then merges the two sorted halves. We have to define the merge() function to perform the merging.
 The sub-lists are divided again and again into halves until the list cannot be divided further. Then we combine the pairs of one-element lists into two-element lists, sorting them in the process.
 The sorted two-element pairs are merged into four-element lists, and so on until we get the sorted list.
Merge sort keeps on dividing the list into equal halves until it can no more be divided. By definition, if there is only one element in the list, it is sorted. Then, merge sort combines the smaller sorted lists, keeping the new list sorted too.

Step 1 − if there is only one element in the list, it is already sorted; return.
Step 2 − divide the list recursively into two halves until it can no more be divided.
Step 3 − merge the smaller lists into a new list in sorted order.
Algorithm for Merge Sort

Step 1: Find the middle index of the array.
        Middle = first + (last - first)/2
Step 2: Divide the array from the middle.
Step 3: Call merge sort for the first half of the array:
        MergeSort(array, first, middle)
Step 4: Call merge sort for the second half of the array:
        MergeSort(array, middle+1, last)
Step 5: Merge the two sorted halves into a single sorted array.
void merge(int a[], int beg, int mid, int end);   /* defined separately */

void mergeSort(int a[], int beg, int end)
{
    if (beg < end)
    {
        int mid = (beg + end) / 2;
        mergeSort(a, beg, mid);
        mergeSort(a, mid + 1, end);
        merge(a, beg, mid, end);
    }
}

/* Function to print the array */
void printArray(int a[], int n)
{
    int i;
    for (i = 0; i < n; i++)
        printf("%d ", a[i]);
    printf("\n");
}

int main()
{
    int a[] = { 12, 31, 25, 8, 32, 17, 40, 42 };
    int n = sizeof(a) / sizeof(a[0]);
    printf("Before sorting array elements are -\n");
    printArray(a, n);
    mergeSort(a, 0, n - 1);
    printf("After sorting array elements are - \n");
    printArray(a, n);
    return 0;
}
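The code above calls merge(), but its definition does not appear in this excerpt of the slides. A minimal sketch of a compatible merge() is given below; it would be placed above mergeSort() (or after the prototype) in the same file, and the temporary array names LeftArray/RightArray are illustrative:

/* Merge two sorted sub-arrays a[beg..mid] and a[mid+1..end] into a[beg..end]. */
void merge(int a[], int beg, int mid, int end)
{
    int n1 = mid - beg + 1;        /* size of the left half  */
    int n2 = end - mid;            /* size of the right half */
    int LeftArray[n1], RightArray[n2];

    for (int i = 0; i < n1; i++)
        LeftArray[i] = a[beg + i];
    for (int j = 0; j < n2; j++)
        RightArray[j] = a[mid + 1 + j];

    int i = 0, j = 0, k = beg;
    /* repeatedly copy the smaller of the two front elements back into a[] */
    while (i < n1 && j < n2) {
        if (LeftArray[i] <= RightArray[j])
            a[k++] = LeftArray[i++];
        else
            a[k++] = RightArray[j++];
    }
    /* copy any remaining elements */
    while (i < n1)
        a[k++] = LeftArray[i++];
    while (j < n2)
        a[k++] = RightArray[j++];
}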
Output

Before sorting array elements are -
12 31 25 8 32 17 40 42
After sorting array elements are -
8 12 17 25 31 32 40 42

The figure on the slides shows how Merge Sort works step by step.
 Time Complexity
Best Case Complexity - It occurs when there is no sorting required, i.e. the array is already sorted. The best-case time complexity of merge sort is O(n*logn).
Average Case Complexity - It occurs when the array elements are in jumbled order, that is, not properly ascending and not properly descending. The average case time complexity of merge sort is O(n*logn).
Worst Case Complexity - It occurs when the array elements are required to be sorted in reverse order. That means suppose you have to sort the array elements in ascending order, but its elements are in descending order. The worst-case time complexity of merge sort is O(n*logn).
Space Complexity
The space complexity of merge sort is O(n), because merging requires an auxiliary array of up to n elements to hold the combined sub-arrays.
Applications of Merge sort
Merge sort is useful for sorting linked lists in O(n log n) time.
Merge sort can be implemented without extra space for linked lists.
Merge sort is used for counting inversions in a list.
Merge sort is used in external sorting.
Heap Sort Algorithm
Heap sort processes the elements by creating a min-heap or max-heap using the elements of the given array. A min-heap or max-heap represents an ordering of the array in which the root element represents the minimum or maximum element of the array.
Heap sort basically performs two main operations repeatedly:
•Build a heap H using the elements of the array.
•Repeatedly delete the root element of the heap formed in the 1st phase.
Before knowing more about heap sort, let's first see a brief description of a heap.

What is a heap?
A heap is a complete binary tree. A binary tree is a tree in which each node can have at most two children. A complete binary tree is a binary tree in which all the levels except the last level, i.e., the leaf level, are completely filled, and all the nodes are left-justified.
What is heap sort?
Heapsort is a popular and efficient sorting algorithm. The concept of heap sort is to eliminate the elements one by one from the heap part of the list, and then insert them into the sorted part of the list.
Heapsort is an in-place sorting algorithm.

 Heap sort is one of the sorting algorithms used to arrange a list of elements in order. The heapsort algorithm uses one of the tree concepts called a Heap Tree. In this sorting algorithm, we use a Max Heap to arrange a list of elements in descending order and a Min Heap to arrange the list elements in ascending order.
Step by Step Process
The Heap sort algorithm to arrange a list of elements in ascending order
is performed using following steps...
•Step 1 - Construct a Binary Tree with given list of Elements.
•Step 2 - Transform the Binary Tree into Min Heap.
•Step 3 - Delete the root element from Min Heap
using Heapify method.
•Step 4 - Put the deleted element into the Sorted list.
•Step 5 - Repeat the same until Min Heap becomes empty.
•Step 6 - Display the sorted list.
Implementation of Heapsort

#include <stdio.h>

/* function to heapify a subtree. Here 'i' is the
   index of the root node in array a[], and 'n' is the size of the heap. */
void heapify(int a[], int n, int i)
{
    int largest = i;        // Initialize largest as root
    int left = 2 * i + 1;   // left child
    int right = 2 * i + 2;  // right child
    // If left child is larger than root
    if (left < n && a[left] > a[largest])
        largest = left;
    // If right child is larger than root
    if (right < n && a[right] > a[largest])
        largest = right;
    // If root is not largest
    if (largest != i) {
        // swap a[i] with a[largest]
        int temp = a[i];
        a[i] = a[largest];
        a[largest] = temp;

        heapify(a, n, largest);
    }
}

/* Function to implement the heap sort */
void heapSort(int a[], int n)
{
    // Build the max heap
    for (int i = n / 2 - 1; i >= 0; i--)
        heapify(a, n, i);
    // One by one extract an element from the heap
    for (int i = n - 1; i >= 0; i--) {
        /* Move current root element to end: swap a[0] with a[i] */
        int temp = a[0];
        a[0] = a[i];
        a[i] = temp;

        heapify(a, i, 0);
    }
}

/* function to print the array elements */
void printArr(int arr[], int n)
{
    for (int i = 0; i < n; ++i) {
        printf("%d", arr[i]);
        printf(" ");
    }
}

int main()
{
    int a[] = {48, 10, 23, 43, 28, 26, 1};
    int n = sizeof(a) / sizeof(a[0]);
    printf("Before sorting array elements are - \n");
    printArr(a, n);
    heapSort(a, n);
    printf("\nAfter sorting array elements are - \n");
    printArr(a, n);
    return 0;
}
Output

Before sorting array elements are -
48 10 23 43 28 26 1
After sorting array elements are -
1 10 23 26 28 43 48
Heap sort complexity
Here we look at the time complexity of heap sort in the best case, average case, and worst case. We will also see the space complexity of heapsort.
1. Time Complexity
•Best Case Complexity - It occurs when there is no sorting required, i.e. the array is already sorted. The best-case time complexity of heap sort is O(n log n).
•Average Case Complexity - It occurs when the array elements are in jumbled order, that is, not properly ascending and not properly descending. The average case time complexity of heap sort is O(n log n).
•Worst Case Complexity - It occurs when the array elements are required to be sorted in reverse order. That means suppose you have to sort the array elements in ascending order, but its elements are in descending order. The worst-case time complexity of heap sort is O(n log n).
The time complexity of heap sort is O(n log n) in all three cases (best case, average case, and worst case). The height of a complete binary tree having n elements is log n.
2. Space Complexity
•The space complexity of heap sort is O(1).
Hashing
A hash table is one of the most important data structures. It uses a special function known as a hash function that maps a given value with a key to access the elements faster.
A hash table is a data structure that stores some information, and the information has basically two main components, i.e., key and value.
The hash table can be implemented with the help of an associative array. The efficiency of mapping depends upon the efficiency of the hash function used for mapping.
Hash function:
A hash function is a mathematical formula that produces an integer which can be used as an index for the key in the hash table.
Perfect Hash Function - Each key is transformed into a unique storage location.
Imperfect Hash Function - Maps more than one key to the same storage location.
 For example, suppose the key is "John" and the value is the phone number; we pass the key to the hash function as shown below:
 Hash(key) = index;
 When we pass the key to the hash function, it gives the index.
 Hash(john) = 3;
 The above example stores John at index 3.
 Drawback of Hash function
 A hash function should assign each key a unique location. Sometimes a hash table uses an imperfect hash function that causes a collision, because the hash function generates the same index for two different keys.
Operations on a hash table:
Insert - T[ h(key) ] = value;
It calculates the hash, uses it as the index and stores the value in the hash table.
Delete - T[ h(key) ] = NULL;
It calculates the hash and resets the value stored in the hash table for that key.
Search - return T[ h(key) ];
It calculates the hash, then finds and returns the value stored in the hash table for that key.
How to choose a Hash Function

An efficient hash function should be built such that the index value of the added item is distributed equally across the table.
An effective collision resolution technique should be created to generate an alternate index for a key whose hash index corresponds to a previously inserted position in the hash table.
We must select a hash algorithm that is fast to calculate.
Characteristics of a good Hash Function
Uniform Distribution: For distribution throughout the constructed table.
Fast: The generation of the hash should be very fast, and should not produce any considerable overhead.
Methods of hash function:
 Division Method
 Multiplication Method
 Mid Square Method
 Folding Method
Division method:
In this method we use modular arithmetic to divide the key value by some integer divisor m. It gives us the location value where the element can be placed.
L = (k mod m)
L -> Location in table
k -> key value
m -> table size
Eg: k = 52, m = 10, then
L = (52 mod 10) = 2
The key whose value is 52 is placed in the 2nd location.
Mid square method
In this method we square the value of the key and take the number of digits required to form an address from the middle of the squared value.
(eg) key = 16
The square is 256. Then the address is 56 (two digits starting from the middle of 256).
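As a rough illustration, the division and mid-square methods could be coded in C as below. This is only a sketch: the function names are illustrative, and the mid-square digit extraction simply keeps two middle digits so that it reproduces the 16 → 256 → 56 example above.

#include <stdio.h>

/* Division method: L = k mod m */
int hash_division(int k, int m)
{
    return k % m;
}

/* Mid-square method (sketch): square the key and pick two middle digits. */
int hash_midsquare(int k)
{
    long sq = (long)k * k;            /* e.g. 16*16 = 256 */
    int digits = 0;
    for (long t = sq; t > 0; t /= 10) /* count digits of the square */
        digits++;
    int drop = (digits - 2) / 2;      /* trailing digits to discard */
    long t = sq;
    for (int i = 0; i < drop; i++)
        t /= 10;
    return (int)(t % 100);            /* keep two middle digits: 256 -> 56 */
}

int main(void)
{
    printf("division:   h(52) = %d\n", hash_division(52, 10));  /* 2  */
    printf("mid-square: h(16) = %d\n", hash_midsquare(16));     /* 56 */
    return 0;
}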
 Collision
 When two different keys hash to the same index, the problem that occurs between the two values is known as a collision. For example, suppose a value is already stored at index 6. If another key value is 26, then its index would be:
 h(26) = 26 % 10 = 6
 Therefore, two values are stored at the same index, i.e., 6, and this leads to the collision problem. To resolve these collisions, we have some techniques known as collision resolution techniques.
 The following are the collision resolution techniques:
 Open Hashing: It is also known as closed addressing.
 Closed Hashing: It is also known as open addressing.
Open Hashing (Separate Chaining)
It is the most commonly used collision resolution technique, implemented using a Linked List. When any two or more elements collide at the same location, these elements are chained into a singly-linked list called a chain.
We chain all the elements that hash to the same slot into a linked list.
Let's consider an example of a simple hash function:
h(key) = key % table size
In a hash table with size 7:
h(27) = 27 % 7 = 6
h(130) = 130 % 7 = 4
If we insert a new element (18, "Saleema"), that would also go to the 4th index:
h(18) = 18 % 7 = 4
For separate chaining, the worst-case scenario is when all of the keys get the same hash value and are inserted in the same linked list. We can avoid this by using a good hash function.
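A minimal sketch of separate chaining in C, assuming a table of 7 singly-linked lists and the h(key) = key % 7 function from the example (the struct and function names are illustrative):

#include <stdio.h>
#include <stdlib.h>

#define TABLE_SIZE 7

/* One node in a chain: a key and a pointer to the next colliding key. */
struct Node {
    int key;
    struct Node *next;
};

struct Node *table[TABLE_SIZE];   /* each slot is the head of a linked list */

int hash(int key)
{
    return key % TABLE_SIZE;
}

/* Insert a key at the front of the chain for its slot. */
void insert(int key)
{
    int index = hash(key);
    struct Node *node = malloc(sizeof(struct Node));
    node->key = key;
    node->next = table[index];
    table[index] = node;
}

/* Search the chain of the slot the key hashes to. */
int search(int key)
{
    for (struct Node *p = table[hash(key)]; p != NULL; p = p->next)
        if (p->key == key)
            return 1;
    return 0;
}

int main(void)
{
    insert(27);    /* goes to slot 6 */
    insert(130);   /* goes to slot 4 */
    insert(18);    /* also hashes to 4, so it is chained with 130 */
    printf("18 found? %d\n", search(18));
    return 0;
}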
Closed Hashing
In closed hashing, three techniques are used to resolve collisions:
Linear probing
Quadratic probing
Double hashing technique
Linear Probing
Linear probing is one of the forms of open addressing.
Each cell in the hash table contains a key-value pair, so when a collision occurs by mapping a new key to a cell already occupied by another key, the linear probing technique searches for the closest free location and adds the new key to that empty cell.
In this case, searching is performed sequentially, starting from the position where the collision occurs, until an empty cell is found.
Let hash(x) be the slot index computed using the hash function and S the table size.
 If slot hash(x) % S is full, then we try (hash(x) + 1) % S
 If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S
 If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S
A = 3, 2, 9, 6, 11, 13, 7, 12 where m = 10, and
h(k) = 2k+3
The key values 3, 2, 9, 6 are stored at the indexes 9,
7, 1, 5 respectively. The calculated index value of 11 is
5 which is already occupied by another key value, i.e.,
6. When linear probing is applied, the nearest empty
cell to the index 5 is 6; therefore, the value 11 will be
added at the index 6.
The next key value is 13. The index value associated
with this key value is 9 when hash function is applied.
The cell is already filled at index 9. When linear
probing is applied, the nearest empty cell to the index
9 is 0; therefore, the value 13 will be added at the
index 0.
The next key value is 7. The index value
associated with the key value is 7 when
hash function is applied. The cell is already
filled at index 7. When linear probing is
applied, the nearest empty cell to the index
7 is 8; therefore, the value 7 will be added
at the index 8.
The next key value is 12. The index value associated with the key value is 7 when the hash function is applied. The cell is already filled at index 7. When linear probing is applied, the nearest empty cell to the index 7 is 2; therefore, the value 12 will be added at the index 2.
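The probing loop for this example can be sketched in C as follows, with table size m = 10 and the slide's hash function h(k) = 2k + 3 (the names insert_linear and the EMPTY marker are illustrative):

#include <stdio.h>

#define M 10
#define EMPTY -1

int table[M];

int hash(int k)
{
    return (2 * k + 3) % M;   /* h(k) = 2k + 3, reduced modulo the table size */
}

/* Insert with linear probing: try h(k), h(k)+1, h(k)+2, ... modulo M. */
int insert_linear(int k)
{
    for (int i = 0; i < M; i++) {
        int idx = (hash(k) + i) % M;
        if (table[idx] == EMPTY) {
            table[idx] = k;
            return idx;
        }
    }
    return -1;   /* table is full */
}

int main(void)
{
    int keys[] = {3, 2, 9, 6, 11, 13, 7, 12};
    for (int i = 0; i < M; i++)
        table[i] = EMPTY;
    for (int i = 0; i < 8; i++)
        printf("%d -> index %d\n", keys[i], insert_linear(keys[i]));
    /* reproduces the example: 3->9, 2->7, 9->1, 6->5, 11->6, 13->0, 7->8, 12->2 */
    return 0;
}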
Challenges in Linear Probing :
Primary Clustering: One of the problems
with linear probing is Primary clustering,
many consecutive elements form groups
and it starts taking time to find a free slot
or to search for an element.
Secondary Clustering: Secondary
clustering is less severe, two records only
have the same collision chain (Probe
Sequence) if their initial position is the
same.
Quadratic probing

In the case of linear probing, searching is performed linearly. In contrast, quadratic probing is an open addressing technique that uses a quadratic polynomial for searching until an empty slot is found.
It can also be defined as follows: it allows the insertion of ki at the first free location from (u + i²) % m, where i = 0 to m-1, u is the index computed by the hash function, and m is the table size.
Let hash(x) be the slot index computed using the hash function.
If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S
If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S
If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S
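Changing only the probe step of the earlier linear-probing sketch gives quadratic probing; a minimal version, assuming the same table[], hash(), M and EMPTY definitions as in that sketch:

/* Insert with quadratic probing: try (h(k) + i*i) % M for i = 0, 1, 2, ... */
int insert_quadratic(int k)
{
    for (int i = 0; i < M; i++) {
        int idx = (hash(k) + i * i) % M;
        if (table[idx] == EMPTY) {
            table[idx] = k;
            return idx;
        }
    }
    return -1;   /* no free slot found within M probes */
}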
 Consider the same example which we discussed in the linear probing.
 A = 3, 2, 9, 6, 11, 13, 7, 12 where m = 10, and h(k) = 2k+3
 The key values 3, 2, 9, 6 are stored at the indexes 9, 7, 1, 5, respectively. We do not need to apply the quadratic probing technique on these key values as there is no occurrence of a collision.
 The index value of 11 is 5, but this location is already occupied by the 6. So, we apply the quadratic probing technique.
 When i = 0
 Index = (5 + 0²) % 10 = 5
 When i = 1
 Index = (5 + 1²) % 10 = 6
 Since location 6 is empty, the value 11 will be added at index 6.
The next element is 13. When the hash function is applied on 13, the index value comes out to be 9, as we saw in the linear probing example. At index 9, the cell is occupied by another value, i.e., 3. So, we will apply the quadratic probing technique to calculate the free location.
When i = 0
Index = (9 + 0²) % 10 = 9
When i = 1
Index = (9 + 1²) % 10 = 0
Since location 0 is empty, the value 13 will be added at index 0.
The next element is 7. When the hash function is applied on 7, the index value comes out to be 7. At index 7, the cell is occupied by another value, i.e., 2. So, we will apply the quadratic probing technique to calculate the free location.
When i = 0
Index = (7 + 0²) % 10 = 7
When i = 1
Index = (7 + 1²) % 10 = 8
Since location 8 is empty, the value 7 will be added at index 8.
 The next element is 12. When the hash function is applied on 12, the index value comes out to be 7. When we observe the hash table, we will get to know that the cell at index 7 is already occupied by the value 2. So, we apply the quadratic probing technique on 12 to determine the free location.
 When i = 0
 Index = (7 + 0²) % 10 = 7
 When i = 1
 Index = (7 + 1²) % 10 = 8
 When i = 2
 Index = (7 + 2²) % 10 = 1
 When i = 3
 Index = (7 + 3²) % 10 = 6
 When i = 4
 Index = (7 + 4²) % 10 = 3
 Since location 3 is empty, the value 12 will be stored at index 3.
Therefore, the order of the elements is 13, 9, _, 12, _, 6, 11, 2, 7, 3.
Double Hashing
Double hashing is an open addressing technique which is used to avoid collisions. When a collision occurs, this technique uses a secondary hash of the key. It uses one hash value as an index and moves forward until an empty location is found.
In double hashing, two hash functions are used. Suppose h1(k) is one of the hash functions used to calculate the locations, whereas h2(k) is another hash function.
It can be defined as "insert ki at the first free place from (u + v*i) % m where i = 0 to m-1". In this case, u is the location computed using the first hash function and v is equal to h2(k) % m.
Let hash(x) be the slot index computed using the hash function.
 If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S
 If (hash(x) + 1*hash2(x)) % S is also full, then we try (hash(x) + 2*hash2(x)) % S
 If (hash(x) + 2*hash2(x)) % S is also full, then we try (hash(x) + 3*hash2(x)) % S
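Again reusing the table[], hash(), M and EMPTY definitions from the linear-probing sketch, the probe step becomes a multiple of a second hash value. The secondary hash hash2(k) = 7 - (k % 7) is only an illustrative choice (the slides do not fix one); it just has to be fast and never evaluate to zero:

/* Illustrative secondary hash; it must never evaluate to 0. */
int hash2(int k)
{
    return 7 - (k % 7);
}

/* Insert with double hashing: try (h(k) + i*hash2(k)) % M for i = 0, 1, 2, ... */
int insert_double(int k)
{
    for (int i = 0; i < M; i++) {
        int idx = (hash(k) + i * hash2(k)) % M;
        if (table[idx] == EMPTY) {
            table[idx] = k;
            return idx;
        }
    }
    return -1;   /* give up after M probes */
}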
Linear probing has the best cache performance but suffers from clustering. One more advantage of linear probing is that it is easy to compute.
Quadratic probing lies between the two in terms of cache performance and clustering.
Double hashing has poor cache performance but no clustering. Double hashing requires more computation time as two hash functions need to be computed.
Separate Chaining vs Open Addressing

1. Chaining is simpler to implement, whereas open addressing requires more computation.
2. In chaining, the hash table never fills up; we can always add more elements to a chain. In open addressing, the table may become full.
3. Chaining is less sensitive to the hash function or load factor. Open addressing requires extra care to avoid clustering and a high load factor.
4. Chaining is mostly used when it is unknown how many and how frequently keys may be inserted or deleted. Open addressing is used when the frequency and number of keys is known.
5. Cache performance of chaining is not good, as keys are stored in a linked list. Open addressing provides better cache performance, as everything is stored in the same table.
6. In chaining there is wastage of space (some parts of the hash table are never used). In open addressing, a slot can be used even if an input doesn't map to it.
Complexity and Load Factor

 For the first step (computing the hash), the time taken depends on the key K and the hash function.
For example, if the key is a string "abcd", then its hash function may depend on the length of the string. But for very large values of n, the number of entries in the map, the length of the keys is almost negligible in comparison to n, so hash computation can be considered to take place in constant time, i.e., O(1).
 For the second step, the list of K-V pairs present at that index needs to be traversed. In the worst case, all the n entries may be at the same index, so the time complexity would be O(n).
 But enough research has been done to make hash functions distribute the keys uniformly in the array, so this almost never happens.
On average, if there are n entries and b is the size of the array, there would be n/b entries at each index. This value n/b is called the load factor and represents the load on our map.
This load factor needs to be kept low, so that the number of entries at one index stays small and the complexity remains almost constant, i.e., O(1).
Rehashing

Rehashing means hashing again. Basically, when the load factor increases beyond its pre-defined value (the default value of the load factor is 0.75), the complexity increases.
To overcome this, the size of the array is increased (doubled) and all the values are hashed again and stored in the new, double-sized array to maintain a low load factor and low complexity.
Rehashing can be done as follows:

For each addition of a new entry to the map, check the load factor.
If it's greater than its pre-defined value (or the default value of 0.75 if not given), then rehash.
For rehashing, make a new array of double the previous size and make it the new bucketArray.
Then traverse each element in the old bucketArray and call insert() for each, so as to insert it into the new, larger bucket array.
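A rough C sketch of the load-factor check and rehash step described above, for a chained table. The 0.75 threshold matches the default mentioned above; the struct and function names are illustrative, not taken from the slides.

#include <stdio.h>
#include <stdlib.h>

/* A chained hash table whose bucket array is doubled when the load factor exceeds 0.75. */
struct Node {
    int key;
    struct Node *next;
};

struct HashMap {
    struct Node **buckets;   /* array of chain heads        */
    int capacity;            /* current number of buckets   */
    int size;                /* number of stored keys       */
};

static void insert(struct HashMap *map, int key);   /* forward declaration */

/* Rehash: allocate a bucket array of double the size and re-insert every old entry. */
static void rehash(struct HashMap *map)
{
    struct Node **old = map->buckets;
    int old_capacity = map->capacity;

    map->capacity *= 2;
    map->buckets = calloc(map->capacity, sizeof(struct Node *));
    map->size = 0;

    for (int i = 0; i < old_capacity; i++) {
        for (struct Node *p = old[i]; p != NULL; ) {
            struct Node *next = p->next;
            insert(map, p->key);   /* re-insert into the larger array */
            free(p);
            p = next;
        }
    }
    free(old);
}

static void insert(struct HashMap *map, int key)
{
    int index = key % map->capacity;
    struct Node *node = malloc(sizeof(struct Node));
    node->key = key;
    node->next = map->buckets[index];
    map->buckets[index] = node;
    map->size++;

    /* check the load factor n/b after every addition and rehash if it exceeds 0.75 */
    if ((double)map->size / map->capacity > 0.75)
        rehash(map);
}

int main(void)
{
    struct HashMap map = { calloc(4, sizeof(struct Node *)), 4, 0 };
    for (int k = 1; k <= 10; k++)
        insert(&map, k);
    printf("capacity after insertions: %d\n", map.capacity);   /* grew beyond 4 */
    return 0;
}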
