
SORTING

Internal Sorting:
 When all of the data to be sorted fits in main (internal) memory, the
sorting is called internal sorting.
 In internal sorting, the input cannot be larger than the available main memory.
 Example: heap sort, bubble sort, selection sort, quick sort, shell sort,
insertion sort.
External Sorting:
 When all the data that needs to be sorted cannot be placed in memory at
one time, the sorting is called external sorting. External sorting is
used for massive amounts of data.
 Merge sort and its variations are typically used for external sorting.
 External storage such as hard disks and CDs is used for external
sorting.
 Example: Merge sort, Tag sort, Polyphase sort, Four tape sort,
External radix sort, etc.
HEAP SORT
Heapsort is a popular and efficient sorting algorithm. The idea of heap sort is to
remove the elements one by one from the heap part of the list and insert them
into the sorted part of the list.

Heapsort is an in-place sorting algorithm.

Now, let's see the algorithm of heap sort.

Algorithm
1. HeapSort(arr)
2. BuildMaxHeap(arr)
3. for i = length(arr) downto 2
4. swap arr[1] with arr[i]
5. heap_size[arr] = heap_size[arr] - 1
6. MaxHeapify(arr,1)
7. End
BuildMaxHeap(arr)

1. BuildMaxHeap(arr)
2. heap_size(arr) = length(arr)
3. for i = length(arr)/2 downto 1
4. MaxHeapify(arr,i)
5. End
MaxHeapify(arr,i)

1. MaxHeapify(arr,i)
2. L = left(i)
3. R = right(i)
4. if L ≤ heap_size[arr] and arr[L] > arr[i]
5. largest = L
6. else
7. largest = i
8. if R ≤ heap_size[arr] and arr[R] > arr[largest]
9. largest = R
10. if largest != i
11. swap arr[i] with arr[largest]
12. MaxHeapify(arr,largest)
13. End

HEAP SORT PROGRAM:


#include <stdio.h>

// To heapify a subtree rooted with node i,
// which is an index in arr[].
void heapify(int arr[], int n, int i) {
    // Initialize largest as root
    int largest = i;

    // left index = 2*i + 1
    int l = 2 * i + 1;

    // right index = 2*i + 2
    int r = 2 * i + 2;

    // If left child is larger than root
    if (l < n && arr[l] > arr[largest])
        largest = l;

    // If right child is larger than largest so far
    if (r < n && arr[r] > arr[largest])
        largest = r;

    // If largest is not root, swap and continue heapifying
    if (largest != i) {
        int temp = arr[i];
        arr[i] = arr[largest];
        arr[largest] = temp;

        // Recursively heapify the affected sub-tree
        heapify(arr, n, largest);
    }
}

// Main function to do heap sort
void heapSort(int arr[], int n) {
    // Build heap (rearrange array)
    for (int i = n / 2 - 1; i >= 0; i--)
        heapify(arr, n, i);

    // One by one extract an element from the heap
    for (int i = n - 1; i > 0; i--) {
        // Move current root to end
        int temp = arr[0];
        arr[0] = arr[i];
        arr[i] = temp;

        // Call max heapify on the reduced heap
        heapify(arr, i, 0);
    }
}

// A utility function to print an array of size n
void printArray(int arr[], int n) {
    for (int i = 0; i < n; i++)
        printf("%d ", arr[i]);
    printf("\n");
}

// Driver's code
int main() {
    int arr[] = {9, 4, 3, 8, 10, 2, 5};
    int n = sizeof(arr) / sizeof(arr[0]);

    heapSort(arr, n);

    printf("Sorted array is \n");
    printArray(arr, n);
    return 0;
}

MERGE SORT
Merge sort is a sorting technique that follows the divide and conquer approach. Sorting
algorithms are widely asked about in examinations and in coding or technical interviews
for software engineers, so merge sort is an important topic to discuss.

Merge sort is similar to the quick sort algorithm in that it uses the divide and conquer
approach to sort the elements. It is one of the most popular and efficient sorting
algorithms. It divides the given list into two equal halves, calls itself on the two halves,
and then merges the two sorted halves. We have to define the merge() function to perform
the merging.

The sub-lists are divided again and again into halves until each list can no longer be
divided. Then we combine pairs of one-element lists into two-element lists, sorting
them in the process. The sorted two-element lists are merged into four-element lists,
and so on, until we get the fully sorted list.

In the following algorithm, arr is the given array, beg is the starting element, and end is
the last element of the array.

1. MERGE_SORT(arr, beg, end)
2. if beg < end
3. set mid = (beg + end)/2
4. MERGE_SORT(arr, beg, mid)
5. MERGE_SORT(arr, mid + 1, end)
6. MERGE (arr, beg, mid, end)
7. end of if
8. END MERGE_SORT
The important part of the merge sort is the MERGE function. This function performs the
merging of two sorted sub-arrays that are A[beg…mid] and A[mid+1…end], to build
one sorted array A[beg…end]. So, the inputs of the MERGE function are A[], beg,
mid, and end.
MERGE SORT PROGRAM:
// C program for Merge Sort
#include <stdio.h>
#include <stdlib.h>

// Merges two subarrays of arr[].
// First subarray is arr[l..m]
// Second subarray is arr[m+1..r]
void merge(int arr[], int l, int m, int r)
{
    int i, j, k;
    int n1 = m - l + 1;
    int n2 = r - m;

    // Create temp arrays
    int L[n1], R[n2];

    // Copy data to temp arrays L[] and R[]
    for (i = 0; i < n1; i++)
        L[i] = arr[l + i];
    for (j = 0; j < n2; j++)
        R[j] = arr[m + 1 + j];

    // Merge the temp arrays back into arr[l..r]
    i = 0;
    j = 0;
    k = l;
    while (i < n1 && j < n2) {
        if (L[i] <= R[j]) {
            arr[k] = L[i];
            i++;
        }
        else {
            arr[k] = R[j];
            j++;
        }
        k++;
    }

    // Copy the remaining elements of L[],
    // if there are any
    while (i < n1) {
        arr[k] = L[i];
        i++;
        k++;
    }

    // Copy the remaining elements of R[],
    // if there are any
    while (j < n2) {
        arr[k] = R[j];
        j++;
        k++;
    }
}

// l is for left index and r is right index of the
// sub-array of arr to be sorted
void mergeSort(int arr[], int l, int r)
{
    if (l < r) {
        int m = l + (r - l) / 2;

        // Sort first and second halves
        mergeSort(arr, l, m);
        mergeSort(arr, m + 1, r);

        merge(arr, l, m, r);
    }
}

// Function to print an array
void printArray(int A[], int size)
{
    int i;
    for (i = 0; i < size; i++)
        printf("%d ", A[i]);
    printf("\n");
}

// Driver code
int main()
{
    int arr[] = { 12, 11, 13, 5, 6, 7 };
    int arr_size = sizeof(arr) / sizeof(arr[0]);

    printf("Given array is \n");
    printArray(arr, arr_size);

    mergeSort(arr, 0, arr_size - 1);

    printf("\nSorted array is \n");
    printArray(arr, arr_size);
    return 0;
}

QUICK SORT
The working procedure of quicksort is simple. Sorting algorithms such as quicksort are
frequently asked about in examinations and technical interviews, so it is an important
topic to discuss.

Sorting is a way of arranging items in a systematic manner. Quicksort is a widely used
sorting algorithm that makes O(n log n) comparisons in the average case when sorting an
array of n elements. It is a fast and highly efficient sorting algorithm that follows the
divide and conquer approach. Divide and conquer is a technique of breaking a problem
down into subproblems, solving the subproblems, and combining the results
to solve the original problem.

Divide: In Divide, first pick a pivot element. After that, partition or rearrange the array
into two sub-arrays such that each element in the left sub-array is less than or equal to
the pivot element and each element in the right sub-array is larger than the pivot
element.
Conquer: Recursively, sort two subarrays with Quicksort.

Combine: Combine the already sorted sub-arrays. Because quicksort sorts in place, no extra work is needed for this step.

Quicksort picks an element as pivot, and then it partitions the given array around the
picked pivot element. In quick sort, a large array is divided into two arrays in which one
holds values that are smaller than the specified value (Pivot), and another array holds
the values that are greater than the pivot.

After that, the left and right sub-arrays are also partitioned using the same approach.
This continues until only a single element remains in each sub-array.

Choosing the pivot


Picking a good pivot is necessary for a fast implementation of quicksort. However, it can
be difficult to determine a good pivot. Some of the ways of choosing a pivot are as follows -

o Pivot can be random, i.e. select a random pivot from the given array (see the sketch
after this list).
o Pivot can be either the rightmost element or the leftmost element of the given
array.
o Select the median as the pivot element.
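As a hedged illustration of the random-pivot option, here is a small C sketch. The name randomizedPartition is a hypothetical helper, not part of the original text, and it reuses the partition function defined in the quicksort program given later in this section.

#include <stdlib.h>

int partition(int arr[], int low, int high); // defined in the program below

// Swap a randomly chosen element into the last position, then
// run the usual last-element partition on the result.
int randomizedPartition(int arr[], int low, int high) {
    int r = low + rand() % (high - low + 1); // random index in [low, high]
    int t = arr[r];
    arr[r] = arr[high];
    arr[high] = t;
    return partition(arr, low, high);
}

Choosing the pivot at random makes the worst-case O(n^2) behavior unlikely on any fixed input, at the cost of one extra swap per partition call.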

Algorithm:

1. QUICKSORT (array A, start, end)
2. {
3.    if (start < end)
4.    {
5.       p = partition(A, start, end)
6.       QUICKSORT (A, start, p - 1)
7.       QUICKSORT (A, p + 1, end)
8.    }
9. }
Partition Algorithm:

The partition algorithm rearranges the sub-array in place.

1. PARTITION (array A, start, end)
2. {
3.    pivot ← A[end]
4.    i ← start - 1
5.    for j ← start to end - 1 {
6.       if (A[j] < pivot) {
7.          i ← i + 1
8.          swap A[i] with A[j]
9.       }
10.   }
11.   swap A[i+1] with A[end]
12.   return i+1
13. }

QUICK SORT PROGRAM:


#include <stdio.h>

void swap(int* a, int* b);

// Partition function
int partition(int arr[], int low, int high) {
    // Choose the pivot
    int pivot = arr[high];

    // Index of smaller element and indicates
    // the right position of pivot found so far
    int i = low - 1;

    // Traverse arr[low..high] and move all smaller
    // elements to the left side. Elements from low to
    // i are smaller after every iteration
    for (int j = low; j <= high - 1; j++) {
        if (arr[j] < pivot) {
            i++;
            swap(&arr[i], &arr[j]);
        }
    }

    // Move pivot after smaller elements and
    // return its position
    swap(&arr[i + 1], &arr[high]);
    return i + 1;
}

// The QuickSort function implementation
void quickSort(int arr[], int low, int high) {
    if (low < high) {
        // pi is the partition return index of pivot
        int pi = partition(arr, low, high);

        // Recursion calls for smaller elements
        // and greater or equals elements
        quickSort(arr, low, pi - 1);
        quickSort(arr, pi + 1, high);
    }
}

void swap(int* a, int* b) {
    int t = *a;
    *a = *b;
    *b = t;
}

int main() {
    int arr[] = {10, 7, 8, 9, 1, 5};
    int n = sizeof(arr) / sizeof(arr[0]);

    quickSort(arr, 0, n - 1);

    for (int i = 0; i < n; i++)
        printf("%d ", arr[i]);
    return 0;
}

GENERAL RADIX SORT:

Radix Sort Algorithm


In this article, we will discuss the radix sort algorithm. Radix sort is a linear-time
sorting algorithm used for integers. It sorts digit by digit, starting from the least
significant digit and moving to the most significant digit.

The process of radix sort works similarly to sorting students' names alphabetically. In
that case, there are 26 radix groups because English has 26 letters. In the first pass,
the names of students are grouped according to the ascending order of the first letter of
their names. After that, in the second pass, their names are grouped according to the
ascending order of the second letter of their names. And the process continues until we
get the sorted list.

Now, let's see the algorithm of Radix sort.

Algorithm
1. radixSort(arr)
2. max = largest element in the given array
3. d = number of digits in the largest element (or, max)
4. Now, create 10 buckets, one for each digit 0 - 9
5. for i ← 0 to d - 1
6. sort the array elements using counting sort (or any stable sort) according to the digits at
7. the ith place

Working of Radix sort Algorithm


Now, let's see the working of Radix sort Algorithm.

The steps used in the sorting of radix sort are listed as follows -

o First, we have to find the largest element (suppose max) in the given array.
Suppose 'x' is the number of digits in max. The 'x' is calculated because we need to
go through the significant places of all elements.
o After that, go through each significant place one by one. Here, we have to use a
stable sorting algorithm to sort the digits at each significant place.
Now let's see the working of radix sort in detail by using an example. To understand it
more clearly, let's take an unsorted array and try to sort it using radix sort. It will make
the explanation clearer and easier.

In the given array, the largest element is 736, which has 3 digits. So, the loop will run
three times (i.e., up to the hundreds place). That means three passes are required to
sort the array.

Now, first sort the elements on the basis of the unit place digits (i.e., x = 0). Here, we are
using the counting sort algorithm to sort the elements.

Pass 1:
In the first pass, the list is sorted on the basis of the digits at the units place.

Pass 2:
In this pass, the list is sorted on the basis of the next significant digits (i.e., the digits
at the tens place).

Pass 3:
In this pass, the list is sorted on the basis of the next significant digits (i.e., the digits
at the hundreds place).

After the third pass, the array is sorted in ascending order.
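RADIX SORT PROGRAM:

The original gives programs for the other sorts, so here is a comparable C sketch of radix sort that uses counting sort as the stable per-digit sort, following the algorithm above. The input array {170, 45, 75, 90, 802, 24, 2, 66} is an assumed example, not the one from the original figures.

#include <stdio.h>

// A utility function to get the maximum value in arr[]
int getMax(int arr[], int n) {
    int max = arr[0];
    for (int i = 1; i < n; i++)
        if (arr[i] > max)
            max = arr[i];
    return max;
}

// Counting sort of arr[] according to the digit at 'place'
// (place = 1 for units, 10 for tens, 100 for hundreds, ...)
void countingSortByDigit(int arr[], int n, int place) {
    int output[n];
    int count[10] = { 0 };

    // Count occurrences of each digit at this place
    for (int i = 0; i < n; i++)
        count[(arr[i] / place) % 10]++;

    // Turn counts into ending positions for each digit
    for (int d = 1; d < 10; d++)
        count[d] += count[d - 1];

    // Build the output array stably (right to left)
    for (int i = n - 1; i >= 0; i--) {
        output[count[(arr[i] / place) % 10] - 1] = arr[i];
        count[(arr[i] / place) % 10]--;
    }

    for (int i = 0; i < n; i++)
        arr[i] = output[i];
}

// Radix sort: sort by each digit, least significant first
void radixSort(int arr[], int n) {
    int max = getMax(arr, n);
    for (int place = 1; max / place > 0; place *= 10)
        countingSortByDigit(arr, n, place);
}

int main() {
    int arr[] = { 170, 45, 75, 90, 802, 24, 2, 66 };
    int n = sizeof(arr) / sizeof(arr[0]);

    radixSort(arr, n);

    for (int i = 0; i < n; i++)
        printf("%d ", arr[i]);
    printf("\n");
    return 0;
}

The stability of the per-digit sort is essential: it preserves the ordering established by earlier (less significant) passes.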

SEQUENTIAL SEARCH:
// C code to linearly search x in arr[].
#include <stdio.h>

int search(int arr[], int N, int x)
{
    for (int i = 0; i < N; i++)
        if (arr[i] == x)
            return i;
    return -1;
}

// Driver code
int main(void)
{
    int arr[] = { 2, 3, 4, 10, 40 };
    int x = 10;
    int N = sizeof(arr) / sizeof(arr[0]);

    // Function call
    int result = search(arr, N, x);
    (result == -1)
        ? printf("Element is not present in array")
        : printf("Element is present at index %d", result);
    return 0;
}

BINARY SEARCH:
// C program to implement iterative Binary Search
#include <stdio.h>

// An iterative binary search function.
int binarySearch(int arr[], int low, int high, int x)
{
    while (low <= high) {
        int mid = low + (high - low) / 2;

        // Check if x is present at mid
        if (arr[mid] == x)
            return mid;

        // If x greater, ignore left half
        if (arr[mid] < x)
            low = mid + 1;

        // If x is smaller, ignore right half
        else
            high = mid - 1;
    }

    // If we reach here, then element was not present
    return -1;
}

// Driver code
int main(void)
{
    int arr[] = { 2, 3, 4, 10, 40 };
    int n = sizeof(arr) / sizeof(arr[0]);
    int x = 10;

    int result = binarySearch(arr, 0, n - 1, x);
    if (result == -1)
        printf("Element is not present in array");
    else
        printf("Element is present at index %d", result);
    return 0;
}

HASHING:
Hashing is a technique used in data structures that efficiently stores and
retrieves data in a way that allows for quick access. It involves mapping
data to a specific index in a hash table using a hash function that
enables fast retrieval of information based on its key. This method is
commonly used in databases, caching systems, and various programming
applications to optimize search and retrieval operations. The great thing
about hashing is, we can achieve all three operations (search, insert and
delete) in O(1) time on average.
Hashing refers to the process of generating a fixed-size output from an
input of variable size using the mathematical formulas known as hash
functions. This technique determines an index or location for the storage
of an item in a data structure.
Need for Hash data structure
The amount of data on the internet is growing exponentially every day,
making it difficult to store it all effectively. In day-to-day programming, this
amount of data might not be that big, but still, it needs to be stored,
accessed, and processed easily and efficiently. A very common data
structure that is used for such a purpose is the Array data structure.
Now the question arises: if arrays were already there, what was the need for
a new data structure? The answer lies in the word "efficiency".
Though storing an element in an array takes O(1) time, searching for one takes at
least O(log n) time even when the array is sorted (and O(n) when it is not). This time
appears to be small, but for a large data set it can cause a lot of problems, which in
turn makes the array data structure inefficient.
So now we are looking for a data structure that can store the data and
search in it in constant time, i.e. in O(1) time. This is how Hashing data
structure came into play. With the introduction of the Hash data structure,
it is now possible to easily store data in constant time and retrieve them in
constant time as well.
Components of Hashing
There are majorly three components of hashing:
1. Key: A key can be anything, such as a string or an integer, that is fed as input to
the hash function, the technique that determines an index or location for
storage of an item in a data structure.
2. Hash Function: The hash function receives the input key and returns
the index of an element in an array called a hash table. The index is
known as the hash index.
3. Hash Table: Hash table is a data structure that maps keys to values
using a special function called a hash function. Hash stores the data in
an associative manner in an array where each data value has its own
unique index.


What is Collision?
The hashing process generates a small number for a big key, so there is a
possibility that two keys could produce the same value. The situation
where a newly inserted key maps to an already occupied slot in the hash table is
called a collision, and it must be handled using some collision handling technique.


Advantages of Hashing in Data Structures


 Key-value support: Hashing is ideal for implementing key-value data
structures.
 Fast data retrieval: Hashing allows for quick access to elements with
constant-time complexity.
 Efficiency: Insertion, deletion, and searching operations are highly
efficient.
 Memory usage reduction: Hashing requires less memory as it
allocates a fixed space for storing elements.
 Scalability: Hashing performs well with large data sets, maintaining
constant access time.
 Security and encryption: Hashing is essential for secure data storage
and integrity verification.
Hash Functions and Types of Hash functions
Hash functions are a fundamental concept in computer science and play
a crucial role in various applications such as data storage, retrieval, and
cryptography. In data structures and algorithms (DSA), hash functions are
primarily used in hash tables, which are essential for efficient data
management. This article delves into the intricacies of hash functions,
their properties, and the different types of hash functions used in DSA.
What is a Hash Function?
A hash function is a function that takes an input (or ‘message’) and
returns a fixed-size string of bytes. The output, typically a number, is
called the hash code or hash value. The main purpose of a hash
function is to efficiently map data of arbitrary size to fixed-size values,
which are often used as indexes in hash tables.
Key Properties of Hash Functions
 Deterministic: A hash function must consistently produce the same
output for the same input.
 Fixed Output Size: The output of a hash function should have a fixed
size, regardless of the size of the input.
 Efficiency: The hash function should be able to process input quickly.
 Uniformity: The hash function should distribute the hash values
uniformly across the output space to avoid clustering.
 Pre-image Resistance: It should be computationally infeasible to
reverse the hash function, i.e., to find the original input given a hash
value.
 Collision Resistance: It should be difficult to find two different inputs
that produce the same hash value.
 Avalanche Effect: A small change in the input should produce a
significantly different hash value.
Applications of Hash Functions
 Hash Tables: The most common use of hash functions in DSA is in
hash tables, which provide an efficient way to store and retrieve data.
 Data Integrity: Hash functions are used to ensure the integrity of data
by generating checksums.
 Cryptography: In cryptographic applications, hash functions are used
to create secure hash algorithms like SHA-256.
 Data Structures: Hash functions are utilized in various data structures
such as Bloom filters and hash sets.
Types of Hash Functions
There are many hash functions that use numeric or alphanumeric keys.
This article focuses on discussing different hash functions:
1. Division Method.
2. Multiplication Method
1. Division Method
The division method involves dividing the key by a prime number and
using the remainder as the hash value.
h(k) = k mod m
Where k is the key and m is a prime number.
Advantages:
 Simple to implement.
 Works well when m is a prime number.
Disadvantages:
 Poor distribution if m is not chosen wisely.
2. Multiplication Method
In the multiplication method, a constant A (0 < A < 1) is used to multiply
the key. The fractional part of the product is then multiplied by m to get
the hash value.
h(k) = ⌊m (kA mod 1)⌋
Where ⌊ ⌋ denotes the floor function.
Advantages:
 Less sensitive to the choice of m.
Disadvantages:
 More complex than the division method.
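As a hedged illustration of both methods, here is a minimal C sketch. The table size m = 7 and the constant A ≈ 0.618 (the golden-ratio value often suggested for the multiplication method) are assumptions for this example, not values fixed by the text.

#include <stdio.h>
#include <math.h>

#define M 7 // assumed table size; a prime, as the division method prefers

// Division method: h(k) = k mod m
int hashDivision(int k) {
    return k % M;
}

// Multiplication method: h(k) = floor(m * (k*A mod 1))
int hashMultiplication(int k) {
    double A = 0.6180339887;              // assumed constant, 0 < A < 1
    double frac = k * A - (long)(k * A);  // fractional part of k*A
    return (int)floor(M * frac);
}

int main() {
    int keys[] = { 10, 22, 31, 4, 15 };
    for (int i = 0; i < 5; i++)
        printf("key %2d -> division: %d, multiplication: %d\n",
               keys[i], hashDivision(keys[i]), hashMultiplication(keys[i]));
    return 0;
}

Running this shows how the two methods can map the same keys to different slots; the division method depends directly on M, while the multiplication method is less sensitive to its choice.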
Collision in Hashing
In this, the hash function is used to find the index of the array. The hash
value is used to create an index for the key in the hash table. The hash
function may return the same hash value for two or more keys. When two
or more keys have the same hash value, a collision happens. To handle
this collision, we use collision resolution techniques.
Collision Resolution Techniques
There are two types of collision resolution techniques.
 Separate chaining (open hashing)
 Open addressing (closed hashing)
Separate chaining: This method involves making a linked list out of the
slot where the collision happened, then adding the new key to that list.
The name comes from the way these linked lists hang off the slots like
chains. It is more frequently utilized when we are unsure of how many keys
will be added or removed (a sketch follows the lists below).
Time complexity
 Its worst-case complexity for searching is O(n).
 Its worst-case complexity for deletion is O(n).
Advantages of separate chaining
 It is easy to implement.
 The hash table never fills up; we can always add more elements to the
chain.
 It is less sensitive to the hash function used.
Disadvantages of separate chaining
 In this, the cache performance of chaining is not good.
 Memory wastage is too much in this method.
 It requires more space for element links.
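To make the chaining idea concrete, the following is a minimal C sketch; the fixed TABLE_SIZE of 7, integer keys, and the insert/search helper names are assumptions for illustration. Each slot of the table holds the head of a linked list, and colliding keys are simply prepended to that list.

#include <stdio.h>
#include <stdlib.h>

#define TABLE_SIZE 7 // assumed small prime table size for the sketch

// Each slot holds a linked list (chain) of keys that hash to it
struct Node {
    int key;
    struct Node* next;
};

struct Node* table[TABLE_SIZE]; // all slots start as NULL

int hash(int key) {
    return key % TABLE_SIZE; // division-method hash
}

// Insert at the head of the chain for the key's slot
void insert(int key) {
    int idx = hash(key);
    struct Node* node = malloc(sizeof(struct Node));
    node->key = key;
    node->next = table[idx];
    table[idx] = node;
}

// Walk the chain at the key's slot; return 1 if found, 0 otherwise
int search(int key) {
    for (struct Node* p = table[hash(key)]; p != NULL; p = p->next)
        if (p->key == key)
            return 1;
    return 0;
}

int main() {
    insert(10);
    insert(17); // 17 % 7 == 3, same slot as 10 % 7 == 3: a collision
    insert(25);
    printf("17 present? %d\n", search(17)); // prints 1
    printf("9 present?  %d\n", search(9));  // prints 0
    return 0;
}

Note how the collision between 10 and 17 costs nothing extra at insert time; the price is paid at search time, when the chain must be walked.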
Open addressing: To resolve collisions in the hash table, open
addressing is employed as a collision resolution technique. No key is kept
anywhere else besides the hash table itself. As a result, the hash table's size
must always be at least the number of keys. It is also known as
closed hashing.
The following techniques are used in open addressing:
 Linear probing
 Quadratic probing
 Double hashing
Linear probing: This involves doing a linear probe for the following slot
when a collision occurs and continuing to do so until an empty slot is
discovered.
The worst-case time to search for an element with linear probing is O(table size). The
cache performs best with linear probing, but clustering is a concern. This
method's key benefit is that it is simple to compute.
Disadvantages of linear probing:
 The main problem is clustering.
 It takes too much time to find an empty slot.
Quadratic probing: When a collision happens in this, we probe the (i²)-th
slot in the i-th iteration, continuing to do so until an empty slot is
discovered. In comparison to linear probing, quadratic probing has
worse cache performance. Additionally, clustering is less of a concern
with quadratic probing.
Double hashing: In this, you employ a second hash function, and in
the i-th iteration you probe the slot offset by (i * hash2(x)). Determining two hash
functions requires more time. Although there is no clustering issue, the
performance of the cache is relatively poor when using double hashing.
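The three techniques differ only in the probe sequence, which the following minimal C sketch makes explicit. The table size 11, the second hash h2(k) = 7 - (k mod 7), and the helper names are assumptions for illustration, not values from the text.

#include <stdio.h>

#define M 11 // assumed prime table size
#define EMPTY -1

int table[M]; // slots, EMPTY when unused

int h1(int k) { return k % M; }
int h2(int k) { return 7 - (k % 7); } // second hash; never returns 0

// probe(k, i) gives the slot tried on the i-th attempt:
//   linear probing:    (h1(k) + i) % M
//   quadratic probing: (h1(k) + i*i) % M
//   double hashing:    (h1(k) + i*h2(k)) % M
int probe(int k, int i) {
    return (h1(k) + i * h2(k)) % M; // double hashing shown here
}

// Insert k by probing until an empty slot is found
int insert(int k) {
    for (int i = 0; i < M; i++) {
        int idx = probe(k, i);
        if (table[idx] == EMPTY) {
            table[idx] = k;
            return idx;
        }
    }
    return -1; // table is full
}

int main() {
    for (int i = 0; i < M; i++)
        table[i] = EMPTY;
    // 22 and 33 both hash to slot 0 (22 % 11 == 33 % 11 == 0),
    // so 33 is placed by its probe sequence instead
    printf("22 -> slot %d\n", insert(22)); // slot 0
    printf("33 -> slot %d\n", insert(33)); // slot 2, since h2(33) = 2
    return 0;
}

Swapping in the linear or quadratic formula inside probe() turns the same code into linear or quadratic probing; everything else stays unchanged.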
