DAA Miniproject
ON
“Merge Sort & Multithreaded Merge Sort”
COMPUTER ENGINEERING
Submitted By
Maharashtra 412410
2024-2025
DEPARTMENT OF COMPUTER ENGINEERING, SGOI
CERTIFICATE
This is to certify that the "Mini Report" submitted by Bhor Pradyumna (09), Wakchaure,
on Merge Sort & Multithreaded Merge Sort under Design and Analysis of Algorithm,
submitted during the 2024-2025 academic year, is in partial fulfilment of the
requirements of COMPUTER ENGINEERING.
Date:- / /2024
Place:- Belhe
Objective:
Compare the time required by and analyze the performance of each algorithm for the best case and
the worst case.
Prerequisite:
Merge Sort:
Merge sort is a highly effective sorting algorithm that follows the divide-and-conquer approach.
The method involves recursively dividing an unsorted list into smaller sub-lists, each with one
element, and then progressively merging these sub-lists to create sorted sub-lists. One notable
feature of merge sort is its stability, meaning it preserves the relative order of equal elements during
the sorting process. The time complexity of merge sort is O(n log n), making it a preferred choice
for large datasets. However, it comes with a space complexity of O(n) due to the additional space
required for the merging process.
Merge sort is a divide-and-conquer algorithm that sorts an array by recursively dividing it into two
halves, sorting each half, and then merging the sorted halves back together. It is known for its
efficiency and is particularly useful for large datasets.
1. Divide: The algorithm divides the input array into two halves until each subarray contains a
single element (a single element is inherently sorted). The problem (unsorted array) is
recursively divided into smaller subproblems until they are simple enough to solve directly.
• Goal: Divide the array into two equal (or almost equal) halves.
• Process:
1. Start with an array of size n.
2. Recursively split the array into two halves until you reach subarrays of size 1 (which
are trivially sorted).
2. Conquer: Each of the two halves is sorted recursively using merge sort. The algorithm
recursively solves the subproblems.
• Process:
1. Since the subarrays of size 1 are trivially sorted, no further action is needed at this level.
2. Once the division reaches the base case (subarrays of size 1), the conquer step involves
no additional sorting work.
3. Combine: Finally, the two sorted halves are merged together to produce a single sorted array.
This merging process ensures that the resulting array is also sorted. The final and most crucial
step of merge sort is to combine the sorted subarrays into a larger, sorted array.
• Goal: Merge two sorted subarrays back into one sorted array.
• Process:
o During the merging step, two sorted subarrays are combined into a single sorted array.
o This is done by comparing the smallest (leftmost) elements of both subarrays and placing
the smaller element into the resulting array, repeating until all elements are merged.
• Merge Process:
1. Compare the first element of the left subarray with the first element of the right
subarray.
2. Place the smaller of the two into the result array.
3. Move the pointer in the subarray where the smaller element came from.
4. Repeat the comparison and placing process until all elements from both subarrays are
merged.
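The merge process described above can be sketched as a small helper function (the name merge_sorted is ours for illustration, not taken from the report's code):

```python
def merge_sorted(left, right):
    """Merge two already-sorted lists into one sorted list."""
    merged = []
    i = j = 0
    # Repeatedly compare the front elements of both subarrays
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:      # <= keeps the sort stable
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One subarray is exhausted; append the remainder of the other
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sorted([27, 43], [3, 9, 82]))  # [3, 9, 27, 43, 82]
```

Because the `<=` comparison takes from the left subarray on ties, equal elements keep their original relative order, which is exactly the stability property noted earlier.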
Steps of Merge Sort
1. Splitting: Recursively split the array into two halves until only single-element subarrays remain.
2. Sorting: Single-element subarrays are already sorted; larger sorted runs are built up by merging.
3. Merging:
o Once the two halves are sorted, merge them:
Compare the smallest elements of each half.
Append the smaller element to a new array.
Move the pointer forward in the half from which the element was taken.
Repeat until all elements from both halves have been added to the new array.
4. Return: The newly merged array is now sorted.
Example
Let's illustrate merge sort with an example. Suppose we have the following unsorted array:
Input: [38, 27, 43, 3, 9, 82, 10]
1. Splitting:
o Split into [38, 27, 43] and [3, 9, 82, 10]
o Further split [38, 27, 43] into [38] and [27, 43]
o Split [27, 43] into [27] and [43]
o Further split [3, 9, 82, 10] into [3, 9] and [82, 10]
o Continue until all elements are single-element arrays.
2. Merging:
o Start merging back:
Merge [27] and [43] to get [27, 43]
Merge [38] and [27, 43] to get [27, 38, 43]
Merge [3] and [9] to get [3, 9]
Merge [82] and [10] to get [10, 82]
Merge [3, 9] and [10, 82] to get [3, 9, 10, 82]
o Finally, merge [27, 38, 43] and [3, 9, 10, 82] to get the final sorted array:
[3, 9, 10, 27, 38, 43, 82]
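The walkthrough above can be reproduced with a compact recursive sketch (a slicing-based illustration, not the report's final implementation):

```python
def merge_sort(arr):
    if len(arr) <= 1:                # base case: single element is sorted
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])     # divide + conquer the left half
    right = merge_sort(arr[mid:])    # divide + conquer the right half
    # Combine: merge the two sorted halves
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([38, 27, 43, 3, 9, 82, 10]))  # [3, 9, 10, 27, 38, 43, 82]
```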
Time Complexity
Time complexity refers to the computational complexity that describes the amount of time it takes to
run an algorithm as a function of the size of the input. It's a way to estimate the efficiency of an
algorithm, especially for large input sizes.
The most common time complexities, ordered from fastest to slowest, are:
1. O(1) - Constant time: The algorithm takes the same amount of time regardless of input size.
2. O(log n) - Logarithmic time: The time increases logarithmically as the input size increases.
Example: binary search.
3. O(n) - Linear time: The time increases linearly with the input size. Example: iterating through a
list.
4. O(n log n) - Linearithmic time: Grows faster than linear but far slower than quadratic. Example:
merge sort, quicksort (average case).
5. O(n²) - Quadratic time: The time is proportional to the square of the input size. Example: bubble
sort, insertion sort (in worst case).
6. O(2ⁿ) - Exponential time: The time doubles with each additional element in the input. Example:
solving the traveling salesman problem by brute force.
7. O(n!) - Factorial time: The time grows factorially with the input size. Example: permutations of a
list.
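These growth rates can be observed empirically. The sketch below (input sizes chosen arbitrarily) times a simple quadratic bubble sort against Python's built-in Timsort, which runs in O(n log n):

```python
import random
import time

def bubble_sort(a):
    """Quadratic O(n^2) sort, included only for the timing comparison."""
    a = list(a)                          # work on a copy
    for i in range(len(a)):
        for j in range(len(a) - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

for n in (250, 500, 1000):
    data = [random.random() for _ in range(n)]
    t0 = time.perf_counter()
    bubble_sort(data)                    # O(n^2): time roughly quadruples as n doubles
    t1 = time.perf_counter()
    sorted(data)                         # O(n log n): grows far more slowly
    t2 = time.perf_counter()
    print(f"n={n}: bubble {t1 - t0:.4f}s, built-in {t2 - t1:.4f}s")
```

As n doubles, the bubble-sort time roughly quadruples while the built-in sort's time grows only slightly faster than linearly, matching the taxonomy above.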
Space Complexity
Space complexity refers to the amount of memory or storage space required by an algorithm or a
program to complete its execution. It's often measured in terms of how much additional memory
beyond the input itself is required, and it can vary depending on factors like data structures used,
recursion depth, and temporary variables created during execution.
Space complexity is defined as the total space required by an algorithm to execute, including both the
space for input values and the space required for processing. It can be expressed as:
• Fixed Part: This includes space required for constants, simple variables, fixed-size variables,
program code, etc. This part does not change with the size of the input.
• Variable Part: This includes space required for dynamic data structures (like arrays, linked
lists, trees, etc.) and recursion stack space. This part depends on the input size.
Auxiliary Space
Auxiliary space refers to the temporary or extra space used by the algorithm. It does not include the
space for the input data itself. For example:
• In-Place Algorithms: If an algorithm modifies the input data without using extra space, its
auxiliary space complexity is O(1).
• Non-In-Place Algorithms: If an algorithm creates a new data structure to hold intermediate
results, its auxiliary space may be O(n) or more, depending on the amount of data stored.
• Constant Space Complexity O(1): The algorithm uses a fixed amount of space regardless of
the input size.
o Example: Swapping two numbers using a temporary variable.
• Linear Space Complexity O(n): The space required grows linearly with the input size.
o Example: Storing the elements of an input list in a new array.
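A minimal illustration of the two auxiliary-space classes (the function names are ours):

```python
def swap(a, i, j):
    """In-place swap: O(1) auxiliary space (one temporary binding)."""
    a[i], a[j] = a[j], a[i]

def reversed_copy(a):
    """Builds a new list of the same length: O(n) auxiliary space."""
    return a[::-1]

nums = [1, 2, 3]
swap(nums, 0, 2)
print(nums)                  # [3, 2, 1]
print(reversed_copy(nums))   # [1, 2, 3]
```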
Advantages of Merge Sort
1. Stable Sorting: Merge sort maintains the relative order of equal elements.
2. Performance: It has a guaranteed time complexity of O(n log n), which is efficient for large
datasets.
3. Divide and Conquer: It can be implemented with recursion, making it easier to understand and
implement.
Disadvantages of Merge Sort
1. Space Consumption: Merge sort requires additional space for temporary arrays during the
merging process, which can be significant for large datasets.
2. Not In-Place: Unlike quicksort, merge sort does not sort the array in place, which can lead to
higher memory usage.
3. Overhead from Threading: In the multithreaded variant, creating and managing threads
introduces overhead, which can reduce performance gains, especially for small datasets.
4. Synchronization Costs: The merging phase requires careful synchronization between threads,
which can introduce delays.
5. Diminishing Returns: As the number of threads increases beyond the number of available
CPU cores, the performance improvement diminishes due to thread contention.
Multithreaded Sorting:
Multithreaded sorting distributes the work of a sorting algorithm across multiple processors or
cores, where it can significantly improve scalability. However, challenges such as synchronization,
race conditions, and deadlocks must be carefully addressed in the implementation to achieve
optimal results.
a) Performance Comparison
Best Case: O(n * log(n))
1. Speedup:
o Near-linear speedup can be achieved in the best case. If the dataset is large and there is little
overhead, sorting time can be reduced by almost a factor of p, where p is the number of available
CPU cores.
o For example, if sorting a dataset would take 100 seconds in a single-threaded implementation, it
might take only around 12-15 seconds on 8 cores, depending on the overhead.
2. CPU Utilization:
o All CPU cores are fully utilized throughout the sorting process. There are minimal idle periods
since each thread has a roughly equal workload.
3. Memory Efficiency:
o Memory contention is minimized, allowing for smooth memory access during both sorting and
merging phases.
4. Thread Management:
o The thread pool efficiently manages threads, reducing the cost of repeatedly creating and
destroying threads.
Worst Case: O(n * log(n))
1) Time Complexity: The asymptotic time complexity of merge sort is still O(n log n), but in the
worst case the constant factors grow significantly due to thread management, memory access
delays, and synchronization overhead. Worst-case multithreading overhead can result in
superlinear growth in execution time.
2) Thread Overhead: Thread creation and destruction can add significant cost, especially if each
recursive step spawns a new thread, leading to overhead that scales poorly. In practice, this
overhead can make multithreaded sorting slower than single-threaded merge sort for small
datasets or on systems with limited resources.
3) Context Switching: If the number of threads far exceeds the number of CPU cores, frequent
context switching between threads can degrade performance. The time spent on context
switching can add significant overhead, pushing observed running times closer to O(n^2)
behavior in extreme cases due to these inefficiencies.
Average Case: O(n * log(n))
1) Speedup:
In the average case, we expect a moderate speedup, where performance improves but does not scale
perfectly with the number of threads. For example:
On a system with 4 cores, a single-threaded merge sort might take 100 seconds, while a multithreaded
merge sort might reduce the time to around 30-40 seconds.
The speedup is noticeable but not as high as in the best case due to factors like thread management,
synchronization delays, and memory contention.
2) CPU Utilization:
CPU utilization is high but not perfect. Most cores are actively engaged in sorting tasks, but there
may be small periods of idle time due to uneven workloads or synchronization bottlenecks.
3) Memory Efficiency:
Cache misses and memory contention slightly affect performance. Threads accessing shared
memory regions during the merging phase cause some delays, but this does not severely degrade
performance.
4) Scalability:
Multithreaded merge sort scales reasonably well in the average case, but the scalability is limited by
the number of cores, synchronization costs, and memory bandwidth. As more threads are added,
performance improves, but the gains diminish with each additional thread.
In terms of time complexity, both standard merge sort and multithreaded merge sort are identical:
each exhibits O(n log n) behavior in the best, worst, and average cases. The difference between
them lies in constant factors and actual wall-clock time, not in asymptotic complexity.
In summary, multithreaded merge sort is a parallel version of the standard merge sort and can
provide improved performance for large datasets on multi-core processors.
Multithreaded sorting is an optimization technique where a sorting algorithm is parallelized across
multiple threads to speed up the sorting process. This can be particularly effective when sorting large
datasets, taking advantage of multiple CPU cores for better performance.
• Splitting:
The array is split recursively into smaller subarrays. For example, if the array has 8 elements,
it is split into two arrays of 4 elements each, then those are further split until individual
elements are reached.
• Sorting in Parallel:
At each level of recursion, the left half of the array is sorted using a new thread, while the
right half is sorted within the same thread.
By utilizing threads to process the left and right halves in parallel, the overall computation
time can be reduced.
• Merging:
After both halves of the array are sorted, the results are merged. The merging step is sequential
and not parallelized.
The merge process ensures that both halves are combined into a single sorted array by
comparing elements from both halves and placing the smallest one first.
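The left-thread/right-in-place pattern above can be sketched as follows. The depth parameter, which caps how many recursion levels spawn new threads, is our addition for illustration (note that in CPython the GIL limits parallel speedup for CPU-bound work, so this sketch shows the structure rather than a guaranteed speedup):

```python
import threading

def merge(left, right):
    # Merge two sorted lists into one sorted list
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out + left[i:] + right[j:]

def threaded_merge_sort(arr, depth=2):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    if depth > 0:
        # Sort the left half in a new thread, the right half in this thread
        result = {}
        left_thread = threading.Thread(
            target=lambda: result.update(
                left=threaded_merge_sort(arr[:mid], depth - 1)))
        left_thread.start()
        right = threaded_merge_sort(arr[mid:], depth - 1)
        left_thread.join()   # wait for the left half before merging
        left = result["left"]
    else:
        # Below the depth cap, fall back to plain sequential recursion
        left = threaded_merge_sort(arr[:mid], 0)
        right = threaded_merge_sort(arr[mid:], 0)
    return merge(left, right)

print(threaded_merge_sort([5, 2, 9, 1, 7, 3]))  # [1, 2, 3, 5, 7, 9]
```

The `left_thread.join()` call is the synchronization point mentioned below: merging must not begin until the left half is fully sorted.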
Multithreading Considerations
• Thread Overhead: Creating a thread has overhead, so it may not be beneficial for small arrays.
It’s typically more efficient to use multithreading only when the array size is large enough to
justify the cost of managing threads.
• Thread Synchronization: Careful management of threads is needed to avoid race conditions.
For example, we use left_thread.join() to ensure that the left half is fully sorted before proceeding
to the merge step.
• Optimal Thread Count: If too many threads are created (e.g., one per small subarray), the
overhead can outweigh the benefits. Ideally, the number of threads should not exceed the
number of available CPU cores.
• Scalability: The more cores a system has, the more efficiently it can handle multithreaded
sorting. In environments with many CPU cores, this technique can result in significant speed
improvements.
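In Python, the available core count can be queried to cap the number of worker threads (a common heuristic, assumed here rather than taken from the report):

```python
import os

# cpu_count() can return None on some platforms; fall back to 1
num_threads = os.cpu_count() or 1
print("Capping the pool at", num_threads, "threads")
```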
Advanced Optimizations
• Thread Pooling: Instead of creating new threads for each recursive call, we can use a thread
pool to manage a limited number of threads. This reduces the overhead of constantly creating
and destroying threads.
• Task Parallelism: In more advanced implementations, a task-based model could be used
(such as concurrent.futures.ThreadPoolExecutor in Python), where tasks (sorting subarrays) are
submitted to a thread pool and processed as threads become available.
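A sketch of this task-based variant using concurrent.futures.ThreadPoolExecutor; the chunk-then-merge strategy, the worker count, and the chunk size are our illustrative assumptions:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from heapq import merge as heap_merge  # merges already-sorted iterables

def pooled_sort(arr, workers=4, chunk=4):
    # Split the input into fixed-size chunks; each chunk is one task
    chunks = [arr[i:i + chunk] for i in range(0, len(arr), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is sorted independently as threads become available
        sorted_chunks = list(pool.map(sorted, chunks))
    # Sequentially merge the sorted chunks into the final result
    return list(reduce(lambda a, b: list(heap_merge(a, b)), sorted_chunks, []))

print(pooled_sort([4, 5, 8, 3, 0, 5, 3, 9, 4, 3]))  # [0, 3, 3, 3, 4, 4, 5, 5, 8, 9]
```

The executor owns the thread lifecycle, so no thread is created or destroyed per recursive call, which is exactly the thread-pooling benefit described above.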
Challenges:
• Overhead of thread creation: If the dataset is small, the overhead of managing threads can
outweigh the benefits.
• Synchronization issues: Proper synchronization is needed when merging the results to avoid
race conditions or data corruption.
Advantages of Multithreading
1. Improved Performance:
o Parallel Execution: Multithreading allows for concurrent execution of tasks, enabling better
utilization of CPU resources, especially on multi-core processors. This can lead to significant
performance improvements in compute-intensive applications.
o Faster Program Execution: Tasks that are independent of each other can be executed
simultaneously, resulting in reduced execution time.
2. Responsiveness:
o User Interface (UI) Responsiveness: In applications with graphical user interfaces (GUIs),
multithreading allows the UI to remain responsive while performing background tasks (like
data processing or file downloading). This enhances the user experience.
o Asynchronous Operations: Operations that might take a long time to complete (like network
requests or file I/O) can run in the background, keeping the main thread responsive.
3. Resource Sharing:
o Shared Memory: Threads within the same process share the same memory space, which
allows for efficient data sharing and communication compared to processes that require
interprocess communication (IPC).
o Lower Overhead: Creating and managing threads is generally less resource-intensive than
managing separate processes.
4. Scalability:
o Load Balancing: Multithreading can help distribute workloads evenly across multiple CPU
cores, allowing applications to scale better with increasing workloads.
o Dynamic Resource Allocation: Threads can be created and destroyed dynamically based on
the application's needs, improving resource management.
5. Simplified Program Structure:
o Logical Structure: Some applications are inherently concurrent (e.g., web servers, real-time
simulations). Multithreading allows for a more natural representation of these applications.
Disadvantages of Multithreading
1. Complexity:
o Increased Complexity: Designing and implementing multithreaded applications can be
complex due to potential issues like race conditions, deadlocks, and thread synchronization.
o Debugging Difficulty: Bugs in multithreaded applications can be challenging to reproduce and
debug, as they may depend on the timing and interleaving of threads.
2. Synchronization Overhead:
o Performance Costs: Synchronization mechanisms (like mutexes and semaphores) are often
required to manage access to shared resources. This can introduce overhead and slow down the
application, especially if threads spend a lot of time waiting for locks.
o Potential for Deadlocks: Improper synchronization can lead to deadlocks, where two or more
threads are waiting indefinitely for resources held by each other.
3. Resource Contention:
o Limited Resources: When multiple threads try to access shared resources simultaneously, it
can lead to contention, reducing performance and causing bottlenecks.
o Cache Coherency Issues: Heavy sharing of data between threads can cause cache invalidation
traffic in multi-core systems, affecting performance.
4. Diminishing Returns:
o Overhead of Context Switching: As more threads are added, the overhead of context
switching (switching CPU resources from one thread to another) can offset performance gains,
especially if the number of threads exceeds the number of available cores.
5. Memory Consumption:
o Stack Space: Each thread requires its own stack space, which can lead to increased memory
consumption, especially in applications with a large number of threads.
Merge Sort:-
def merge(arr, l, m, r):
    n1 = m - l + 1
    n2 = r - m
    # Temporary arrays to hold the two halves
    L = [0] * n1
    R = [0] * n2
    for i in range(0, n1):
        L[i] = arr[l + i]
    for j in range(0, n2):
        R[j] = arr[m + 1 + j]
    # Merge the temporary arrays back into arr[l..r]
    i = 0
    j = 0
    k = l
    while i < n1 and j < n2:
        if L[i] <= R[j]:
            arr[k] = L[i]
            i += 1
        else:
            arr[k] = R[j]
            j += 1
        k += 1
    # Copy any remaining elements of L[] and R[]
    while i < n1:
        arr[k] = L[i]
        i += 1
        k += 1
    while j < n2:
        arr[k] = R[j]
        j += 1
        k += 1

def mergeSort(arr, l, r):
    if l < r:
        m = l + (r - l) // 2
        mergeSort(arr, l, m)
        mergeSort(arr, m + 1, r)
        merge(arr, l, m, r)

arr = [12, 11, 13, 5, 6, 7]
n = len(arr)
print("Given array is")
for i in range(n):
    print(arr[i], end=" ")
print()
mergeSort(arr, 0, n - 1)
print("Sorted array is")
for i in range(n):
    print(arr[i], end=" ")
print()
Output:-
Given array is
12 11 13 5 6 7
Sorted array is
5 6 7 11 12 13
Multithreaded Sorting:-
import threading

def merge(left, right):
    # Merge two sorted lists into one sorted list
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    return merge(left, right)

def multi_threaded_merge_sort(arr, num_threads):
    if num_threads <= 1 or len(arr) <= 1:
        return merge_sort(arr)
    # Split the input into one sublist per thread
    size = len(arr) // num_threads
    sublists = [arr[i * size:(i + 1) * size] for i in range(num_threads)]
    if len(arr) % num_threads != 0:
        sublists[-1].extend(arr[size * num_threads:])
    sorted_sublists = [None] * num_threads
    def worker(index, sublist):
        sorted_sublists[index] = merge_sort(sublist)
    threads = []
    for index, sublist in enumerate(sublists):
        thread = threading.Thread(target=worker, args=(index, sublist))
        threads.append(thread)
        thread.start()
    for thread in threads:
        thread.join()
    # Sequentially merge the sorted sublists
    merged = sorted_sublists[0]
    for sublist in sorted_sublists[1:]:
        merged = merge(merged, sublist)
    return merged

input_list = [4, 5, 8, 3, 0, 5, 3, 9, 4, 3]
num_threads = 2
print("Original List:", input_list)
sorted_list = multi_threaded_merge_sort(input_list, num_threads)
print("Sorted List:", sorted_list)
Output:-
Original List: [4, 5, 8, 3, 0, 5, 3, 9, 4, 3]
Sorted List: [0, 3, 3, 3, 4, 4, 5, 5, 8, 9]
Conclusion:-
We have understood the concepts of Merge Sort and Multithreaded Sorting. Merge
Sort is an efficient, stable algorithm with consistent O(n log n) performance across all
cases but with additional space overhead. Multithreaded sorting can significantly
reduce sorting time for large datasets by utilizing multiple cores but introduces
complexity and overhead in terms of synchronization and thread management.