CS 253 4 Sorting
CS 253 4 Sorting
Woodruff Library
2
Motivation
• Example: Students
• NetID (Primary Key)
• Last Name
• First Name
• Program Information
• Address
3
Motivation
• Example: Students
• NetID (Primary Key)
• Last Name
• First Name
• Program Information
• Address
• The Primary Key of a relation uniquely identifies each object. In the
following, we assume that the primary key is an integer.
4
Motivation: Linear Search
16 74 43 18 13 66 75 12 23 5
• Algorithm 1 (naïve):
• Sequentially check each entry, if it is the to see if that is the one you are
looking for.
• How many comparisons do you need?
• Best case: 1
• Worst case: n
• Average cases: (1+2+…+n)/n
!!"!" $"%
• Sum of arithmetic series: 𝑆 = 𝑛 =𝑛
# #
& $"%
• Average: =
% #
• Then, what is the complexity for best, worst, and average cases?
• In 𝑂(𝑛), where 𝑛 is the number of entries.
5
Motivation: Binary Search
16 74 43 18 13 66 75 12 23 5
• We need a more organized system like Emory Libraries.
• Algorithm 2:
• Sort the array first
5 12 13 16 18 23 43 66 74 75
• Then do a binary search
6
Binary Search: Algorithm
5 12 13 16 18 23 43 66 74 75
• Algorithm 2 (Binary Search for x):
• Step1: Initialize Pointers
• Step2: Compute Middle Index
• Step3: Compare Target with Middle Element
• Step4: Repeat or End
7
Binary Search: Algorithm
5 12 13 16 18 23 43 66 74 75
• Algorithm 2 (Binary Search for x):
• Step1: Initialize Pointers:
• Set low to the index of the first element in the list (usually 0).
• Set high to the index of the last element in the list (usually length of list - 1).
• Step2: Compute Middle Index
• Step3: Compare Target with Middle Element
• Step4: Repeat or End
8
Binary Search: Algorithm
5 12 13 16 18 23 43 66 74 75
• Algorithm 2 (Binary Search for x):
• Step1: Initialize Pointers
• Step2: Compute Middle Index:
• Calculate the middle index as mid = low + (high - low) / 2. This helps prevent potential
overflow.
• Step3: Compare Target with Middle Element
• Step4: Repeat or End
9
Binary Search: Algorithm
5 12 13 16 18 23 43 66 74 75
• Algorithm 2 (Binary Search for x):
• Step1: Initialize Pointers
• Step2: Compute Middle Index
• Step3: Compare Target with Middle Element:
• If x=mid:
• You've found the target. Return the index mid.
• If x<mid:
• The target must be in the left half of the list. Set high to mid - 1 and go back to step 2.
• If x>mid:
• The target must be in the right half of the list. Set low to mid + 1 and go back to step 2.
• Step4: Repeat or End
10
Binary Search: Algorithm
5 12 13 16 18 23 43 66 74 75
• Algorithm 2 (Binary Search for x):
• Step1: Initialize Pointers
• Step2: Compute Middle Index
• Step3: Compare Target with Middle Element
• Step4: Repeat or End:
• Continue repeating steps 2 and 3 until low exceeds high. If this happens, x is not in the
list, and the search should return a value indicating that x is not found (e.g., null).
11
Binary Search: Complexity
5 12 13 16 18 23 43 66 74 75
• Algorithm 2 (Binary Search for x):
• How many comparisons do we expect?
• Best cases: 1
• Worst cases: e.g., 4 when n=10
• 𝑊𝑜𝑟𝑠𝑡 ≈ log # 𝑛
• Average cases:
$⋅$"#⋅#"(⋅)"⋯"+,-# %⋅ #$%&# "'! $+,- % $
• 𝐴𝑣𝑒𝑟𝑎𝑔𝑒 ≈ = ∑./$# 𝑖 ⋅ 2.0$ = log # 𝑛 − 1 +
% % %
• Then, what is the complexity for worst and average cases?
• 𝑂(log 𝑛)
12
Binary Search: Complexity
16 74 43 18 13 66 75 12 23 5
• We need a more organized system like Emory Libraries.
• Algorithm 2:
• Sort the array first
5 12 13 16 18 23 43 66 74 75
• Then do a binary search
• Total cost: Cost(Sorting) + Cost(Binary search)
• 𝑇 𝑆𝑜𝑟𝑡𝑖𝑛𝑔 ∈ 𝑂 ? , 𝑇 𝐵𝑖𝑛𝑎𝑟𝑦 𝑆𝑒𝑎𝑟𝑐ℎ ∈ 𝑂 log 𝑛
• If we use a Merge Sorting Algorithm: 𝑂 𝑛 log 𝑛
• 𝑇 𝑆𝑜𝑟𝑡𝑖𝑛𝑔 + 𝑇 𝐵𝑖𝑛𝑎𝑟𝑦 𝑆𝑒𝑎𝑟𝑐ℎ ∈ 𝑂 max 𝑛 log 𝑛 , log 𝑛 = 𝑂 𝑛 log 𝑛
O-Notation: Addition
Let 𝑓$ ∈ 𝑂 𝑔$ and 𝑓# ∈ 𝑂(𝑜# ), then:𝑓$ + 𝑓# ∈ 𝑂(max(𝑔$ , 𝑔# )) 13
Linear Search vs. Binary Search
• Linear Search
• Construction: X
• Search: 𝑂 𝑛
• Binary Search
• Construction: 𝑂 𝑛 log 𝑛
• Search: 𝑂 log 𝑛
• Binary search is slower than the naïve algorithm if the array is not
sorted.
• Once the array is sorted, future searches run in 𝑂 log 𝑛 .
14
Data Structure
The very basic idea of a data structure:
• Make a one-time commitment to prepare the data
• Incur preprocessing cost a.k.a. offline-cost
• Exploit this preparation to improve the query
• Reduce the query time a.k.a. online-time
• The prepared data is called a data structure (or index-structure)
15
Overview of Upcoming Weeks
• We will learn Sorting Algorithms!
ØTo support searching in a list of objects in 𝑂(log 𝑛)
ØWe will learn some naive algorithms (which run in 𝑂(𝑛B))
• Not efficient. But examples for straightforward solutions.
ØThen we will learn some efficient algorithms (in 𝑂(𝑛 log 𝑛 ))
• Fun algorithms!
ØWe will also learn some (at least one) algorithm in 𝑂(𝑛)
• But these algorithms only work in certain cases (assumption on the data)
16
Overview of Upcoming Weeks
• Then we will learn Searching Algorithms!
• We want our data structure to be reuseable – remember?
• But what if the dataset change? If a new object is inserted/deleted,
• Do we need to re-sort from scratch in 𝑂(𝑛 log 𝑛)?
• Or can we somehow update the data structure more efficiently?
ØDynamic data structures such as balanced binary search trees.
17
Section Overview: Sorting
• Basics
• Simple Algorithms
• BubbleSort
• SelectionSort
• InsertionSort
• Efficient Algorithms
• MergeSort
• QuickSort
• HeapSort
• Special Algorithms
• BucketSort
18
Section Overview: Sorting
• Basics
• Problem Definition: Sorting
• Simple Algorithms
• BubbleSort
• SelectionSort
• InsertionSort
• Efficient Algorithms
• MergeSort
• QuickSort
• HeapSort
• Special Algorithms
• BucketSort
19
How to Define the Sorting Problem?
• Input
16 74 43 18 13 66 75 12 23 5
• Designed output
5 12 13 16 18 23 43 66 74 75
20
Total Order
• Let 𝑀 be a non-empty set and let ≤ ⊆ 𝑀×𝑀 be a binary relation on
𝑀
• The pair (𝑀, ≤) is called a total (or linear) order if and only if the
following properties hold:
• Reflexivity: ∀𝑥 ∈ 𝑀: 𝑥≤𝑥
• Transitivity: ∀𝑥, 𝑦, 𝑧 ∈ 𝑀: 𝑥 ≤𝑦∧𝑦 ≤𝑧 ⇒𝑥 ≤𝑧
• Antisymmetry: ∀𝑥, 𝑦 ∈ 𝑀: x≤𝑦∧𝑦 ≤𝑥 ⇒𝑥 =𝑦
• Totality: ∀𝑥, 𝑦 ∈ 𝑀: 𝑥 ≤𝑦∨𝑦 ≤𝑥
21
Total Order: Examples
• The relation ≤ on integers is a total order
22
The Sorting Problem
• Input: A sequence of keys (values)
𝑎A , 𝑎B , 𝑎C , … , 𝑎D
and a total order ≤ on the set of keys {𝑎A , 𝑎B , 𝑎C , … , 𝑎D }
23
The Sorting Problem: Examples
24
Comparing Sorting Algorithms
Sorting algorithms can be compared using the following metrics
• Run-time
• 𝑂 𝑛! , 𝑂 𝑛 log 𝑛 , 𝑂(𝑛)
• Memory usage
• In-place vs making copies
• Stability
• Maintaining the order of objects having the same key.
25
Swaps vs Comparisons
• The choice of an algorithm depends on the application
• An algorithm with many swaps and few comparisons is good in cases where
comparisons are expensive.
Example 1: Sort containers by weight
• Expensive swaps:
(Have to use a crane to move containers around)
• Cheap comparisons:
(Weight of contains in the container documents)
26
Section Overview: Sorting
• Basics
• Simple Algorithms
• BubbleSort
• SelectionSort
• InsertionSort
• Efficient Algorithms
• MergeSort
• QuickSort
• HeapSort
• Special Algorithms
• BucketSort
27
Swap
• In the following we will use the function swap(a, i, j) where a is
an array and i and j are integers.
• This function has the following three operations
Object temp = a[i];
a[i] = a[j];
a[j] = temp;
28
BubbleSort
• Idea:
• From left to right: Compare pairs of adjacent keys
• Swap them if the left key is greater than the right key
• Through iterative swapping the large elements will “rise” to the right like
bubbles in water
public void bubblesort(int[] a){
int n = a.length;
for(int i = 0; i < n; i++) {
for(int j = 0; j < n-i-1; j++){
if(a[j] > a[j+1])
swap(a,j,j+1);
}
}
}
29
BubbleSort: Step-by-Step Example
3 7 8 6 4 2
int n = a.length;
for(int i = 0; i < n; i++) {
i 0
for(int j = 0; j < n-i-1; j++){
if(a[j] > a[j+1]) j 0
swap(a,j,j+1);
}
}
BubbleSort: Step-by-Step Example
3 7 8 6 4 2
3<7
int n = a.length;
for(int i = 0; i < n; i++) {
i 0
for(int j = 0; j < n-i-1; j++){
if(a[j] > a[j+1]) j 0
swap(a,j,j+1);
}
}
BubbleSort: Step-by-Step Example
3 7 8 6 4 2
7<8
int n = a.length;
for(int i = 0; i < n; i++) {
i 0
for(int j = 0; j < n-i-1; j++){
if(a[j] > a[j+1]) j 1
swap(a,j,j+1);
}
}
BubbleSort: Step-by-Step Example
3 7 8 6 4 2
8≮6
int n = a.length;
for(int i = 0; i < n; i++) {
i 0
for(int j = 0; j < n-i-1; j++){
if(a[j] > a[j+1]) j 2
swap(a,j,j+1);
}
}
BubbleSort: Step-by-Step Example
3 7 6 8 4 2
swap
int n = a.length;
for(int i = 0; i < n; i++) {
i 0
for(int j = 0; j < n-i-1; j++){
if(a[j] > a[j+1]) j 2
swap(a,j,j+1);
}
}
BubbleSort: Step-by-Step Example
3 7 6 8 4 2
8≮4
int n = a.length;
for(int i = 0; i < n; i++) {
i 0
for(int j = 0; j < n-i-1; j++){
if(a[j] > a[j+1]) j 3
swap(a,j,j+1);
}
}
BubbleSort: Step-by-Step Example
3 7 6 4 8 2
swap
int n = a.length;
for(int i = 0; i < n; i++) {
i 0
for(int j = 0; j < n-i-1; j++){
if(a[j] > a[j+1]) j 3
swap(a,j,j+1);
}
}
BubbleSort: Step-by-Step Example
3 7 6 4 8 2
8≮2
int n = a.length;
for(int i = 0; i < n; i++) {
i 0
for(int j = 0; j < n-i-1; j++){
if(a[j] > a[j+1]) j 4
swap(a,j,j+1);
}
}
BubbleSort: Step-by-Step Example
3 7 6 4 2 8
swap
int n = a.length;
for(int i = 0; i < n; i++) {
i 0
for(int j = 0; j < n-i-1; j++){
if(a[j] > a[j+1]) j 4
swap(a,j,j+1);
}
}
BubbleSort: Step-by-Step Example
3 7 6 4 2 8
3<7
int n = a.length;
for(int i = 0; i < n; i++) {
i 1
for(int j = 0; j < n-i-1; j++){
if(a[j] > a[j+1]) j 0
swap(a,j,j+1);
}
}
BubbleSort: Step-by-Step Example
3 7 6 4 2 8
7≮6
int n = a.length;
for(int i = 0; i < n; i++) {
i 1
for(int j = 0; j < n-i-1; j++){
if(a[j] > a[j+1]) j 1
swap(a,j,j+1);
}
}
BubbleSort: Step-by-Step Example
3 6 7 4 2 8
swap
int n = a.length;
for(int i = 0; i < n; i++) {
i 1
for(int j = 0; j < n-i-1; j++){
if(a[j] > a[j+1]) j 1
swap(a,j,j+1);
}
}
BubbleSort: Step-by-Step Example
3 6 7 4 2 8
7≮4
int n = a.length;
for(int i = 0; i < n; i++) {
i 1
for(int j = 0; j < n-i-1; j++){
if(a[j] > a[j+1]) j 2
swap(a,j,j+1);
}
}
BubbleSort: Step-by-Step Example
3 6 4 7 2 8
swap
int n = a.length;
for(int i = 0; i < n; i++) {
i 1
for(int j = 0; j < n-i-1; j++){
if(a[j] > a[j+1]) j 2
swap(a,j,j+1);
}
}
BubbleSort: Step-by-Step Example
3 6 4 7 2 8
7≮2
int n = a.length;
for(int i = 0; i < n; i++) {
i 1
for(int j = 0; j < n-i-1; j++){
if(a[j] > a[j+1]) j 3
swap(a,j,j+1);
}
}
BubbleSort: Step-by-Step Example
3 6 4 2 7 8
swap
int n = a.length;
for(int i = 0; i < n; i++) {
i 1
for(int j = 0; j < n-i-1; j++){
if(a[j] > a[j+1]) j 3
swap(a,j,j+1);
}
}
BubbleSort: Step-by-Step Example
3 6 4 2 7 8
Don’t need to check!
After the first outer
iteration 8 must be the
largest key.
BubbleSort: Step-by-Step Example
3 6 4 2 7 8
3<6
int n = a.length;
for(int i = 0; i < n; i++) {
i 2
for(int j = 0; j < n-i-1; j++){
if(a[j] > a[j+1]) j 0
swap(a,j,j+1);
}
}
BubbleSort: Step-by-Step Example
3 6 4 2 7 8
6≮4
int n = a.length;
for(int i = 0; i < n; i++) {
i 2
for(int j = 0; j < n-i-1; j++){
if(a[j] > a[j+1]) j 1
swap(a,j,j+1);
}
}
BubbleSort: Step-by-Step Example
3 4 6 2 7 8
swap
int n = a.length;
for(int i = 0; i < n; i++) {
i 2
for(int j = 0; j < n-i-1; j++){
if(a[j] > a[j+1]) j 1
swap(a,j,j+1);
}
}
BubbleSort: Step-by-Step Example
3 4 6 2 7 8
6≮2
int n = a.length;
for(int i = 0; i < n; i++) {
i 2
for(int j = 0; j < n-i-1; j++){
if(a[j] > a[j+1]) j 2
swap(a,j,j+1);
}
}
BubbleSort: Step-by-Step Example
3 4 2 6 7 8
swap
int n = a.length;
for(int i = 0; i < n; i++) {
i 2
for(int j = 0; j < n-i-1; j++){
if(a[j] > a[j+1]) j 2
swap(a,j,j+1);
}
}
BubbleSort: Step-by-Step Example
3 4 2 6 7 8
Don’t need to check!
After the second outer
iteration 7 must be the
second largest key.
BubbleSort: Step-by-Step Example
3 4 2 6 7 8
3<4
int n = a.length;
for(int i = 0; i < n; i++) {
i 3
for(int j = 0; j < n-i-1; j++){
if(a[j] > a[j+1]) j 0
swap(a,j,j+1);
}
}
BubbleSort: Step-by-Step Example
3 4 2 6 7 8
4≮2
int n = a.length;
for(int i = 0; i < n; i++) {
i 3
for(int j = 0; j < n-i-1; j++){
if(a[j] > a[j+1]) j 1
swap(a,j,j+1);
}
}
BubbleSort: Step-by-Step Example
3 2 4 6 7 8
swap
int n = a.length;
for(int i = 0; i < n; i++) {
i 3
for(int j = 0; j < n-i-1; j++){
if(a[j] > a[j+1]) j 1
swap(a,j,j+1);
}
}
BubbleSort: Step-by-Step Example
3 2 4 6 7 8
3≮2
int n = a.length;
for(int i = 0; i < n; i++) {
i 4
for(int j = 0; j < n-i-1; j++){
if(a[j] > a[j+1]) j 0
swap(a,j,j+1);
}
}
BubbleSort: Step-by-Step Example
2 3 4 6 7 8
swap
int n = a.length;
for(int i = 0; i < n; i++) {
i 4
for(int j = 0; j < n-i-1; j++){
if(a[j] > a[j+1]) j 0
swap(a,j,j+1);
}
}
BubbleSort Complexity Analysis: Comparisons
• Number of comparisons:
• Independent of any pre-sorting of the array
Ø Worst-case, average-case, best-case are identical.
• In iteration i there are n-i comparisons. Total:
V VXI
𝑛(𝑛 − 1)
?(𝑛 − 𝑖) = ? 𝑖 = ∈ 𝑂(𝑛B)
2
GUI GUW
int n = a.length;
for(int i = 0; i < n; i++) {
for(int j = 0; j < n-i-1; j++){
if(a[j] > a[j+1])
swap(a,j,j+1);
}
}
58
BubbleSort Complexity Analysis: Swaps
• Number of swaps:
• Best Case: 0 Swaps (Array already sorted)
V(VXI)
• Worst Case: B
∈ 𝑂(𝑛B) swaps (sorted in descending order)
V(VXI)
• Average Case: Y ∈ 𝑂(𝑛B) swaps
• Donald E. Knuth: The Art of Computer Programming (1997)
int n = a.length;
for(int i = 0; i < n; i++) {
for(int j = 0; j < n-i-1; j++){
if(a[j] > a[j+1])
swap(a,j,j+1);
}
}
59
Section Overview: Sorting
• Basics
• Simple Algorithms
• BubbleSort
• SelectionSort
• InsertionSort
• Efficient Algorithms
• MergeSort
• QuickSort
• HeapSort
• Special Algorithms
• BucketSort
60
SelectionSort
• Idea:
• Iterate from left to right using a pointer 𝑖
• Elements to the left of 𝑖 are sorted
• Find the smallest element to the left of 𝑖, swap it with 𝑖, then increment 𝑖 by
one
for(int i = 0; i < n-1; i++) {
int min = i;
for(int j = i+1; j < n; j++){
if(a[j] < a[min])
min = j;
}
swap(a,i,min);
}
61
SelectionSort: Step-by-Step Example
3 7 8 6 4 2
84
SelectionSort Complexity Analysis: Swaps
• Number of swaps:
• Best Case: 𝑛 − 1 swaps (trivial swaps between index i and index i)
• Worst Case: 𝑛 − 1 swaps
• Average Case: 𝑛 − 1 swaps
Ø Number of swaps grows linear!
Ø SelectionSort is useful for cases where swaps are expensive.
for(int i = 0; i < n-1; i++) {
int min = i;
for(int j = i+1; j < n; j++){
if(a[j] < a[min])
min = j;
}
swap(a,i,min);
}
85
Section Overview: Sorting
• Basics
• Simple Algorithms
• BubbleSort
• SelectionSort
• InsertionSort
• Efficient Algorithms
• MergeSort
• QuickSort
• HeapSort
• Special Algorithms
• BucketSort
86
InsertionSort
• Idea:
• Iterate from left to right using a pointer 𝑖
• Keep elements to the left of 𝑖 sorted
• In each iteration, move the object at position 𝑖 + 1 to the correct position to
the left of 𝑖
for(int i = 1; i < n; i++) {
int key = a[i];
int j = i;
while(j > 0 && a[j-1] > key){
a[j] = a[j-1];
j--;
}
a[j] = key;
}
87
InsertionSort: Step-by-Step Example
3 7 8 6 4 2
109
InsertionSort Complexity Analysis: Edits
• Number of edits:
• Best Case: The input array is already sorted: No shifts needed
• Worst Case: Input array is sorted descendingly. The 𝑗th element will be
shifted to the beginning of the array in 𝑗 − 1 edit operations
V(VXI)
• It can be shown that the average case requires Y
∈ 𝑂 𝑛B operations
for(int i = 1; i < n; i++) {
int key = a[i];
int j = i;
while(j > 0 && a[j-1] > key){
a[j] = a[j-1];
j--;
}
a[j] = key;
}
110
Summary: Simple Sorting Algorithms
• BubbleSort
• Simple implementation
• Many comparisons even in the best case
• SelectionSort
• Requires fewer swaps.
• Good in cases where swaps are costly
• InsertionSort
• More swaps but fewer comparisons
• Good in cases where the array grows dynamically during execution (data streaming)
• So far: All algorithms take 𝑂(𝑛B) comparisons or swaps/edits.
• Can we do better?
111