CH7 Sorting

This document outlines sorting algorithms covered in Chapter 7. It begins with an introduction to sorting and important uses of sorting like searching lists and matching two lists. It then covers insertion sort in detail, explaining the basic insertion step and providing pseudocode for insertion sort. It analyzes the time complexity of insertion sort and variations like binary insertion sort. Finally, it briefly mentions other sorting algorithms that will be covered like quicksort, merge sort, heap sort, and radix sort.


Data Structures
CH7 Sorting
Prof. Tai-Lang Jong
Office: Delta 928
Tel: 42577
email: tljong@mx.nthu.edu.tw
Spring 2022
Outline
• 7.1 Introduction
• 7.2 Insertion Sort
• 7.3 Quick Sort
• 7.4 How fast can we sort?
• 7.5 Merge sort
• 7.6 Heap sort
• 7.7 Radix sort
• 7.8 (List and table sorts)
• 7.9 Summary of internal sorting
2
Important Uses of Sorting
• Searching a List of records – a common operation
• Each record has 1 or more fields
• Record uses key fields to distinguish among records
• Search for a record with the specified key

• Sorting aids searching in a list
  • O(n) time for an unordered list (with sequential search)
  • O(log n) time for a sorted list (with binary search)
• Sorting aids matching two lists
  • O(nm) time for unordered lists (with sequential search)
  • O(n+m) time for sorted lists
3
Sequential Search
template <class E, class K>
int SeqSearch(E *a, const int n, const K& k)
{ // search a[1:n] from left to right; return the least i
  // such that the key of a[i] equals k; return 0 if there is no such i
  int i;
  for (i = 1; i <= n && a[i] != k; i++);
  if (i > n) return 0; // not found
  return i;            // found
}

• If all keys are distinct and the key of a[i] is being searched
for, then i key comparisons are made
• The average number of comparisons for a successful
search is (Σᵢ₌₁ⁿ i)/n = (n + 1)/2
4
Binary Search
template <class E, class K>
int BinarySearch(E *a, const int n, const K& k)
{ // search sorted a[1:n] for k
  int left = 1, right = n;
  while (left <= right) {
    int middle = (left + right)/2;
    if (k < a[middle]) right = middle - 1;
    else if (k > a[middle]) left = middle + 1;
    else return middle; // found
  }
  return -1; // not found
}

• The number of comparisons for an unsuccessful search
is ⌈log₂ n⌉
5
Verifying Two Lists
• Comparing two lists of records containing data that are
essentially the same but have been obtained from two
different sources, e.g., pay records filed by employers
(list l1) and the individual pay amounts reported by the
employees who received them (list l2).
• Required verification:
1. If there is no record in the employee list (l2)
corresponding to a key in the employer list (l1), a
message is to be sent to the employee.
2. If the reverse is true, then a message is to be sent to the
employer.
3. If there is a discrepancy between two records with the
same key, a message to this effect is to be output.
6
Verifying Two Lists
• Assume l1 is the employer list which contains n
records and l2, employee list, m records
• l1, l2 unsorted → verify using sequential search
  • O(mn)
• Sort l1 and l2 first, then verify → fast verification
  • O(tsort(n) + tsort(m) + n + m)

7
Verifying 2 Lists w Sequential Search
void Verify1(Element *l1, Element *l2, const int n, const int m)
{ // compare two unordered lists l1 and l2, of size n and m respectively
  bool *marked = new bool[m + 1];
  fill(marked + 1, marked + m + 1, false);
  for (int i = 1; i <= n; i++)
  {
    int j = SeqSearch(l2, m, l1[i]);
    if (j == 0) cout << l1[i] << " not in l2" << endl; // satisfies (1)
    else
    {
      if (!Compare(l1[i], l2[j])) // satisfies (3)
        cout << "Discrepancy in " << l1[i] << endl;
      marked[j] = true; // mark l2[j] as checked
    }
  }
  for (int i = 1; i <= m; i++)
    if (!marked[i]) cout << l2[i] << " not in l1." << endl; // satisfies (2)
  delete [] marked;
}
8
Fast Verifying 2 Lists
void Verify2(Element *l1, Element *l2, const int n, const int m)
{ // same task as Verify1, but l1 and l2 are sorted first
  Sort(l1, n); // sort into nondecreasing order of key
  Sort(l2, m);
  int i = 1, j = 1;
  while ((i <= n) && (j <= m)) {
    if (l1[i] < l2[j]) { cout << l1[i] << " not in l2" << endl; i++; }
    else if (l1[i] > l2[j]) { cout << l2[j] << " not in l1" << endl; j++; }
    else // equal keys
    {
      if (!Compare(l1[i], l2[j])) cout << "Discrepancy in " << l1[i] << endl;
      i++; j++;
    }
  }
  if (i <= n) OutputRest(l1, i, n, 1); // output records i through n of l1
  else if (j <= m) OutputRest(l2, j, m, 2); // output records j through m of l2
}
9
Classification of Sorting
• Complexity
  • Time complexity
  • Space complexity
• Stability
  • A sort is called stable iff it maintains the relative order
    of records with equal keys
  • Non-stable sort example (sorting on Score, descending):
      Score Name          Score Name
      100   Alice    →    100   Alice
      90    Bob           100   David
      100   David         90    Emily
      90    Emily         90    Bob
• Internal vs. external
  • An internal sort requires its input to be small enough so
    that the entire sort can be carried out in main memory
    • Examples: Selection Sort, Insertion Sort, Quick Sort, Heap Sort
  • An external sort has no such requirement
10
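The stability distinction above can be reproduced with the standard library: std::sort makes no promise about equal keys, while std::stable_sort keeps them in input order. A minimal sketch (the record type and function name here are illustrative, not from the text):

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// (score, name) records mirroring the slide's table
using Rec = std::pair<int, std::string>;

// Sort by score only, descending; equal-score records keep input order.
std::vector<Rec> stableByScore(std::vector<Rec> v) {
    std::stable_sort(v.begin(), v.end(),
        [](const Rec& x, const Rec& y) { return x.first > y.first; });
    return v;
}
```

With the table's input (Alice 100, Bob 90, David 100, Emily 90), a stable sort yields Alice, David, Bob, Emily; the non-stable result shown in the table swaps Bob and Emily.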
Internal Sort
• Bubble sort – simple but slow
• Priority queue sort
• Sequence
• Insertion sort
• Selection sort
• Heap
• Heap sort
• Quick sort – divide-and-conquer
• Merge sort –divide-and-conquer, set ADT
• Bucket sort
• Radix sort
11
Outline
• 7.1 Introduction
• 7.2 Insertion Sort
• 7.3 Quick Sort
• 7.4 How fast can we sort?
• 7.5 Merge sort
• 7.6 Heap sort
• 7.7 Radix sort
• 7.8 (List and table sorts)
• 7.9 Summary of internal sorting
12
Insertion Sort
• The basic step in insertion sort is to insert a new
record into a sorted sequence of i records in such a
way that the resulting sequence of size (i + 1) is also
ordered (sorted)
(Figure: inserting a new record into sorted a[1:i] yields sorted a[1:i+1].)
13
Insertion Sort Concept
• array[0] is used as temporary space
• array[1] is the initial sorted sublist
• Insertion pass
  • Place the element next to the sublist into the temporary space
  • Insert the element into the sublist
• Insertion passes are continued until the sublist contains all records
• Insertion Sort is a stable sort

Trace (the sorted sublist grows from the left):
5 3 4 1 2
3 5 4 1 2
3 4 5 1 2
1 3 4 5 2
1 2 3 4 5
14
Insertion Sort Algorithm
template <class T>
void InsertionSort(T *a, const int n)
{ // sort a[1:n] into nondecreasing order
  for (int j = 2; j <= n; j++) {
    a[0] = a[j]; // save the record to be inserted
    int i;
    for (i = j - 1; a[i] > a[0]; i--) // comparison
      a[i + 1] = a[i];                // data move
    a[i + 1] = a[0];                  // data move
  }
}

• Need one extra space a[0]

15
How Fast is Insertion Sort?
• Worst-case time complexity
  • When the input is in reversed order,
    each insertion pass involves i comparisons, i = 1..n
  • 1 + 2 + … + n = O(n²)
• Average time complexity
  • It has been shown that Insertion Sort is O(n²) on average
16
Insertion into a Sorted List
template <class T>
void Insert(const T& e, T* a, int i)
{ // insert e into the ordered list a[1:i] so that a[1:i+1] is ordered
  a[0] = e; // sentinel: stops the while loop at the left end
  while (e < a[i])
  {
    a[i + 1] = a[i];
    i--;
  }
  a[i + 1] = e;
}

• Insert() makes at most i + 1 comparisons, so its complexity is O(i)

17
Insertion Sort Algorithm
template <class T>
void InsertionSort(T* a, const int n)
{ // Sort a[1:n] into nondecreasing order
  for (int j = 2; j <= n; j++)
  {
    T temp = a[j];
    Insert(temp, a, j - 1);
  }
}

• InsertionSort() invokes Insert() for i = j - 1 = 1, 2, …, n-1,
so the complexity of InsertionSort is
O(Σᵢ₌₁ⁿ⁻¹ (i + 1)) = O(n²)
18
Relative Disorder
• Left out of order (LOO):
Record Ri is LOO iff Ri < max1≤j<i {Rj}
• The insertion step has to be carried out only for
those records that are LOO.
• If k is the number of LOO records, the computing
time is O((k+1)n) = O(kn)
• This O(kn) computing time makes InsertionSort
very desirable for sorting sequences in which only a
very few records are LOO (i.e., k << n)
• The simplicity of this scheme makes it about the
fastest sorting method for small n (say, n ≤ 30)
19
Variations of Insertion Sort
• Binary Insertion Sort
• Use binary search rather than sequential search for
insertion passes to reduce the number of comparisons
• The number of record moves remains unchanged

• Linked Insertion Sort


• The records to be sorted are stored in a linked list rather
than an array
• The number of record moves becomes zero because
only the link fields require adjustment
• Complexity does not change because sequential search
is required for insertion
20
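The binary insertion sort variation can be sketched with std::upper_bound supplying the binary search (0-indexed, unlike the textbook's 1-indexed arrays; the function name is ours):

```cpp
#include <algorithm>
#include <vector>

// Binary insertion sort sketch: O(log i) comparisons per pass via
// binary search, but the same record moves as ordinary insertion sort.
void binaryInsertionSort(std::vector<int>& a) {
    for (std::size_t j = 1; j < a.size(); ++j) {
        int key = a[j];
        // binary-search the sorted prefix a[0:j) for the insert position
        auto pos = std::upper_bound(a.begin(), a.begin() + j, key);
        // shift the tail of the prefix one slot right, then drop key in
        std::move_backward(pos, a.begin() + j, a.begin() + j + 1);
        *pos = key;
    }
}
```

Using upper_bound (first position greater than key) keeps the sort stable.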
Selection Sort
• n <= 1 → already sorted. So, assume n > 1.
• Select the largest element and move it to the right
end of the list (swap).
• Recursively sort the remaining n-1 elements a[1:n-1].
• Complexity is O(n²).
• Usually implemented nonrecursively.
21
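The steps above can be sketched nonrecursively (0-indexed; the function name is ours):

```cpp
#include <algorithm>
#include <vector>

// Selection sort: each pass swaps the largest remaining element
// into the right end of the unsorted prefix; O(n^2) comparisons.
void selectionSort(std::vector<int>& a) {
    for (std::size_t n = a.size(); n > 1; --n) {
        auto largest = std::max_element(a.begin(), a.begin() + n);
        std::iter_swap(largest, a.begin() + n - 1);
    }
}
```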
Outline
• 7.1 Introduction
• 7.2 Insertion Sort
• 7.3 Quick Sort
• 7.4 How fast can we sort?
• 7.5 Merge sort
• 7.6 Heap sort
• 7.7 Radix sort
• 7.8 (List and table sorts)
• 7.9 Summary of internal sorting
22
Divide-and-Conquer
• Divide-and conquer is a general algorithm design
paradigm:
• Divide: if the input size is smaller than a certain
threshold (one or two elements), solve the problem
directly (base case). Otherwise, divide the input
data S in two disjoint subsets S1 and S2
• Recur: recursively solve the subproblems
associated with S1 and S2
• Conquer: combine (“merge”) the solutions for S1
and S2 into a solution for S

23
Quick Sort
• Developed by C. A. R. Hoare
• Very good average behavior, actually the best
average behavior among the sorting methods:
insertion sort, selection sort, heap sort, merge sort
• Is a randomized sorting algorithm based on the
divide-and-conquer paradigm
• Divide: pick a random element x (called the pivot) and
partition S into
  • L: elements less than x
  • E: elements equal to x
  • G: elements greater than x
• Recur: sort L and G
• Conquer: join L, E and G
  (Example: 7 4 9 6 2 → 2 4 6 7 9, via subproblems 4 2 → 2 4 and 7 9 → 7 9)
24
Quick Sort Concept
• Divide-and-conquer
• Division pass
  • Pick the first element as the pivot
  • Make elements whose key ≤ pivot the left sublist
  • Make elements whose key > pivot the right sublist
• Division passes are continued until all sublists are of size ≤ 1
• Basically, quick sort is nonstable

Trace on 6 3 7 1 8 4 5 2 (each pivot lands in its final position):
4 3 2 1 5 6 8 7   (left sublist | pivot 6 | right sublist)
1 3 2 4 5 6 7 8
1 2 3 4 5 6 7 8
Choice of Pivot
• Pivot is the leftmost element of the list to be sorted.
  • When sorting a[6:20], use a[6] as the pivot.
  • The textbook implementation does this.
• Randomly select one of the elements to be sorted as the pivot.
  • When sorting a[6:20], generate a random number r in
    the range [6, 20] and use a[r] as the pivot.
• Median-of-Three rule: from the leftmost, middle,
and rightmost elements of the list to be sorted,
select the one with the median key as the pivot.
26
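The Median-of-Three rule can be sketched as a small helper that returns the index of the median key among the three sampled positions (0-indexed; the function name is ours):

```cpp
#include <vector>

// Return the index of the median of a[left], a[mid], a[right],
// where mid = (left + right) / 2, per the median-of-three rule.
int medianOfThree(const std::vector<int>& a, int left, int right) {
    int mid = (left + right) / 2;
    int l = a[left], m = a[mid], r = a[right];
    if ((l <= m && m <= r) || (r <= m && m <= l)) return mid;
    if ((m <= l && l <= r) || (r <= l && l <= m)) return left;
    return right;
}
```

The chosen element can then be swapped with a[left] so that a leftmost-pivot partition routine can be used unchanged.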
Quick Sort Concept
• Steps for generating sublists
  • Place the pivot at a[1] (swap with a[1] if a random
    pivot a[r] or the median-of-three pivot is used)
  • Linear search from both ends: find candidates to swap
    (from the left, a key > pivot; from the right, a key ≤ pivot)
  • Perform the swap; repeat until the two scans cross
  • A last swap places the pivot between the two sublists
• The partition and sort are in-place
  (alternatively, one can use an additional array for the partition)

Trace on 6 3 7 1 8 4 5 2 (pivot 6):
6 3 2 1 8 4 5 7   (swap 7, 2)
6 3 2 1 5 4 8 7   (swap 8, 5)
4 3 2 1 5 6 8 7   (last swap places the pivot between the sublists)
27
Quick Sort Algorithm
template <class T>
void QuickSort(T *a, const int left, const int right)
{ // sort a[left:right] into nondecreasing, a[left] as pivot
if (left < right) {
T & pivot = a[left];
int i = left;
int j = right + 1;
do {
  do j--; while (a[j] > pivot);  // from the right, find a key ≤ pivot
  do i++; while (a[i] <= pivot); // from the left, find a key > pivot
  if (i < j) swap(a[i], a[j]);
} while (i < j);
swap(pivot, a[j]); //place the pivot between 2 lists
QuickSort(a, left, j - 1); // recursion
QuickSort(a, j + 1, right); // recursion
}
}
28
In-Place Partitioning Example (pivot 6)
a: 6 2 8 5 11 10 4 1 9 7 3
a: 6 2 3 5 11 10 4 1 9 7 8   (swap 8, 3)
a: 6 2 3 5 1 10 4 11 9 7 8   (swap 11, 1)
a: 6 2 3 5 1 4 10 11 9 7 8   (swap 10, 4)
a: 4 2 3 5 1 6 10 11 9 7 8   (place the pivot 6 between the two sublists)
29
Quick Sort Example

R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 left right


[26 5 37 1 61 11 59 15 48 19] 1 10
[11 5 19 1 15] 26 [59 61 48 37] 1 5
[1 5] 11 [19 15] 26 [59 61 48 37] 1 2
1 5 11 [19 15] 26 [59 61 48 37] 4 5
1 5 11 15 19 26 [59 61 48 37] 7 10
1 5 11 15 19 26 [48 37] 59 [61] 7 8
1 5 11 15 19 26 37 48 59 [61] 10 10
1 5 11 15 19 26 37 48 59 61

30
Space Needed in Recursive Quick Sort
• Unlike Insertion Sort, where the only additional
space needed was for one record, Quick Sort needs
stack space to implement the recursion.
• If the lists split evenly each time, the maximum
recursion depth would be logn, requiring a stack
space of O(logn).
• The worst case occurs when the list is split into a
left sublist of size n-1 and a right sublist of size 0 at
each level of recursion. In this case the depth of
recursion becomes n, requiring stack space of O(n).

31
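The standard remedy, sketched here rather than taken from the textbook, is to recurse only into the smaller partition and iterate on the larger one, which caps the recursion depth at about log₂n even in the worst case (0-indexed, Hoare-style partition around a middle pivot; the function name is ours):

```cpp
#include <algorithm>
#include <vector>

// Quick sort with O(log n) stack: recurse on the smaller side,
// loop on the larger side instead of making a second recursive call.
void quickSortSmallFirst(std::vector<int>& a, int left, int right) {
    while (left < right) {
        int pivot = a[left + (right - left) / 2];
        int i = left, j = right;
        while (i <= j) {                       // partition around pivot
            while (a[i] < pivot) ++i;
            while (a[j] > pivot) --j;
            if (i <= j) std::swap(a[i++], a[j--]);
        }
        if (j - left < right - i) {            // left part is smaller
            quickSortSmallFirst(a, left, j);
            left = i;                          // iterate on the right part
        } else {                               // right part is smaller
            quickSortSmallFirst(a, i, right);
            right = j;
        }
    }
}
```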
How Fast is Quick Sort ?
• Worst case time complexity
• Each division pass involves n comparisons and ends up
with sublist with 0 and n-1 records (input already sorted)
• T(n) = O(n) + T(n-1) = O(n2)
• Average case time complexity
  • It has been shown that quick sort is O(n log n) on average:
    T(n) ≤ cn + 2T(n/2), for some constant c
         ≤ cn + 2(cn/2 + 2T(n/4))
         = 2cn + 4T(n/4)
         ⋮
         ≤ cn log₂n + nT(1) = O(n log n)
32
Variations of Quick Sort
• Median-of-three strategy
• Ordered lists are worst-case inputs for Quick Sort:
the pivot is always the smallest or largest key within a sublist
• Ordered lists are not rare in real life
• Choosing the pivot as the median of the first, middle,
and last keys addresses this issue:
pivot = median{ K_left, K_(left+right)/2, K_right }
• Random pivot strategy

33
Non-In-Place Partition
• We partition (split) an input sequence S as follows:
• We remove, in turn, each element y from S and
• We insert y into L, E or G, depending on the result of the
comparison with the pivot x
• Each insertion and removal is at the beginning or at
the end of a sequence, and hence takes O(1) time
• Thus, the partition step of quick-sort takes O(n) time

(Diagram: the input is split into the three sequences L (keys < x), E (keys = x), and G (keys > x).)
34
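This three-way split can be sketched as follows (names are ours; vectors stand in for the sequences):

```cpp
#include <vector>

// Non-in-place partition: one O(n) scan appends each element
// to L (< x), E (= x), or G (> x), each append costing O(1).
struct Partition3 { std::vector<int> L, E, G; };

Partition3 partition3(const std::vector<int>& s, int x) {
    Partition3 p;
    for (int y : s) {
        if (y < x)       p.L.push_back(y);
        else if (y == x) p.E.push_back(y);
        else             p.G.push_back(y);
    }
    return p;
}
```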
Outline
• 7.1 Introduction
• 7.2 Insertion Sort
• 7.3 Quick Sort
• 7.4 How fast can we sort?
• 7.5 Merge sort
• 7.6 Heap sort
• 7.7 Radix sort
• 7.8 (List and table sorts)
• 7.9 Summary of internal sorting
35
7.4 How Fast Can We Sort?
• A sorting algorithm can be represented as a binary
decision tree
• Non-leaf node represents a comparison between two keys
• Leaf nodes are the sorting results

[a, b, c]

Y
a b N
b c a c
Y N Y N
[a, b, [b, a,
c]
ac c]
b c
Y N Y N
[a, c, [c, a, [b, c, [c, b,
b] b] a] a]
36
How Fast Can We Sort?
• Sorting n records:
  • Number of leaf nodes is at least n!
  • Tree height is at least log₂(n!) + 1 = Ω(n log n)
    → sorting with worst-case time complexity < n log n is impossible
  • Average root-to-leaf path length is Ω(n log n)
    → sorting with average time complexity < n log n is impossible
(Decision tree for [a, b, c] as on the previous slide.)
37
C++ STL sort Function
• Quick sort.
  • Switch to heap sort when the number of subdivisions
    exceeds some constant times log₂n.
  • Switch to insertion sort when the segment size becomes small.
• Quick Sort complexity:
  • To improve performance, stop the recursion when the segment
    size is <= 15 (say) and sort these small segments using
    insertion sort.

38
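In practice one rarely writes this hybrid by hand; std::sort is typically implemented exactly as the quick/heap/insertion recipe above (introsort), with O(n log n) worst-case comparisons. A usage sketch (function names ours):

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// std::sort: O(n log n) worst case, not stable.
std::vector<int> sortedCopy(std::vector<int> v) {
    std::sort(v.begin(), v.end());                      // nondecreasing
    return v;
}

std::vector<int> sortedDown(std::vector<int> v) {
    std::sort(v.begin(), v.end(), std::greater<int>()); // nonincreasing
    return v;
}
```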
Outline
• 7.1 Introduction
• 7.2 Insertion Sort
• 7.3 Quick Sort
• 7.4 How fast can we sort?
• 7.5 Merge sort
• 7.6 Heap sort
• 7.7 Radix sort
• 7.8 (List and table sorts)
• 7.9 Summary of internal sorting
39
7.5 Merge Sort
• Interpret the unsorted list as n sorted sublists, each of size 1
• These sublists are merged by pairs in each pass
• Merge passes are continued until only one sublist remains
• Merge Sort is stable

Trace:
5 2 6 1 4 3       (interpreted as n sublists of size 1)
2 5 | 1 6 | 3 4   (merge by pairs)
1 2 5 6 | 3 4     (merge by pairs)
1 2 3 4 5 6
40
Merge Sort Algorithm (Nonrecursive)
template <class T>
void MergeSort(T *a, const int n)
{ // sort a[1:n] into non-decreasing order
T *tempList = new T[n+1];
// s is the length of the currently merged sublists
for (int s = 1; s < n; s *= 2)
{
MergePass(a, tempList, n, s);
s *= 2;
MergePass(tempList, a, n, s);
}
delete [] tempList;
}
(to be continued) 41
Merge Sort Algorithm (Nonrecursive)
template <class T>
void MergePass(T *a, T *b, const int n, const int s)
{ // adjacent pairs of sublists of size s are merged from a to b; n = # records in a
int i; // i is the first position in the first of the sublists being merged
for (i = 1;
     i <= n - 2*s + 1; // enough elements for two sublists of length s?
     i += 2*s)
  Merge(a, b, i, i+s-1, i+2*s-1);
// merge remaining lists of size < 2*s
if ((i + s - 1) < n) // one full and one partial sublist
  Merge(a, b, i, i+s-1, n);
else // only one (possibly partial) sublist remains
  copy(a+i, a+n+1, b+i);
}
(to be continued)
42
Merge Sort Algorithm (Nonrecursive)
template <class T>
void Merge(T *a, T *b, const int k, const int m, const int n)
{ // a[k:m] and a[m+1:n] are sorted; merge them into b[k:n]
int i1, i2, i3;
for (i1 = k, i2 = m+1, i3 = k;
     i1 <= m && i2 <= n; i3++) {
  if (a[i1] <= a[i2]) {
    b[i3] = a[i1];
    i1++;
  } else {
    b[i3] = a[i2];
    i2++;
  }
}
// copy remaining records, if any, of 1st sublist
if (i2 > n) copy(a+i1, a+m+1, b+i3);
// copy remaining records, if any, of 2nd sublist
if (i1 > m) copy(a+i2, a+n+1, b+i3);
}
43
Nonrecursive Merge Sort
Input list is (26, 5, 77, 1, 61, 11, 59, 15, 48, 19). 10 keys.
Merge Tree
26 5 77 1 61 11 59 15 48 19

5 26 5 77 11 61 15 59 19 48

1 5 26 77 11 15 59 61 19 48

1 5 11 15 26 59 61 77 19 48

1 5 11 15 19 26 48 59 61 77
Nonrecursive Merge Sort
Input list has 13 keys.
[8] [3] [13] [6] [2] [14] [5] [9] [10] [1] [7] [12] [4]

[3, 8] [6, 13] [2, 14] [5, 9] [1, 10] [7, 12] [4]

[3, 6, 8, 13] [2, 5, 9, 14] [1, 7, 10, 12] [4]

[2, 3, 5, 6, 8, 9, 13, 14] [1, 4, 7, 10, 12]

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14]


Complexity
• Sorted segment size is 1, 2, 4, 8, …
• Number of merge passes is ⌈log₂n⌉.
• Each merge pass takes O(n) time.
• Total time is O(nlog n).
• Need O(n) additional space for the merge.
• Merge sort is slower than insertion sort when n <=
15 (approximately). So define a small instance to be
an instance with n <= 15.
• Sort small instances using insertion sort.
• Start with segment size = 15.
46
Merge Two Sorted Lists
• A = (2, 5, 6)
B = (1, 3, 8, 9, 10)
C = ()
• Compare smallest elements of A and B and merge
smaller into C.
• A = (2, 5, 6)
B = (3, 8, 9, 10)
C = (1)

47
Merge Two Sorted Lists
• A = (5, 6)
B = (3, 8, 9, 10)
C = (1, 2)
• A = (5, 6)
B = (8, 9, 10)
C = (1, 2, 3)
• A = (6)
B = (8, 9, 10)
C = (1, 2, 3, 5)
48
Merge Two Sorted Lists
• A = ()
B = (8, 9, 10)
C = (1, 2, 3, 5, 6)
• When one of A and B becomes empty, append the
other list to C.
• A = ()
B = ()
C = (1, 2, 3, 5, 6, 8, 9, 10)
• O(1) time needed to move an element into C.
• Total time is O(n + m), where n and m are, respectively,
the number of elements initially in A and B.
49
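The walkthrough above can be sketched directly (the function name is ours):

```cpp
#include <cstddef>
#include <vector>

// Merge sorted a and b into c: repeatedly move the smaller front
// element, then append the leftover of whichever list is nonempty.
// O(n + m) total; O(1) per moved element.
std::vector<int> mergeSorted(const std::vector<int>& a,
                             const std::vector<int>& b) {
    std::vector<int> c;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size())
        c.push_back(a[i] <= b[j] ? a[i++] : b[j++]);
    while (i < a.size()) c.push_back(a[i++]);  // one side is now empty
    while (j < b.size()) c.push_back(b[j++]);
    return c;
}
```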
Recursive Merge Sort
• Divide & Conquer
• Divide :
• Partition the n > 1 elements into two smaller instances.
• The first ⌈n/2⌉ elements define one of the smaller instances;
the remaining elements define the second smaller instance.
• Recur:
• Each of the two smaller instances is sorted recursively.
• Conquer:
• The sorted smaller instances are combined using a
process called merge.
• Complexity is O(n log n).
50
Recursive Merge Sort
template <class T>
int rMergeSort(T* a, int* link, const int left, const int right)
{ // a[left:right] is to be sorted; link[i] is initialized to 0 for all i
  // rMergeSort returns the index of the first element of the sorted chain
  if (left >= right) return left; // base case
  int mid = (left + right) / 2;
  return ListMerge(a, link,
         rMergeSort(a, link, left, mid),        // sort the left half
         rMergeSort(a, link, mid + 1, right));  // sort the right half
}
51
Recursive Merge Sort
Input list is (26, 5, 77, 1, 61, 11, 59, 15, 48, 19). 10 keys.
Sublist partitioning for recursive merge sort (split at mid = ⌊(left + right)/2⌋):

26 5 77 1 61 11 59 15 48 19
(5 26) (11 59)
(5 26 77) (1 61)   (11 15 59) (19 48)
(1 5 26 61 77) (11 15 19 48 59)
1 5 11 15 19 26 48 59 61 77
52
Recursive Merge Sort
• Partition

[8, 3, 13, 6, 2, 14, 5, 9, 10, 1, 7, 12, 4]

[8, 3, 13, 6, 2, 14, 5] [9, 10, 1, 7, 12, 4]

[8, 3, 13, 6] [2, 14, 5] [9, 10, 1] [7, 12, 4]

[8, 3] [13, 6] [2, 14] [5] [9, 10] [1] [7, 12] [4]

[8] [3] [13] [6] [2] [14] [9] [10] [7] [12]
Recursive Merge Sort
• To eliminate the record copying that takes place when
Merge() is used to merge sorted sublists, associate an
integer pointer with each record.
• For this purpose, employ an integer array link[1:n] such
that link[i] gives the record that follows record i in the
sorted sublist.
• In case link[i] = 0, there is no next record.
• Record copying is replaced by link changes and the run
time of our sort function becomes independent of the
size s of a record.
• Iterative merge sort: O(snlogn) time, O(sn) extra space
• Recursive merge sort: O(nlogn) time, O(n) extra space
54
Recursive Merge Sort
template <class T>
int ListMerge(T* a, int* link, const int start1, const int start2)
{ // merge the two sorted chains beginning at start1 and start2
  // link[0] serves as a temporary header; return the start of the merged chain
  int iResult = 0; // last record of the result chain
  int i1 = start1, i2 = start2;
  while (i1 && i2)
    if (a[i1] <= a[i2]) {
      link[iResult] = i1;
      iResult = i1; i1 = link[i1];
    } else {
      link[iResult] = i2;
      iResult = i2; i2 = link[i2];
    }
  // attach the remaining records to the result chain
  if (i1 == 0) link[iResult] = i2;
  else link[iResult] = i1;
  return link[0];
}
55
Recursive Merge Sort
• Merge

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13,14]

[2, 3, 5, 6, 8, 13, 14] [1, 4, 7, 9, 10,12]

[3, 6, 8, 13] [2, 5, 14] [1, 9, 10] [4, 7, 12]

[3, 8] [6, 13] [2, 14] [5] [9, 10] [1] [7, 12] [4]

[8] [3] [13] [6] [2] [14] [9] [10] [7] [12]
How Fast is Merge Sort ?
• Both worst and average cases
• log(n) merge passes are performed
• Each merge pass is O(n)
• Time complexity is O(nlog(n))

57
Variations of Merge Sort
• Natural Merge Sort
  • Interpret the initial list as multiple sorted sublists,
    each of which can contain more than one record
  • Example: for 5 2 6 1 4 3, the original implementation starts
    from the six size-1 sublists (5)(2)(6)(1)(4)(3), whereas
    natural merge sort starts from the runs (5)(2 6)(1 4)(3)
58
C++ STL stable_sort Function
• Merge sort is stable (relative order of elements
with equal keys is not changed).
• Quick sort is not stable.
• STL’s stable_sort is a merge sort that switches to
insertion sort when segment size is small

59
Outline
• 7.1 Introduction
• 7.2 Insertion Sort
• 7.3 Quick Sort
• 7.4 How fast can we sort?
• 7.5 Merge sort
• 7.6 Heap sort
• 7.7 Radix sort
• 7.8 (List and table sorts)
• 7.9 Summary of internal sorting
60
7.6 Heap Sort Concept
• Interpret the input list 6 3 8 1 5 4 2 7 as a complete binary
  tree: a[1] is the root, and a[2i], a[2i+1] are the children of a[i]
• Heapify the tree to form a max heap: 8 7 6 3 5 4 2 1
• Popping pass
  • Pop the top (maximum) record
  • The heap size shrinks by one, and the space next to the heap
    becomes unused
  • Place the popped record in that space
• Popping passes are continued until the heap becomes empty
• Convert the heap back to a list: 1 2 3 4 5 6 7 8
• Heap Sort is non-stable
Heap Sort Detail Steps
Bottom-up heapification (array states; the tree drawings are omitted):
6 3 8 1 5 4 2 7 → 6 3 8 7 5 4 2 1   (adjust the subtree rooted at a[4])
6 3 8 7 5 4 2 1 → 6 7 8 3 5 4 2 1   (adjust the subtree rooted at a[2])
6 7 8 3 5 4 2 1 → 8 7 6 3 5 4 2 1   (adjust the root)

Popping passes (heap | sorted tail):
8 7 6 3 5 4 2 1 → swap a[1], a[8]: 1 7 6 3 5 4 2 | 8 → heapify (trickle down): 7 5 6 3 1 4 2 | 8
7 5 6 3 1 4 2 | 8 → swap a[1], a[7]: 2 5 6 3 1 4 | 7 8 → heapify: 6 5 4 3 1 2 | 7 8
6 5 4 3 1 2 | 7 8 → swap a[1], a[6]: 2 5 4 3 1 | 6 7 8 → heapify: 5 3 4 2 1 | 6 7 8
…and so on, until the heap is empty and the list reads 1 2 3 4 5 6 7 8
Heap Sort Algorithm
template <class T>
void HeapSort(T *a, const int n)
{ // sort a[1:n] into nondecreasing order
  // a[n/2] is the parent of the last node, a[n]
  for (int i = n/2; i >= 1; i--) // bottom-up heapification
    Adjust(a, i, n); // make the tree rooted at i a max heap

  for (int i = n-1; i >= 1; i--) {
    swap(a[1], a[i+1]); // move one record from the heap to the list
    Adjust(a, 1, i);    // heapify
  }
}

(to be continued)
67
Heap Sort Algorithm
template <class T>
void Adjust(T *a, const int root, const int n)
{
  // the two subtrees of root are max heaps already;
  // this is the same procedure as trickling down
  T e = a[root];
  int j;
  // 2*root is root's left child
  for (j = 2*root; j <= n; j *= 2) {
    if (j < n && a[j] < a[j+1]) // j and j+1 are siblings
      j++; // make j the max child
    if (e >= a[j])
      break;
    a[j / 2] = a[j]; // move the jth record up the path
  }
  a[j / 2] = e;
}
68
How Fast is Heap Sort?
• Both worst and average cases
• Heapifying the tree
• n/2 adjust()’s are invoked, each is at most O(log(n))
• Converting the max heap to the list
• n pop()’s are invoked, each is at most O(log(n))
• Overall, the time complexity is O(nlogn)

69
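The same two phases are available in the standard library, which makes a compact cross-check of the algorithm above (0-indexed; the function name is ours):

```cpp
#include <algorithm>
#include <vector>

// Heap sort via the STL: make_heap is the O(n) bottom-up
// heapification; sort_heap performs the n pops, O(n log n) overall.
void stlHeapSort(std::vector<int>& a) {
    std::make_heap(a.begin(), a.end());  // build a max heap
    std::sort_heap(a.begin(), a.end());  // repeatedly pop the max to the back
}
```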
Summary
            Worst     Average
Insertion   n²        n²        • Fastest method when n is small (e.g., n < 100)
Sort                            • O(1) space
                                • Stable

Quick Sort  n²        n·log n   • Fastest method in practice
                                • Requires O(n²) time in the worst case
                                • Requires O(log n) space
                                • Non-stable

Merge Sort  n·log n   n·log n   • Requires additional O(n) space
                                • Stable

Heap Sort   n·log n   n·log n   • Requires additional O(1) space
                                • Non-stable
70
Summary
(Plot: running time vs. number of elements to be sorted, 0 to 5000.
Insertion Sort grows fastest; Heap Sort and Merge Sort track each
other closely; Quick Sort is the fastest.)
71
Outline
• 7.1 Introduction
• 7.2 Insertion Sort
• 7.3 Quick Sort
• 7.4 How fast can we sort?
• 7.5 Merge sort
• 7.6 Heap sort
• 7.7 Radix sort
• 7.8 (List and table sorts)
• 7.9 Summary of internal sorting
72
Bucket Sort (Bin Sort)
• Let S be a sequence of n (key, element) items
with keys in the range [0, N - 1]
• Bucket-sort uses the keys as indices into an auxiliary
array B of sequences (buckets or bins)
Phase 1: Empty sequence S by moving each item (k, o)
into its bucket B[k]
Phase 2: For i = 0, …, N - 1, move the items of bucket
B[i] to the end of sequence S
• Analysis:
• Phase 1 takes O(n) time
• Phase 2 takes O(n + N) time
Bucket-sort takes O(n + N) time
73
Algorithm bucketSort(S, N)
Input sequence S of (key, element) items with
keys in the range [0, N - 1]
Output sequence S sorted by increasing keys

B  array of N empty sequences


while !S.isEmpty()
f  S.first()
(k, o)  S.remove(f)
B[k].insertLast((k, o))
for i  0 to N - 1
while !B[i].isEmpty()
f  B[i].first()
(k, o)  B[i].remove(f)
S.insertLast((k, o)) // nondecreasing order
74
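The two phases above can be sketched in C++ (names are ours; std::vector buckets stand in for the pseudocode's sequences):

```cpp
#include <string>
#include <utility>
#include <vector>

using Item = std::pair<int, std::string>;  // (key, element)

// Bucket sort for keys in [0, N-1]: scatter into B[k], then gather.
// Phase 1 is O(n); phase 2 is O(n + N). Stable.
std::vector<Item> bucketSort(const std::vector<Item>& s, int N) {
    std::vector<std::vector<Item>> B(N);
    for (const Item& it : s) B[it.first].push_back(it);   // phase 1
    std::vector<Item> out;
    for (int k = 0; k < N; ++k)                           // phase 2
        out.insert(out.end(), B[k].begin(), B[k].end());
    return out;
}
```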
Example
• Key range [0, 9]
Input: 7,d  1,c  3,a  7,g  3,b  7,e
Phase 1 (scatter): B[1] = (1,c); B[3] = (3,a)(3,b); B[7] = (7,d)(7,g)(7,e);
all other buckets empty
Phase 2 (gather): 1,c  3,a  3,b  7,d  7,g  7,e
75
Bucket Sort Properties and Extensions
• Key-type Property
• The keys are used as indices into an array and cannot be arbitrary
objects
• No external comparator
• Stable Sort Property
• The relative order of any two items with the same key is preserved
after the execution of the algorithm
• Extensions
• Integer keys in the range [a, b]
• Put item (k, o) into bucket B[k - a]
• String keys from a set D of possible strings, where D has constant
size (e.g., names of the 50 U.S. states)
• Sort D and compute the rank r(k) of each string k of D in the sorted
sequence
• Put item (k, o) into bucket B[r(k)]
76
Lexicographic Order
• A d-tuple is a sequence of d keys (k1, k2, …, kd), where
key ki is said to be the i-th dimension of the tuple
• Example:
• The Cartesian coordinates of a point in space are a 3-tuple
• The lexicographic order of two d-tuples is recursively
defined as follows
(x1, x2, …, xd) < (y1, y2, …, yd)
⇔
x1 < y1 ∨ [x1 = y1 ∧ (x2, …, xd) < (y2, …, yd)]
i.e., the tuples are compared by the first dimension,
then by the second dimension, …, etc.
77
Lexicographic-Sort
• Let Ci be the comparator that compares two tuples
by their i-th dimension
• Let stableSort(S, C) be a stable sorting algorithm
that uses comparator C
• Lexicographic-sort sorts a sequence of d-tuples in
lexicographic order by executing d times algorithm
stableSort, one per dimension
• Lexicographic-sort runs in O(dT(n)) time, where
T(n) is the running time of stableSort

78
Algorithm lexicographicSort(S)
Input sequence S of d-tuples
Output sequence S sorted in
lexicographic order

for i  d downto 1
stableSort(S, Ci)

Example:
(7, 4, 6) (5, 1, 5) (2, 4, 6) (2, 1, 4) (3, 2, 4)
(2, 1, 4) (3, 2, 4) (5, 1, 5) (7, 4, 6) (2, 4, 6)   ← after the stable sort on dimension 3
(2, 1, 4) (5, 1, 5) (3, 2, 4) (7, 4, 6) (2, 4, 6)   ← after the stable sort on dimension 2
(2, 1, 4) (2, 4, 6) (3, 2, 4) (5, 1, 5) (7, 4, 6)   ← after the stable sort on dimension 1
79
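The d stable passes can be sketched with std::stable_sort standing in for stableSort(S, Ci) (3-tuples as in the example; names are ours):

```cpp
#include <algorithm>
#include <array>
#include <vector>

using Tuple3 = std::array<int, 3>;

// Lexicographic sort: stable-sort on dimension 3, then 2, then 1.
void lexicographicSort(std::vector<Tuple3>& s) {
    for (int i = 2; i >= 0; --i)  // least significant dimension first
        std::stable_sort(s.begin(), s.end(),
            [i](const Tuple3& x, const Tuple3& y) { return x[i] < y[i]; });
}
```

Stability of each pass is what preserves the work of the earlier (less significant) passes.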
Basics of Sorting on Several Keys
• Terminology
  • Keys: K1, K2, ..., Kr
  • K1 is the most significant key
  • Kr is the least significant key
• Comparison of multiple keys (lexicographic order)
  • The r-tuple (x1, x2, ..., xr) is less than or equal to the
    r-tuple (y1, y2, ..., yr) iff one of the following two
    conditions is satisfied:
    (1) xi = yi for 1 ≤ i ≤ r (=)
    (2) xi = yi for 1 ≤ i < j, and xj < yj, for some 1 ≤ j ≤ r (<)
  E.g., (1, 2, 3) < (1, 2, 5) → j = 3
80
Sorting on Several Keys
• Sorting a deck of cards
• Sort on two keys:
• Suits K1 (most-significant digit, MSD) :
♣<♦<♥<♠
• Face values K2 (least-significant digit, LSD) :
2 < 3 <… < Q < K < A

• Two popular sorting strategies


• MSD first sort
• LSD first sort

81
Sorting on Several Keys
• Two popular sorting strategies
• MSD first sort
• Sort the cards into 4 piles using K1, one for each suit
• Sort each of the 4 piles using K2
• Cascade the sorted 4 piles with the order of (club,
diamond, heart, spade):
• 2♣,…,A♣,2♦…,A♦,2♥……A♥,2♠,….A♠
• LSD first sort
• Sort the cards using K2 into 13 piles
• Then cascade the 13 piles into a big pile with the order
of 2, 3, 4, ..., J, Q, K, A
• Finally, sort the big pile using a stable sorting algorithm
on the suit (K1) into 4 piles, then combine:
• 2♣,…,A♣,2♦…,A♦,2♥……A♥,2♠,….A♠
82
Comparing MSD & LSD
• LSD-first is simpler, as the piles and subpiles
obtained do not have to be sorted independently
(provided the sorting scheme used for sorting on
the keys Ki, 1 ≤ i < r, is stable)
• LSD-first sort is commonly chosen for computer
sorting
• MSD-first sort tends to incur more overhead
because of the need to independently sort multiple
groups:
• First sorting on suit – bin sort into 4 bins
• Then sorting on face – similar to insertion sort for each
bin
83
Radix Sort
• LSD or MSD sorting can be used to sort even when the
records have only one key K.
• Interpret the key as being composed of several subkeys, e.g., if 0
≤ K ≤ 999  K = (K1, K2, K3), 0 ≤ Ki ≤ 9
• The sort on each Ki can be carried out using a bin sort with 10
bins.
• In a Radix Sort, we decompose the sort key into several
subkeys using some radix r
• e.g., 365 is decomposed into 3, 6, and 5 with a radix = 10, so the
number of bins required is 10.
• Radix-sort is a specialization of lexicographic-sort that uses
bucket-sort as the stable sorting algorithm in each
dimension
• Radix-sort runs in time O(d( n + N))
84
Radix Sort Example
• Sorting a sequence of 4-bit integers (binary decomposition of
the key); each pass is a stable sort on one bit, LSD first

Initial   bit 0   bit 1   bit 2   bit 3
1001      0010    1001    1001    0001
0010      1110    1101    0001    0010
1101      1001    0001    0010    1001
0001      1101    0010    1101    1101
1110      0001    1110    1110    1110
85
LSD-First Radix Sort Example

Initial    Sort on      Stable sort      Stable sort
           ones digit   on tens digit    on hundreds digit
105        290          105              105
342        540          540              193
555        342          342              290
290        193          555              342
540        105          290              540
193        555          193              555

86
Prog. 7.5 LSD Radix Sort
template <class T>
int RadixSort(T *a, int *link, const int d, const int r, const int n)
{// Sort a[1:n] using a d-digit, radix-r radix sort.
 // digit(a[i], j, r) returns the jth radix-r digit (from the left)
 // of a[i]'s key; each digit is in the range [0, r).
 // Sorting within a digit is done using a bin sort.
   int e[r], f[r]; // queue rear (e) and front (f) pointers

   // create the initial chain of records, starting at first
   int first = 1;
   for (int i = 1; i < n; i++) link[i] = i + 1; // link into a chain
   link[n] = 0;
87
   for (int i = d - 1; i >= 0; i--)
   {// sort on digit i
      fill(f, f + r, 0); // initialize the bins to empty queues
      for (int current = first; current; current = link[current])
      {// put records into queues/bins
         int k = digit(a[current], i, r);
         if (f[k] == 0) f[k] = current;
         else link[e[k]] = current;
         e[k] = current;
      }
      int j;
      for (j = 0; !f[j]; j++); // find the first nonempty queue/bin
      first = f[j];
      int last = e[j];
      for (int k = j + 1; k < r; k++)
      {// concatenate the remaining queues
         if (f[k]) { link[last] = f[k]; last = e[k]; }
      }
      link[last] = 0;
   }
   return first;
}
88
Example: Sorting 10 numbers in [0, 999], r=10, d=3

a [1] a [2] a [3] a [4] a [5] a [6] a [7] a [8] a [9] a [10]
179 208 306 93 859 984 55 9 271 33

(a) Initial input

First pass (on the least-significant digit):
f[1]: 271       f[3]: 93 → 33       f[4]: 984       f[5]: 55
f[6]: 306       f[8]: 208           f[9]: 179 → 859 → 9
(e[k] points to the rear of each queue, e.g., e[3] = 33, e[9] = 9)

Resulting chain: 271 → 93 → 33 → 984 → 55 → 306 → 208 → 179 → 859 → 9

(b) First-pass queues and resulting chain
89
Second pass (on the tens digit):
f[0]: 306 → 208 → 9       f[3]: 33       f[5]: 55 → 859
f[7]: 271 → 179           f[8]: 984      f[9]: 93

Resulting chain: 306 → 208 → 9 → 33 → 55 → 859 → 271 → 179 → 984 → 93

(c) Second-pass queues and resulting chain

90
Third pass (on the hundreds digit):
f[0]: 9 → 33 → 55 → 93    f[1]: 179      f[2]: 208 → 271
f[3]: 306                 f[8]: 859      f[9]: 984

Resulting chain: 9 → 33 → 55 → 93 → 179 → 208 → 271 → 306 → 859 → 984

(d) Third-pass queues and resulting chain

91
Outline
• 7.1 Introduction
• 7.2 Insertion Sort
• 7.3 Quick Sort
• 7.4 How fast can we sort?
• 7.5 Merge sort
• 7.6 Heap sort
• 7.7 Radix sort
• 7.8 (List and table sorts)
• 7.9 Summary of internal sorting
92
Comparison of Sort Methods

Method           Worst     Average

Insertion Sort   n²        n²
Heap Sort        n log n   n log n
Merge Sort       n log n   n log n
Quick Sort       n²        n log n
93
Comparison of Sort Methods
n Insert Heap Merge Quick
0 0.000 0.000 0.000 0.000
50 0.004 0.009 0.008 0.006
100 0.011 0.019 0.017 0.013
200 0.033 0.042 0.037 0.029
300 0.067 0.066 0.059 0.045
400 0.117 0.090 0.079 0.061
500 0.179 0.116 0.100 0.079
1000 0.662 0.245 0.213 0.169
2000 2.439 0.519 0.459 0.358
3000 5.390 0.809 0.721 0.560
4000 9.530 1.105 0.972 0.761
5000 15.935 1.410 1.271 0.970

94
Comparison of Sort Methods
[Figure: running time versus n (0 to 5000) for the four methods;
from top to bottom the curves are Insertion Sort, Heap Sort,
Merge Sort, and Quick Sort]
95
Summary of Sorting Algorithms
Algorithm        Time (WC)    Time (Ave)            Notes
selection-sort   O(n²)        O(n²)                 in-place; slow (good for small inputs)
insertion-sort   O(n²)        O(n²)                 in-place; slow (good for small inputs)
quick-sort       O(n²)        O(n log n) expected   in-place, randomized; fastest (good for large inputs)
heap-sort        O(n log n)   O(n log n)            in-place; fast (good for large inputs)
merge-sort       O(n log n)   O(n log n)            sequential data access; fast (good for huge inputs)
96
Summary
• Every sorting algorithm has its pros and cons
• No one-size-fits-all solution
• C++'s sort methods
• sort()
• Quick Sort that reverts to Heap Sort when the recursion depth
exceeds some threshold and to Insertion Sort when the
segment size becomes small
• stable_sort()
• Merge Sort that reverts to Insertion Sort when the segment
size becomes small
• partial_sort()
• Heap Sort that has ability to stop when only the first k
elements need to be sorted

97
