Chap9 Slides
A bitonic merging network for n = 16. The input wires are numbered 0,1,
…, n - 1, and the binary representation of these numbers is shown.
Each column of comparators is drawn separately; the entire figure
represents a BM[16] bitonic merging network. The network takes a
bitonic sequence and outputs it in sorted order.
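The recursive structure of BM[n] can be sketched sequentially (function name and test values are illustrative, not from the slides): the first comparator column compare-exchanges wires i and i + n/2, leaving two bitonic halves that are merged recursively.

```python
# Sketch of a bitonic merge BM[n]: takes a bitonic sequence of length n
# (a power of two) and returns it fully sorted. Each recursion level
# corresponds to one column of comparators in the network.
def bitonic_merge(seq):
    n = len(seq)
    if n == 1:
        return seq
    half = n // 2
    # Compare-exchange wires i and i + n/2.
    lo = [min(seq[i], seq[i + half]) for i in range(half)]
    hi = [max(seq[i], seq[i + half]) for i in range(half)]
    # Both halves are bitonic; every element of lo <= every element of hi.
    return bitonic_merge(lo) + bitonic_merge(hi)

# A bitonic sequence: increasing, then decreasing.
print(bitonic_merge([2, 5, 7, 9, 8, 6, 3, 1]))  # → [1, 2, 3, 5, 6, 7, 8, 9]
```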
Sorting Networks: Bitonic Sort
How do we sort an unsorted sequence using a bitonic merge?
• The parallel time depends on the split and merge time, and the
quality of the pivot.
• The latter is an issue independent of parallelism, so we focus on the
first aspect, assuming ideal pivot selection.
• The algorithm executes in four steps: (i) determine and broadcast
the pivot; (ii) locally rearrange the array assigned to each process;
(iii) determine the locations in the globally rearranged array that the
local elements will go to; and (iv) perform the global rearrangement.
• The first step takes time Θ(log p); the second, Θ(n/p); the third,
Θ(log p); and the fourth, Θ(n/p).
• The overall complexity of splitting an n-element array is Θ(n/p) +
Θ(log p).
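The four steps can be simulated sequentially (a sketch with illustrative names; each inner list stands for the sub-array assigned to one process):

```python
# Sequential sketch of one parallel split. Each "process" holds one
# block; the pivot has been broadcast (step i), each block is
# partitioned locally (step ii), prefix sums of the partition counts
# give the global write offsets (step iii, a Θ(log p) parallel scan),
# and the blocks are written into the rearranged array (step iv).
def parallel_split(blocks, pivot):
    # Step (ii): local rearrangement, Θ(n/p) per process.
    smaller = [[x for x in b if x <= pivot] for b in blocks]
    larger  = [[x for x in b if x >  pivot] for b in blocks]
    # Steps (iii)-(iv): concatenation here plays the role of writing
    # each block to its prefix-sum-determined position.
    low  = [x for b in smaller for x in b]
    high = [x for b in larger  for x in b]
    return low + high, len(low)   # rearranged array and pivot boundary

arr, boundary = parallel_split([[7, 2, 9], [4, 8, 1], [6, 3, 5]], pivot=5)
```

Quicksort then recurses independently on `arr[:boundary]` and `arr[boundary:]`.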
Parallelizing Quicksort: Shared Address Space
Formulation
• The process recurses until there are p lists, at which
point, the lists are sorted locally.
• Therefore, with Θ(log p) levels of recursion, each costing
Θ(n/p) + Θ(log p), followed by a local sort of n/p elements, the
total parallel time is:
T_P = Θ((n/p) log(n/p)) + Θ((n/p) log p) + Θ(log² p).
• A processor in the low half of the machine sends its list Ui (its
elements larger than the pivot) to the paired processor in the high
half. The paired processor sends back its list Li (its elements
smaller than the pivot).
• It is easy to see that after this step, all elements less than the
pivot are in the low half of the machine and all elements greater
than the pivot are in the high half.
Parallelizing Quicksort: Message Passing
Formulation
• This process recurses until each processor has its own local
list, which it then sorts locally.
• The time for a single reorganization is Θ(log p) for
broadcasting the pivot element, Θ(n/p) for splitting the
locally assigned portion of the array, and Θ(n/p) for the
exchange and local reorganization.
• We note that this time is identical to that of the
corresponding shared address space formulation.
• It is important to remember that the reorganization of
elements is a bandwidth-sensitive operation.
Bucket and Sample Sort
• In bucket sort, the range [a, b] of input numbers is
divided into m equal-sized intervals, called buckets.
• Each element is placed in its appropriate bucket.
• If the numbers are uniformly distributed over the range, each
bucket can be expected to contain roughly the same number
(about n/m) of elements.
• Elements in the buckets are locally sorted.
• The run time of this algorithm is Θ(n log(n/m)).
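A minimal sequential sketch (names and the clamping detail are illustrative): each element is binned by its offset within [a, b], and each bucket of roughly n/m elements is then sorted, giving the Θ(n log(n/m)) total.

```python
# Bucket sort over a known range [a, b] with m buckets.
def bucket_sort(arr, a, b, m):
    width = (b - a) / m
    buckets = [[] for _ in range(m)]
    for x in arr:
        # min(...) clamps so that x == b lands in the last bucket.
        idx = min(int((x - a) / width), m - 1)
        buckets[idx].append(x)
    # Sort each bucket locally: ~m buckets of ~n/m elements each,
    # for Θ(n log(n/m)) total comparison work.
    return [x for bucket in buckets for x in sorted(bucket)]

print(bucket_sort([29, 3, 14, 8, 22, 17], a=0, b=30, m=3))  # → [3, 8, 14, 17, 22, 29]
```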
Parallel Bucket Sort
• Parallelizing bucket sort is relatively simple. We can
select m = p.
• In this case, each processor has a range of values it is
responsible for.
• Each processor runs through its local list and assigns
each of its elements to the appropriate processor.
• The elements are sent to the destination processors
using a single all-to-all personalized communication.
• Each processor sorts all the elements it receives.
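The parallel scheme with m = p can be simulated sequentially (a sketch; `local_lists[i]` plays the role of processor i's initial data, and the nested list `outgoing` stands in for the all-to-all message buffers):

```python
# Sketch simulating parallel bucket sort with m = p: each processor
# bins its local list by destination processor, an all-to-all
# personalized communication delivers the bins, and each processor
# sorts everything it receives.
def parallel_bucket_sort(local_lists, a, b):
    p = len(local_lists)
    width = (b - a) / p
    # outgoing[s][d] = elements processor s will send to processor d.
    outgoing = [[[] for _ in range(p)] for _ in range(p)]
    for src, lst in enumerate(local_lists):
        for x in lst:
            dest = min(int((x - a) / width), p - 1)
            outgoing[src][dest].append(x)
    # All-to-all: processor d receives outgoing[s][d] from every s,
    # then sorts its bucket locally.
    return [sorted([x for s in range(p) for x in outgoing[s][d]])
            for d in range(p)]
```

Concatenating the returned per-processor buckets in order yields the fully sorted sequence.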
Parallel Bucket and Sample Sort
• The critical aspect of the above algorithm is the assignment of
bucket ranges to processors. This is done by suitable
splitter selection.
• The splitter selection method divides the n elements into
m blocks of size n/m each, and sorts each block by
using quicksort.
• From each sorted block it chooses m – 1 evenly spaced
elements.
• The m(m – 1) elements selected from all the blocks
represent the sample used to determine the buckets.
• This scheme guarantees that the number of elements
ending up in each bucket is less than 2n/m.
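The splitter-selection step can be sketched as follows (the exact way the final m − 1 splitters are drawn from the m(m − 1)-element sample is one common convention, assumed here rather than taken from the slides):

```python
# Sketch of sample-sort splitter selection: divide the n elements into
# m blocks of n/m, sort each block, take m - 1 evenly spaced samples
# per block, then draw m - 1 global splitters from the combined,
# sorted sample of m(m - 1) elements.
def select_splitters(arr, m):
    n = len(arr)
    block = n // m
    sample = []
    for i in range(m):
        b = sorted(arr[i * block:(i + 1) * block])
        step = len(b) // m
        # m - 1 evenly spaced elements from this sorted block.
        sample.extend(b[step * j] for j in range(1, m))
    sample.sort()
    # Evenly spaced picks from the sample (one assumed convention).
    return [sample[(m - 1) * j] for j in range(1, m)]
```

The returned splitters define the m bucket boundaries; the sampling guarantees that no bucket receives more than 2n/m elements.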