Introduction To Parallel Algorithms and Parallel Program Design

The document discusses key concepts in designing parallel algorithms and parallel program structure, including: 1) Partitioning tasks and data to expose parallelism while balancing workload and minimizing redundant work and communication. 2) Coordinating communication between tasks to allow computation to proceed while considering factors like locality, concurrency, and overlap with computation. 3) Agglomerating tasks to reduce communication costs while maintaining sufficient parallelism and scalability. 4) Mapping tasks to resources considering aspects like load balancing, data and task locality, and dynamic scheduling.


Introduction to Parallel Algorithms and Parallel Program Design


Methodological Design
• Partition
– Task/data decomposition
• Communication
– Task execution coordination
• Agglomeration
– Evaluation of the structure
• Mapping
– Resource assignment

I. Foster, “Designing and Building Parallel Programs,” Addison-Wesley, 1995. Book is online, see webpage.

Monday, April 17, 2023 BE_Comp_HPC_Algorithm Design... KNH 2


Partitioning
• The partitioning stage is intended to expose opportunities for parallel execution
• Focus on defining a large number of small tasks to yield a fine-grained decomposition of the problem
• A good partition divides into small pieces both the computational tasks associated with a problem and the data on which the tasks operate
• Domain decomposition focuses on the computation's data
• Functional decomposition focuses on the computation's tasks
• Mixing domain/functional decomposition is possible



Domain and Functional Decomposition
• Domain decomposition of 2D / 3D grid

• Functional decomposition of a climate model



Partitioning Checklist
• Does your partition define at least an order of magnitude more tasks than there are processors in your target computer? If not, you may lose design flexibility.
• Does your partition avoid redundant computation and storage requirements? If not, it may not be scalable.
• Are tasks of comparable size? If not, it may be hard to allocate each processor equal amounts of work.
• Does the number of tasks scale with problem size? If not, you may not be able to solve larger problems with more processors.
• Have you identified several alternative partitions?



Communication (Interaction)
• Tasks generated by a partition must
interact to allow the computation
to proceed
– Information flow: data and control
• Types of communication
– Local vs. Global: locality of communication
– Structured vs. Unstructured: communication patterns
– Static vs. Dynamic: determined by runtime conditions
– Synchronous vs. Asynchronous: coordination degree
• Granularity and frequency of communication
– Size of data exchange
• Think of communication as interaction and control
– Applicable to both shared and distributed memory parallelism



Types of Communication
• Point-to-point
• Group-based
• Hierarchical
• Collective



Communication Design Checklist
• Is the distribution of communications equal?
– Unbalanced communication may limit scalability
• What is the communication locality?
– Wider communication locales are more expensive
• What is the degree of communication concurrency?
– Communication operations may be parallelized
• Is computation associated with different tasks able to
proceed concurrently? Can communication be overlapped
with computation?
– Try to reorder computation and communication to expose
opportunities for parallelism



Agglomeration
• Move from parallel abstractions
to real implementation
• Revisit partitioning and communication
– View to efficient algorithm execution
• Is it useful to agglomerate?
– What happens when tasks are combined?
• Is it useful to replicate data and/or computation?
• Changes important algorithm and performance ratios
– Surface-to-volume: reduction in communication at the expense of
decreasing parallelism
– Communication/computation: which cost dominates
• Replication may allow reduction in communication
• Maintain flexibility to allow overlap



Types of Agglomeration
• Element to column

• Element to block
– Better surface to volume

• Task merging

• Task reduction
– Reduces communication



Agglomeration Design Checklist
• Has increased locality reduced communication
costs?
• Is replicated computation worth it?
• Does data replication compromise scalability?
• Is the computation still balanced?
• Is scalability in problem size still possible?
• Is there still sufficient concurrency?
• Is there room for more agglomeration?
• Fine-grained vs. coarse-grained?
Mapping
• Specify where each task is to execute
– Less of a concern on shared-memory
systems
• Attempt to minimize execution time
– Place concurrent tasks on different
processors to enhance physical concurrency
– Place communicating tasks on same processor, or on processors close
to each other, to increase locality
– Strategies can conflict!
• Mapping problem is NP-complete
– Use problem classifications and heuristics
• Static and dynamic load balancing



Mapping Algorithms
• Load balancing (partitioning) algorithms
• Data-based algorithms
– Think of computational load with respect to amount of data
being operated on
– Assign data (i.e., work) in some known manner to balance
– Take into account data interactions
• Task-based (task scheduling) algorithms
– Used when functional decomposition yields many tasks with
weak locality requirements
– Use task assignment to keep processors busy computing
– Consider centralized and decentralized schemes



Mapping Design Checklist
• Is static mapping too restrictive and non-responsive?
• Is dynamic mapping too costly in overhead?
• Does centralized scheduling lead to bottlenecks?
• Do dynamic load-balancing schemes require too
much coordination to re-balance the load?
• What is the tradeoff of dynamic scheduling
complexity versus performance improvement?
• Are there enough tasks to achieve high levels of
concurrency? If not, processors may idle.



Types of Parallel Programs
• Flavors of parallelism
– Data parallelism
• all processors do same thing on different data
– Task parallelism
• processors are assigned tasks that do different things
• Parallel execution models
– Data parallel
– Pipelining (Producer-Consumer)
– Task graph
– Work pool
– Master-Worker



Data Parallel
• Data is decomposed (mapped) onto processors
• Processors perform similar (identical) tasks on the data
• Tasks are applied concurrently
• Load balance is obtained through data partitioning
– Equal amounts of work assigned
• Certainly may have interactions between processors
• Data parallelism scalability
– Degree of parallelism tends to increase with problem size
– Makes data parallel algorithms more efficient
• Single Program Multiple Data (SPMD)
– Convenient way to implement data parallel computation
– More associated with distributed memory parallel execution



Matrix-Vector Multiplication
• A × b = y
• Allocate tasks to the rows of A
  y[i] = Σj A[i,j] * b[j]
• Dependencies?
• Speedup?
• Computing each element of y can be done independently
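The row-wise task decomposition above can be sketched in a few lines of Python; the function names are illustrative, not from the slides. Each row task is independent, which is exactly what makes this decomposition parallelizable.

```python
# Sketch: each row of A defines an independent task computing one element of y.
# A real parallel version could hand rows to worker processes; here we only
# show that the row tasks share no intermediate results.

def matvec_row_task(A_row, b):
    """One task: the dot product of a single row of A with b."""
    return sum(a * x for a, x in zip(A_row, b))

def matvec(A, b):
    # This loop is embarrassingly parallel: every matvec_row_task call
    # is independent of the others.
    return [matvec_row_task(row, b) for row in A]

A = [[1, 2], [3, 4]]
b = [5, 6]
print(matvec(A, b))  # [17, 39]
```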
Matrix-Vector Multiplication (Limited Tasks)

• Suppose we only have 4 tasks


• Dependencies?
• Speedup?



Matrix Multiplication
• A × B = C
• A[i,:] • B[:,j] = C[i,j]
• Row partitioning
– N tasks
• Block partitioning
– N*N/B tasks
• (Figure: shading shows data sharing in the B matrix)



Granularity of Task and Data Decompositions

• Granularity can be with respect to tasks and data


• Task granularity
– Equivalent to choosing the number of tasks
– Fine-grained decomposition results in a large number of tasks
– Coarse-grained decomposition has a smaller number of tasks
– Translates to data granularity after # tasks chosen
• consider matrix multiplication
• Data granularity
– Think of in terms of amount of data needed in operation
– Relative to data as a whole
– Decomposition decisions based on input, output, input-output, or
intermediate data



Mesh Allocation to Processors
• Mesh model of Lake Superior
• How to assign mesh elements to processors?
• Distribute onto 8 processors: randomly, or by graph partitioning for minimum edge cut



Pipeline Model
• Stream of data operated on by a succession of tasks (Task 1 → Task 2 → Task 3 → Task 4)
– Tasks are assigned to processors (P1, P2, P3, P4)
• Consider N data units of input
– Sequential: each data unit passes through every task in turn
– Parallel (each task assigned to a processor): with 4 data units the pipeline runs 4-way parallel; with 8 data units it stays 4-way parallel for a longer time
Pipeline Performance
• N data units and T tasks (stages)
• Each task takes unit time t
• Sequential time = N*T*t
• Parallel pipeline time = (T - 1)*t to fill the pipeline + N*t in steady state ≈ (N + T - 1)*t
– Speedup = N*T / (N + T - 1) → T for N >> T
• Try to find a lot of data to pipeline
• Try to divide the computation into many pipeline tasks
– More tasks to do (longer pipelines)
– Shorter tasks to do
• Pipeline computation is a special form of producer-consumer parallelism
– Producer tasks output data that is input by consumer tasks
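The pipeline timing argument can be checked with a toy model, a minimal sketch using the standard fill-then-stream accounting (function names are illustrative):

```python
def sequential_time(n_items, n_stages, t=1):
    # Every item passes through every stage, one after another.
    return n_items * n_stages * t

def pipeline_time(n_items, n_stages, t=1):
    # (n_stages - 1) steps to fill the pipe, then one result per step.
    return (n_stages - 1 + n_items) * t

N, T = 100, 4
print(sequential_time(N, T))   # 400
print(pipeline_time(N, T))     # 103
# Speedup approaches T = 4 as N grows:
print(sequential_time(N, T) / pipeline_time(N, T))
```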
Task Graphs
• The computation in any parallel algorithm can be viewed as a task dependency graph
• Task dependency graphs can be non-trivial
– Pipeline: Task 1 → Task 2 → Task 3 → Task 4
– Arbitrary (represents the algorithm's dependencies)
(Figure: example task graphs; the numbers are the time taken to perform each task)



Task Graph Performance
• Determined by the critical path (span)
– Sequence of dependent tasks that takes the longest time
(Figure: two example graphs with minimum times 27 and 34)
– Critical path length bounds parallel execution time
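The critical path of a task graph can be computed with a short dynamic program over the DAG; the sketch below uses a hypothetical 4-task graph of our own (names and costs are illustrative):

```python
from functools import lru_cache

def critical_path_length(costs, deps):
    """Longest (most expensive) chain of dependent tasks in a DAG.
    costs: task -> time; deps: task -> list of prerequisite tasks."""
    @lru_cache(maxsize=None)
    def finish(task):
        # Earliest finish time: own cost plus the latest prerequisite finish.
        return costs[task] + max((finish(d) for d in deps.get(task, [])),
                                 default=0)
    return max(finish(t) for t in costs)

# Hypothetical graph: a and b are independent, c needs both, d needs c.
costs = {"a": 10, "b": 6, "c": 11, "d": 7}
deps = {"c": ["a", "b"], "d": ["c"]}
print(critical_path_length(costs, deps))  # 28: a(10) -> c(11) -> d(7)
```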



Task Assignment (Mapping) to Processors

• Given a set of tasks and number of processors


• How to assign tasks to processors?
• Should take dependencies into account
• Task mapping will determine execution time

(Figure: two alternative mappings — what is the total time of each?)


Task Graphs in Action
• Uintah task graph scheduler
– C-SAFE: Center for Simulation of Accidental Fires and Explosions, University of Utah
– Large-granularity tasks (figure: task graph for a PDE solver)
• PLASMA
– DAG-based parallel linear algebra (figure: DAG of QR for a 4 × 4 tile matrix on a 2 × 2 grid of processors)
– DAGuE: a generic distributed DAG engine for HPC



Bag o’ Tasks Model and Worker Pool
• Set of tasks to be performed
• How do we schedule them?
– Find independent tasks
– Assign tasks to available processors
• Bag o’ Tasks approach
– Tasks are stored in a bag waiting to run
– Once all of a task's dependencies are satisfied, it is moved to a ready-to-run queue
– The scheduler assigns a task to a free processor
• Dynamic approach that is effective for load balancing
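A minimal bag-of-tasks scheduler can be sketched with threads and a shared queue (the task here, squaring a number, is a stand-in; names are our own):

```python
import threading
import queue

def run_worker_pool(tasks, n_workers=4):
    """Bag-of-tasks sketch: independent tasks wait in a queue and any
    free worker pulls the next one (dynamic load balancing)."""
    bag = queue.Queue()
    for t in tasks:
        bag.put(t)
    results, lock = [], threading.Lock()

    def worker():
        while True:
            try:
                item = bag.get_nowait()   # grab a ready task, if any remain
            except queue.Empty:
                return                    # bag is empty: this worker retires
            r = item * item               # "perform" the task
            with lock:
                results.append(r)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results

print(sorted(run_worker_pool(range(6))))  # [0, 1, 4, 9, 16, 25]
```

Note that result order depends on which worker finishes first; only the multiset of results is deterministic, which is typical of dynamically scheduled task bags.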
Master-Worker Parallelism
• One or more master processes generate work
• Masters allocate work to worker processes
• Workers idle if they have nothing to do
• Workers are mostly stupid and must be told what to do
– Execute independently
– May need to synchronize, but must be told to do so
• Master may become the bottleneck if not careful
• What are the performance factors and expected performance behavior?
– Consider task granularity and asynchrony
– How do they interact?



Master-Worker Execution Model (Li Li)

Li Li, “Model-based Automatic Performance Diagnosis of Parallel Computations,” Ph.D. thesis, 2007.



M-W Execution Trace (Li Li)



Search-Based (Exploratory) Decomposition
• 15-puzzle problem
• 15 tiles numbered 1 through 15 placed in 4x4 grid
– Blank tile located somewhere in grid
– Initial configuration is out of order
– Find shortest sequence of moves to put in order

• Sequential search across space of solutions


– May involve some heuristics
Parallelizing the 15-Puzzle Problem
• Enumerate move choices at each stage
• Assign to processors
• May do pruning
• Wasted work



Divide-and-Conquer Parallelism
• Break problem up in orderly manner into
smaller, more manageable chunks and solve
• Quicksort example



Dense Matrix Algorithms
• Great deal of activity in algorithms and software for
solving linear algebra problems
– Solution of linear systems ( Ax = b )
– Least-squares solution of over- or under-determined systems ( min ||Ax − b|| )
– Computation of eigenvalues and eigenvectors ( Ax = λx )
– Driven by numerical problem solving in scientific computation
• Solutions involve various forms of matrix computations
• Focus on high-performance matrix algorithms
– Key insight is to maximize the computation-to-communication ratio



Solving a System of Linear Equations
• Ax = b
a0,0x0 + a0,1x1 + … + a0,n-1xn-1 = b0
a1,0x0 + a1,1x1 + … + a1,n-1xn-1 = b1
⋮
an-1,0x0 + an-1,1x1 + … + an-1,n-1xn-1 = bn-1

• Gaussian elimination (classic algorithm)


– Forward elimination to Ux=y (U is upper triangular)
• without or with partial pivoting
– Back substitution to solve for x
– Parallel algorithms based on partitioning of A



Sequential Gaussian Elimination
1. procedure GAUSSIAN ELIMINATION (A, b, y)
2. Begin
3. for k := 0 to n - 1 do /* Outer loop */
4. begin
5. for j := k + 1 to n - 1 do
6. A[k, j] := A[k, j ]/A[k, k]; /* Division step */
7. y[k] := b[k]/A[k, k];
8. A[k, k] := 1;
9. for i := k + 1 to n - 1 do
10. begin
11. for j := k + 1 to n - 1 do
12. A[i, j] := A[i, j] - A[i, k] × A[k, j]; /* Elimination step */
13. b[i] := b[i] - A[i, k] × y[k];
14. A[i, k] := 0;
15. endfor; /*Line9*/
16. endfor; /*Line3*/
17. end GAUSSIAN ELIMINATION
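The pseudocode above (and the back-substitution procedure on a later slide) translates directly into Python; this is a sketch without pivoting, so it assumes no zero pivots:

```python
def gaussian_eliminate(A, b):
    """Forward elimination (no pivoting), following the pseudocode:
    reduces A in place to unit upper-triangular U and computes y."""
    n = len(A)
    y = [0.0] * n
    for k in range(n):
        for j in range(k + 1, n):
            A[k][j] /= A[k][k]               # division step
        y[k] = b[k] / A[k][k]
        A[k][k] = 1.0
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]  # elimination step
            b[i] -= A[i][k] * y[k]
            A[i][k] = 0.0
    return A, y

def back_substitute(U, y):
    """Solve Ux = y for unit upper-triangular U."""
    n = len(U)
    x = y[:]
    for k in range(n - 1, -1, -1):
        for i in range(k - 1, -1, -1):
            x[i] -= x[k] * U[i][k]
    return x

# The 2x2 worked example from the slides: 5x + 3y = 22, 8x + 2y = 13.
U, yv = gaussian_eliminate([[5.0, 3.0], [8.0, 2.0]], [22.0, 13.0])
x = back_substitute(U, yv)
print(x)  # approximately [-5/14, 111/14]
```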
Computation Step in Gaussian Elimination

5x + 3y = 22  ⇒  x = (22 − 3y) / 5

8x + 2y = 13  ⇒  8(22 − 3y)/5 + 2y = 13  ⇒  y = (13 − 176/5) / (2 − 24/5) = 111/14


Rowwise Partitioning on Eight Processes



Rowwise Partitioning on Eight Processes



2D Mesh Partitioning on 64 Processes



Back Substitution to Find Solution
1. procedure BACK SUBSTITUTION (U, x, y)
2. begin
3. for k := n - 1 downto 0 do /* Main loop */
4. begin
5. x[k] := y[k];
6. for i := k - 1 downto 0 do
7. y[i] := y[i] - x[k] × U[i, k];
8. endfor;
9. end BACK SUBSTITUTION



Dense Linear Algebra (www.netlib.org)
• Basic Linear Algebra Subroutines (BLAS)
– Level 1 (vector-vector): vectorization
– Level 2 (matrix-vector): vectorization, parallelization
– Level 3 (matrix-matrix): parallelization
• LINPACK (Fortran)
– Linear equations and linear least-squares
• EISPACK (Fortran)
– Eigenvalues and eigenvectors for matrix classes
• LAPACK (Fortran, C) (LINPACK + EISPACK)
– Use BLAS internally
• ScaLAPACK (Fortran, C, MPI) (scalable LAPACK)
Numerical Libraries
• PETSc
– Data structures / routines for partial differential equations
– MPI based
• SuperLU
– Large sparse nonsymmetric linear systems
• Hypre
– Large sparse linear systems
• TAO
– Toolkit for Advanced Optimization
• DOE ACTS
– Advanced CompuTational Software
Sorting Algorithms
• Task of arranging unordered collection into order
• Permutation of a sequence of elements
• Internal versus external sorting
– External sorting uses auxiliary storage
• Comparison-based
– Compare pairs of elements and exchange
– O(n log n)
• Noncomparison-based
– Use known properties of elements
– O(n)



Sorting on Parallel Computers
• Where are the elements stored?
– Need to be distributed across processes
– Sorted order will be with respect to process order
• How are comparisons performed?
– One element per process
• compare-exchange
• interprocess communication will dominate execution time
– More than one element per process
• compare-split
• Sorting networks
– Based on comparison network model
• Contrast with shared memory sorting algorithms
Single- vs. Multi-Element Comparison
• One element per processor

• Multiple elements per processor



Sorting Networks
• Networks to sort n elements in less than O(n log n)
• Key component in network is a comparator
– Increasing or decreasing comparator

• Comparators connected in parallel and permute elements



Sorting Network Design
• Multiple comparator stages (# stages, # comparators)
• Connected together by interconnection network
• Output of last stage is the sorted list
• O(log² n) sorting time
• Can convert any sorting network to a sequential algorithm



Bitonic Sort
• Create a bitonic sequence then sort the sequence
• Bitonic sequence
– sequence of elements <a0, a1, …, an-1>
– <a0, a1, …, ai> is monotonically increasing
– <ai, ai+1, …, an-1> is monotonically decreasing
• Sorting using bitonic splits is called bitonic merge
• Bitonic merge network is a network of comparators
– Implement bitonic merge
• Bitonic sequence is formed from unordered sequence
– Bitonic sort creates a bitonic sequence
– Start with sequence of size two (default bitonic)



Bitonic Sort Network
Unordered sequence Bitonic sequence

decrease

increase



Bitonic Merge Network
Bitonic sequence Sorted sequence



Parallel Bitonic Sort on a Hypercube
1. procedure BITONIC SORT(label, d)
2. begin
3. for i := 0 to d - 1 do
4. for j := i downto 0 do
5. if (i + 1)st bit of label = j th bit of label then
6. comp exchange max(j);
7. else
8. comp exchange min(j);
9. end BITONIC SORT
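The slide gives the hypercube formulation; a sequential sketch of the same idea (build a bitonic sequence from two oppositely sorted halves, then bitonic-merge) looks like this, assuming a power-of-two input length:

```python
def _bitonic_merge(a, up):
    # Bitonic split: compare-exchange across the two halves, then recurse.
    if len(a) <= 1:
        return a
    k = len(a) // 2
    for i in range(k):
        if (a[i] > a[i + k]) == up:
            a[i], a[i + k] = a[i + k], a[i]
    return _bitonic_merge(a[:k], up) + _bitonic_merge(a[k:], up)

def bitonic_sort(a, up=True):
    """Sequential bitonic sort for power-of-two lengths: sort one half
    ascending and the other descending (a bitonic sequence by default),
    then bitonic-merge the whole."""
    if len(a) <= 1:
        return a
    k = len(a) // 2
    first = bitonic_sort(a[:k], True)     # ascending half
    second = bitonic_sort(a[k:], False)   # descending half -> bitonic
    return _bitonic_merge(first + second, up)

print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

On a hypercube, each compare-exchange in `_bitonic_merge` becomes a comp-exchange between processes whose labels differ in one bit, which is what the pseudocode's bit tests select.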
Parallel Bitonic Sort on a Hypercube

Last stage



Bubble Sort and Variants
• Can easily parallelize sorting algorithms of O(n²)
• Bubble sort compares and exchanges adjacent
elements
– O(n) each pass
– O(n) passes
– Available parallelism?
• Odd-even transposition sort
– Compares and exchanges odd and even pairs
– After n phases, elements are sorted
– Available parallelism?
Odd-Even Transposition Sort



Parallel Odd-Even Transposition Sort
1. procedure ODD-EVEN PAR(n)
2. begin
3. id := process’s label
4. for i := 1 to n do
5. begin
6. if i is odd then
7. if id is odd then
8. compare-exchange min(id + 1);
9. else
10. compare-exchange max(id - 1);
11. if i is even then
12. if id is even then
13. compare-exchange min(id + 1);
14. else
15. compare-exchange max(id - 1);
16. end for
17. end ODD-EVEN PAR
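A sequential simulation of the algorithm makes the phase structure concrete; every compare-exchange within one phase touches a disjoint pair, so all of them could run in parallel with one element per process:

```python
def odd_even_transposition_sort(a):
    """Odd-even transposition sort: alternate phases compare-exchange
    the odd-indexed and even-indexed adjacent pairs; n phases suffice."""
    a = list(a)
    n = len(a)
    for phase in range(n):
        start = 1 if phase % 2 == 0 else 0   # odd pairs, then even pairs
        for i in range(start, n - 1, 2):     # disjoint pairs: parallelizable
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

print(odd_even_transposition_sort([5, 2, 9, 1, 7, 3]))  # [1, 2, 3, 5, 7, 9]
```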
Quicksort
• Quicksort has average complexity of O(n log n)
• Divide-and-conquer algorithm
– Divide into subsequences where every element in
first is less than or equal to every element in the
second
– Pivot is used to split the sequence
– Conquer step recursively applies quicksort
algorithm
• Available parallelism?
Sequential Quicksort
1. procedure QUICKSORT (A, q, r )
2. begin
3. if q < r then
4. begin
5. x := A[q];
6. s := q;
7. for i := q + 1 to r do
8. if A[i] ≤ x then
9. begin
10. s := s + 1;
11. swap(A[s], A[i ]);
12. end if
13. swap(A[q], A[s]);
14. QUICKSORT (A, q, s);
15. QUICKSORT (A, s + 1, r );
16. end if
17. end QUICKSORT
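The partition scheme of the pseudocode carries over directly to Python; the sketch below uses the conventional `s - 1` bound on the first recursive call so the pivot, fixed in its final position by the last swap, is excluded from both halves:

```python
def quicksort(A, q=0, r=None):
    """In-place quicksort following the pseudocode: A[q] is the pivot,
    s tracks the boundary of the <= partition."""
    if r is None:
        r = len(A) - 1
    if q < r:
        x = A[q]
        s = q
        for i in range(q + 1, r + 1):
            if A[i] <= x:
                s += 1
                A[s], A[i] = A[i], A[s]
        A[q], A[s] = A[s], A[q]      # pivot moves to its final slot s
        quicksort(A, q, s - 1)       # conquer: both halves are independent,
        quicksort(A, s + 1, r)       # so they can be sorted in parallel
    return A

print(quicksort([3, 6, 1, 8, 2, 7]))  # [1, 2, 3, 6, 7, 8]
```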
Parallel Shared Address Space Quicksort



Parallel Shared Address Space Quicksort



Bucket Sort and Sample Sort
• Bucket sort is popular when elements (values) are
uniformly distributed over an interval
– Create m buckets and place elements in appropriate bucket
– O(n log(n/m))
– If m=n, can use value as index to achieve O(n) time
• Sample sort is used when the uniform-distribution assumption does not hold
– Draw a sample of size s
– Sort the sample and choose m − 1 elements to be splitters
– Split the data into m buckets using the splitters and proceed as in bucket sort, sorting each bucket with quicksort
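The sample-sort steps can be sketched as follows; the parameter names (`m`, `oversample`) are our own, and Python's built-in sort stands in for the per-bucket quicksort:

```python
import random
import bisect

def sample_sort(data, m=4, oversample=3):
    """Sample sort sketch: sample, pick m-1 splitters, bucket by
    splitter, then sort each bucket independently."""
    sample = sorted(random.sample(data, min(len(data), m * oversample)))
    step = max(1, len(sample) // m)
    splitters = sample[step::step][:m - 1]          # m - 1 splitters
    buckets = [[] for _ in range(len(splitters) + 1)]
    for x in data:
        # bisect finds which splitter interval x falls into.
        buckets[bisect.bisect_right(splitters, x)].append(x)
    out = []
    for b in buckets:          # buckets hold disjoint value ranges, so
        out.extend(sorted(b))  # sorted buckets concatenate into a sorted list
    return out

random.seed(1)
data = [random.randint(0, 999) for _ in range(50)]
print(sample_sort(data) == sorted(data))  # True
```

In the parallel version each process sorts one bucket; the sampling step is what keeps the buckets (and hence the per-process work) roughly balanced even for skewed inputs.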



Parallel Sample Sort



Graph Algorithms
• Graph theory important in computer science
• Many complex problems are graph problems

• G = (V, E)
– V is a finite set of points called vertices
– E is a finite set of edges
– Each e ∈ E is a pair (u, v), where u, v ∈ V
– Pairs may be unordered (undirected graph) or ordered (directed graph)



Graph Terminology
• Vertices u and v are adjacent if (u, v) is an edge
• A path from u to v is an edge sequence starting at u and ending at v
• If there exists a path, v is reachable from u
• A graph is connected if all pairs of vertices are connected by a path
• A weighted graph associates weights with each edge
• The adjacency matrix is an n x n array A such that
– Ai,j = 1 if (vi, vj) ∈ E; 0 otherwise
– Can be modified for weighted graphs (∞ means no edge)
– Can alternatively represent the graph as adjacency lists



Graph Representations
• Adjacency matrix

• Adjacency list



Minimum Spanning Tree
• A spanning tree of an undirected graph G is a subgraph of
G that is a tree containing all the vertices of G
• The minimum spanning tree (MST) for a weighted
undirected graph is a spanning tree with minimum weight
• Prim’s algorithm can be used
– Greedy algorithm
– Selects an arbitrary starting vertex
– Chooses new vertex guaranteed to be in MST
– O(n²)
– Prim’s algorithm is iterative



Prim’s Minimum Spanning Tree Algorithm
1. procedure PRIM MST(V, E, w, r)
2. begin
3. VT := {r};
4. d[r] := 0;
5. for all v ∈ (V - VT) do
6. if edge (r, v) exists set d[v] := w(r, v);
7. else set d[v] := ∞;
8. while VT ≠ V do
9. begin
10. find a vertex u such that d[u] := min{d[v] | v ∈ (V - VT)};
11. VT := VT ∪ {u};
12. for all v ∈ (V - VT) do
13. d[v] := min{d[v], w(u, v)};
14. endwhile
15. end PRIM MST
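A direct Python transcription of the pseudocode, on a small hypothetical 4-vertex graph (adjacency matrix with ∞ for missing edges):

```python
INF = float("inf")

def prim_mst_weight(w, r=0):
    """Prim's algorithm as in the pseudocode: w is an adjacency matrix
    with INF for missing edges; returns the total MST weight."""
    n = len(w)
    in_tree = [False] * n
    d = [INF] * n
    d[r] = 0
    total = 0
    for _ in range(n):
        # Find the cheapest vertex not yet in the tree (line 10).
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: d[v])
        in_tree[u] = True
        total += d[u]
        for v in range(n):               # update step (lines 12-13)
            if not in_tree[v]:
                d[v] = min(d[v], w[u][v])
    return total

# Hypothetical 4-vertex graph.
w = [[INF, 1, 4, INF],
     [1, INF, 2, 6],
     [4, 2, INF, 3],
     [INF, 6, 3, INF]]
print(prim_mst_weight(w))  # 6  (edges 0-1, 1-2, 2-3)
```

The inner `min` over `d` is exactly the step the next slide parallelizes: each process scans its own vertex subset, and an all-to-one reduction finds the global minimum.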



Example: Prim’s MST Algorithm



Example: Prim’s MST Algorithm



Parallel Formulation of Prim’s Algorithm
• Difficult to perform different iterations of the while loop in parallel because d[v] may change each time
• Can parallelize each iteration though
• Partition vertices into p subsets Vi, i = 0, …, p-1
• Each process Pi computes di[u] = min{di[v] | v ∈ (V - VT) ∩ Vi}
• Global minimum is obtained using all-to-one reduction
• New vertex is added to VT and broadcast to all processes
• New values of d[v] are computed for local vertices
• O(n²/p) + O(n log p) (computation + communication)



Partitioning in Prim’s Algorithm



Single-Source Shortest Paths
• Find shortest path from a vertex v to all other vertices
• In a weighted graph, the shortest path is the path whose edge weights sum to the minimum
• Weights may represent time, cost, loss, or any other
quantity that accumulates additively along a path
• Dijkstra’s algorithm finds shortest paths from vertex s
– Similar to Prim’s MST algorithm
• MST with vertex v as starting vertex
– Incrementally finds shortest paths in greedy manner
– Keep track of minimum cost to reach a vertex from s
– O(n²)



Dijkstra’s Single-Source Shortest Path

1. procedure DIJKSTRA SINGLE SOURCE SP(V, E, w, s)
2. begin
3. VT := {s};
4. for all v ∈ (V - VT) do
5. if (s, v) exists set l[v] := w(s, v);
6. else set l[v] := ∞;
7. while VT ≠ V do
8. begin
9. find a vertex u such that l[u] := min{l[v] | v ∈ (V - VT)};
10. VT := VT ∪ {u};
11. for all v ∈ (V - VT) do
12. l[v] := min{l[v], l[u] + w(u, v)};
13. endwhile
14. end DIJKSTRA SINGLE SOURCE SP
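As with Prim's algorithm, the pseudocode maps directly onto an adjacency-matrix implementation; only the update rule (line 12 adds l[u] instead of taking the raw edge weight) differs. A sketch on a hypothetical 4-vertex graph:

```python
INF = float("inf")

def dijkstra(w, s=0):
    """Dijkstra's single-source shortest paths, following the pseudocode:
    w is an adjacency matrix with INF for missing edges."""
    n = len(w)
    done = [False] * n
    l = [INF] * n
    l[s] = 0
    for _ in range(n):
        # Closest unfinished vertex (line 9).
        u = min((v for v in range(n) if not done[v]), key=lambda v: l[v])
        done[u] = True
        for v in range(n):               # relax edges out of u (line 12)
            if not done[v]:
                l[v] = min(l[v], l[u] + w[u][v])
    return l

w = [[INF, 7, 3, INF],
     [7, INF, 1, 2],
     [3, 1, INF, 8],
     [INF, 2, 8, INF]]
print(dijkstra(w))  # [0, 4, 3, 6]
```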



Parallel Formulation of Dijkstra’s Algorithm

• Very similar to Prim’s MST parallel formulation


• Use 1D block mapping as before
• All processes perform computation and
communication similar to that performed in Prim’s
algorithm
• Parallel performance is the same
– O(n²/p) + O(n log p)
– Scalability
• O(n²) is the sequential time
• Speedup = O(n²) / [O(n²/p) + O(n log p)]



All Pairs Shortest Path
• Find the shortest path between all pairs of vertices
• Outcome is an n x n matrix D = {di,j} such that di,j is the cost of the shortest path from vertex vi to vertex vj
• Dijkstra’s algorithm
– Execute the single-source algorithm from each vertex
– O(n³)
– Source-partitioned formulation (use sequential algorithm)
– Source-parallel formulation (use parallel algorithm)
• Floyd’s algorithm
– Builds up distance matrix from the bottom up



Floyd’s All-Pairs Shortest Paths Algorithm

1. procedure FLOYD ALL PAIRS SP(A)
2. begin
3. D(0) := A;
4. for k := 1 to n do
5. for i := 1 to n do
6. for j := 1 to n do
7. d(k)i,j := min{ d(k-1)i,j , d(k-1)i,k + d(k-1)k,j };
8. end FLOYD ALL PAIRS SP
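In code, a single matrix can replace the D(k) sequence, because iteration k never changes row k or column k; a sketch on a hypothetical 4-vertex graph:

```python
INF = float("inf")

def floyd_all_pairs(A):
    """Floyd's all-pairs shortest paths: A is an adjacency matrix with 0
    on the diagonal and INF where there is no edge."""
    n = len(A)
    D = [row[:] for row in A]     # work on a copy of A
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Allow paths through intermediate vertex k.
                D[i][j] = min(D[i][j], D[i][k] + D[k][j])
    return D

A = [[0, 5, INF, 2],
     [5, 0, 1, INF],
     [INF, 1, 0, 4],
     [2, INF, 4, 0]]
D = floyd_all_pairs(A)
print(D[0][2])  # 6: both 0 -> 3 -> 2 and 0 -> 1 -> 2 cost 6
```

The parallel version on the next slide assigns each (i, j) entry of the inner double loop to a process, with row k and column k broadcast at each step k.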
Parallel Floyd’s Algorithm
1. procedure FLOYD ALL PAIRS PARALLEL(A)
2. begin
3. D(0) := A;
4. for k := 1 to n do
5. forall Pi,j, where i, j ≤ n, do in parallel
6. d(k)i,j := min{ d(k-1)i,j , d(k-1)i,k + d(k-1)k,j };
7. end FLOYD ALL PAIRS PARALLEL
Parallel Graph Algorithm Library – Boost

• Parallel Boost Graph Library


– Andrew Lumsdaine, Indiana University
– Generic C++ library for high-performance parallel
and distributed graph computation
– Builds on the Boost Graph Library (BGL)
• offers similar data structures, algorithms, and syntax
– Targets very large graphs (millions of nodes)
– Distributed-memory parallel processing on
clusters



Original BGL: Algorithms
• Searches (breadth-first, depth-first, A*)
• Single-source shortest paths (Dijkstra, Bellman-Ford, DAG)
• All-pairs shortest paths (Johnson, Floyd-Warshall)
• Minimum spanning tree (Kruskal, Prim)
• Components (connected, strongly connected, biconnected)
• Maximum cardinality matching
• Max-flow (Edmonds-Karp, push-relabel)
• Sparse matrix ordering (Cuthill-McKee, King, Sloan, minimum degree)
• Layout (Kamada-Kawai, Fruchterman-Reingold, Gursoy-Atun)
• Betweenness centrality
• PageRank
• Isomorphism
• Vertex coloring
• Transitive closure
• Dominator tree



Original BGL Summary
• Original BGL is large, stable, efficient
– Lots of algorithms, graph types
– Peer-reviewed code with many users, nightly
regression testing, and so on
– Performance comparable to FORTRAN.
• Who should use the BGL?
– Programmers comfortable with C++
– Users with graph sizes from tens of vertices to
millions of vertices
Parallel BGL
• A version of C++ BGL for computational clusters
– Distributed memory for huge graphs
– Parallel processing for improved performance
• An active research project
• Closely related to the original BGL
– Parallelizing BGL programs should be “easy”
A simple, directed graph… distributed across 3 processors



Parallel Graph Algorithms
• Breadth-first search
• Eager Dijkstra’s single-source shortest paths
• Crauser et al. single-source shortest paths
• Depth-first search
• Minimum spanning tree (Boruvka, Dehne & Götz)
• Connected components
• Strongly connected components
• Biconnected components
• PageRank
• Graph coloring
• Fruchterman-Reingold layout
• Max-flow (Dinic’s)



Big-Data and Map-Reduce
• Big data deals with processing large data sets
• The nature of the data processing problem makes it amenable to parallelism
– Looking for features in the data
– Extracting certain characteristics
– Analyzing properties with complex data mining algorithms
• The data size lends itself to partitioning into a large number of subsets that can be processed in parallel
• We need new algorithms, data structures, and programming models to deal with these problems
A Simple Big-Data Problem
• Consider a large data collection of text documents
• Suppose we want to find how often a particular word occurs and determine a probability distribution for all word occurrences
• Sequential algorithm: get the next document, find and count its words, update the statistics; when no documents remain, generate the probability distributions
(Figure: sequential word-count flow producing a frequency table, e.g. web 2, weed 1, green 2, sun 1, moon 1, land 1, part 1)
Parallelization Approach
• Map: partition the data collection into subsets of documents and process each subset in parallel
• Reduce: assemble the partial frequency tables to derive the final probability distribution
(Figure: each parallel worker counts words in its own subset; the partial frequency tables are then merged)




Actually, It Is Not Easy to Parallelize…
• Different programming models: message passing, shared memory
• Fundamental issues: scheduling, data distribution, synchronization, inter-process communication, robustness, fault tolerance, …
• Architectural issues: Flynn’s taxonomy (SIMD, MIMD, etc.), network topology, bisection bandwidth, cache coherence, …
• Different programming constructs: mutexes, condition variables, barriers, …; masters/slaves, producers/consumers, work queues, …
• Common problems: livelock, deadlock, data starvation, priority inversion, …; dining philosophers, sleeping barbers, cigarette smokers, …

Actually, a programmer’s nightmare…



Map-Reduce Parallel Programming
• Has become an important distributed parallel programming paradigm for large-scale applications
– Also applies to shared-memory parallelism
– It is one of the core technologies powering big IT companies, like Google, IBM, Yahoo, and Facebook
• The framework runs on a cluster of machines, automatically partitions jobs into a number of small tasks, and processes them in parallel
• Can be captured by combining the Map and Reduce parallel patterns
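The word-count example that follows can be sketched in a few lines; the map phase emits per-word counts for each split (which could run in parallel), and the reduce phase merges the partial tables key by key. The split contents here are illustrative:

```python
from collections import Counter
from functools import reduce

def map_phase(document):
    # MAP: emit a <word, 1> pair per word occurrence; Counter aggregates
    # them into a partial frequency table for this split.
    return Counter(document.split())

def reduce_phase(partials):
    # REDUCE: merge the partial frequency tables key by key.
    return reduce(lambda a, b: a + b, partials, Counter())

splits = ["web weed green", "sun moon land part", "web green"]
partials = [map_phase(s) for s in splits]   # independent: could run in parallel
totals = reduce_phase(partials)
print(totals["web"], totals["green"])  # 2 2
```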
Map-Reduce Example
• MAP: input data → <key, value> pairs
• The data collection is split (split 1, split 2, …, split n) and supplied to multiple processors
• Each Map task emits one <word, 1> pair per word occurrence, e.g. <web, 1>, <weed, 1>, <green, 1>, <sun, 1>, <moon, 1>, <land, 1>, <part, 1>, …
MapReduce
• MAP: input data → <key, value> pairs
• REDUCE: <key, value> pairs → <result>
• The data collection is split and supplied to multiple processors; each split is processed by a Map task, and Reduce tasks combine the Map outputs into the final results

