Introduction To Parallel Algorithms and Parallel Program Design

The document discusses key concepts in designing parallel algorithms and parallel program structure, including: 1) Partitioning tasks and data to expose parallelism while balancing workload and minimizing redundant work and communication. 2) Coordinating communication between tasks to allow computation to proceed while considering factors like locality, concurrency, and overlap with computation. 3) Agglomerating tasks to reduce communication costs while maintaining sufficient parallelism and scalability. 4) Mapping tasks to resources considering aspects like load balancing, data and task locality, and dynamic scheduling.


Introduction to Parallel Algorithms and Parallel Program Design


Methodological Design
• Partition
– Task/data decomposition
• Communication
– Task execution coordination
• Agglomeration
– Evaluation of the structure
• Mapping
– Resource assignment

I. Foster, “Designing and Building Parallel Programs,” Addison-Wesley, 1995. Book is online, see webpage.

Monday, April 17, 2023 BE_Comp_HPC_Algorithm Design... KNH 2


Partitioning
• The partitioning stage is intended to expose opportunities for parallel execution
• Focus on defining a large number of small tasks to yield a fine-grained decomposition of the problem
• A good partition divides into small pieces both the computational tasks associated with a problem and the data on which the tasks operate
• Domain decomposition focuses on the computation's data
• Functional decomposition focuses on the computation's tasks
• Mixing domain/functional decomposition is possible



Domain and Functional Decomposition
• Domain decomposition of 2D / 3D grid

• Functional decomposition of a climate model



Partitioning Checklist
• Does your partition define at least an order of magnitude more tasks than there are processors in your target computer? If not, you may lose design flexibility.
• Does your partition avoid redundant computation and storage requirements? If not, it may not be scalable.
• Are tasks of comparable size? If not, it may be hard to allocate each processor equal amounts of work.
• Does the number of tasks scale with problem size? If not, you may not be able to solve larger problems with more processors.
• Have you identified several alternative partitions?



Communication (Interaction)
• Tasks generated by a partition must
interact to allow the computation
to proceed
– Information flow: data and control
• Types of communication
– Local vs. Global: locality of communication
– Structured vs. Unstructured: communication patterns
– Static vs. Dynamic: determined by runtime conditions
– Synchronous vs. Asynchronous: coordination degree
• Granularity and frequency of communication
– Size of data exchange
• Think of communication as interaction and control
– Applicable to both shared and distributed memory parallelism



Types of Communication
• Point-to-point
• Group-based
• Hierarchical
• Collective



Communication Design Checklist
• Is the distribution of communications equal?
– Unbalanced communication may limit scalability
• What is the communication locality?
– Wider communication locales are more expensive
• What is the degree of communication concurrency?
– Communication operations may be parallelized
• Is computation associated with different tasks able to
proceed concurrently? Can communication be overlapped
with computation?
– Try to reorder computation and communication to expose
opportunities for parallelism



Agglomeration
• Move from parallel abstractions
to real implementation
• Revisit partitioning and communication
– View to efficient algorithm execution
• Is it useful to agglomerate?
– What happens when tasks are combined?
• Is it useful to replicate data and/or computation?
• Changes important algorithm and performance ratios
– Surface-to-volume: reduction in communication at the expense of
decreasing parallelism
– Communication/computation: which cost dominates
• Replication may allow reduction in communication
• Maintain flexibility to allow overlap



Types of Agglomeration
• Element to column

• Element to block
– Better surface to volume

• Task merging

• Task reduction
– Reduces communication



Agglomeration Design Checklist
• Has increased locality reduced communication
costs?
• Is replicated computation worth it?
• Does data replication compromise scalability?
• Is the computation still balanced?
• Is scalability in problem size still possible?
• Is there still sufficient concurrency?
• Is there room for more agglomeration?
• Fine-grained vs. coarse-grained?
Mapping
• Specify where each task is to execute
– Less of a concern on shared-memory
systems
• Attempt to minimize execution time
– Place concurrent tasks on different
processors to enhance physical concurrency
– Place communicating tasks on same processor, or on processors close
to each other, to increase locality
– Strategies can conflict!
• Mapping problem is NP-complete
– Use problem classifications and heuristics
• Static and dynamic load balancing



Mapping Algorithms
• Load balancing (partitioning) algorithms
• Data-based algorithms
– Think of computational load with respect to amount of data
being operated on
– Assign data (i.e., work) in some known manner to balance
– Take into account data interactions
• Task-based (task scheduling) algorithms
– Used when functional decomposition yields many tasks with
weak locality requirements
– Use task assignment to keep processors busy computing
– Consider centralized and decentralized schemes



Mapping Design Checklist
• Is static mapping too restrictive and non-responsive?
• Is dynamic mapping too costly in overhead?
• Does centralized scheduling lead to bottlenecks?
• Do dynamic load-balancing schemes require too
much coordination to re-balance the load?
• What is the tradeoff of dynamic scheduling
complexity versus performance improvement?
• Are there enough tasks to achieve high levels of
concurrency? If not, processors may idle.



Types of Parallel Programs
• Flavors of parallelism
– Data parallelism
• all processors do same thing on different data
– Task parallelism
• processors are assigned tasks that do different things
• Parallel execution models
– Data parallel
– Pipelining (Producer-Consumer)
– Task graph
– Work pool
– Master-Worker



Data Parallel
• Data is decomposed (mapped) onto processors
• Processors perform similar (identical) tasks on the data
• Tasks are applied concurrently
• Load balance is obtained through data partitioning
– Equal amounts of work assigned
• Certainly may have interactions between processors
• Data parallelism scalability
– Degree of parallelism tends to increase with problem size
– Makes data parallel algorithms more efficient
• Single Program Multiple Data (SPMD)
– Convenient way to implement data parallel computation
– More associated with distributed memory parallel execution



Matrix-Vector Multiplication
• A × b = y
• Allocate tasks to the rows of A
  y[i] = Σj A[i,j] * b[j]
• Dependencies?
• Speedup?
• Computing each element of y can be done independently
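The row-wise task decomposition above can be sketched in a few lines of Python; the function names are illustrative, not from the slides. Each row task is independent, which is exactly what makes this decomposition parallelizable.

```python
# Sketch: each row of A defines an independent task computing one element of y.
# A real parallel version could hand rows to worker processes; here we only
# show that the row tasks share no intermediate results.

def matvec_row_task(A_row, b):
    """One task: the dot product of a single row of A with b."""
    return sum(a * x for a, x in zip(A_row, b))

def matvec(A, b):
    # This loop is embarrassingly parallel: every matvec_row_task call
    # is independent of the others.
    return [matvec_row_task(row, b) for row in A]

A = [[1, 2], [3, 4]]
b = [5, 6]
print(matvec(A, b))  # [17, 39]
```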
Matrix-Vector Multiplication (Limited Tasks)

• Suppose we only have 4 tasks


• Dependencies?
• Speedup?



Matrix Multiplication
• A × B = C
• A[i,:] • B[:,j] = C[i,j]
• Row partitioning
– N tasks
• Block partitioning
– N*N/B tasks
• (Figure: shading shows data sharing in the B matrix)



Granularity of Task and Data Decompositions

• Granularity can be with respect to tasks and data


• Task granularity
– Equivalent to choosing the number of tasks
– Fine-grained decomposition results in a large number of tasks
– Coarse-grained decomposition has a smaller number of tasks
– Translates to data granularity after # tasks chosen
• consider matrix multiplication
• Data granularity
– Think of in terms of amount of data needed in operation
– Relative to data as a whole
– Decomposition decisions based on input, output, input-output, or
intermediate data



Mesh Allocation to Processors
• Mesh model of Lake Superior
• How to assign mesh elements to processors?
• Distribute onto 8 processors: randomly, or by graph partitioning for minimum edge cut



Pipeline Model
• Stream of data operated on by a succession of tasks (Task 1 → Task 2 → Task 3 → Task 4)
– Tasks are assigned to processors (P1, P2, P3, P4)
• Consider N data units of input
– Sequential: each data unit passes through every task in turn
– Parallel (each task assigned to a processor): with 4 data units the pipeline runs 4-way parallel; with 8 data units it stays 4-way parallel for a longer time
Pipeline Performance
• N data units and T tasks (stages)
• Each task takes unit time t
• Sequential time = N*T*t
• Parallel pipeline time = (T - 1)*t to fill the pipeline + N*t in steady state ≈ (N + T - 1)*t
– Speedup = N*T / (N + T - 1) → T for N >> T
• Try to find a lot of data to pipeline
• Try to divide the computation into many pipeline tasks
– More tasks to do (longer pipelines)
– Shorter tasks to do
• Pipeline computation is a special form of producer-consumer parallelism
– Producer tasks output data that is input by consumer tasks
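The pipeline timing argument can be checked with a toy model, a minimal sketch using the standard fill-then-stream accounting (function names are illustrative):

```python
def sequential_time(n_items, n_stages, t=1):
    # Every item passes through every stage, one after another.
    return n_items * n_stages * t

def pipeline_time(n_items, n_stages, t=1):
    # (n_stages - 1) steps to fill the pipe, then one result per step.
    return (n_stages - 1 + n_items) * t

N, T = 100, 4
print(sequential_time(N, T))   # 400
print(pipeline_time(N, T))     # 103
# Speedup approaches T = 4 as N grows:
print(sequential_time(N, T) / pipeline_time(N, T))
```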
Task Graphs
• The computation in any parallel algorithm can be viewed as a task dependency graph
• Task dependency graphs can be non-trivial
– Pipeline: Task 1 → Task 2 → Task 3 → Task 4
– Arbitrary (represents the algorithm's dependencies)
(Figure: example task graphs; the numbers are the time taken to perform each task)



Task Graph Performance
• Determined by the critical path (span)
– Sequence of dependent tasks that takes the longest time
(Figure: two example graphs with minimum times 27 and 34)
– Critical path length bounds parallel execution time
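The critical path of a task graph can be computed with a short dynamic program over the DAG; the sketch below uses a hypothetical 4-task graph of our own (names and costs are illustrative):

```python
from functools import lru_cache

def critical_path_length(costs, deps):
    """Longest (most expensive) chain of dependent tasks in a DAG.
    costs: task -> time; deps: task -> list of prerequisite tasks."""
    @lru_cache(maxsize=None)
    def finish(task):
        # Earliest finish time: own cost plus the latest prerequisite finish.
        return costs[task] + max((finish(d) for d in deps.get(task, [])),
                                 default=0)
    return max(finish(t) for t in costs)

# Hypothetical graph: a and b are independent, c needs both, d needs c.
costs = {"a": 10, "b": 6, "c": 11, "d": 7}
deps = {"c": ["a", "b"], "d": ["c"]}
print(critical_path_length(costs, deps))  # 28: a(10) -> c(11) -> d(7)
```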



Task Assignment (Mapping) to Processors

• Given a set of tasks and number of processors


• How to assign tasks to processors?
• Should take dependencies into account
• Task mapping will determine execution time

(Figure: two alternative mappings — what is the total time of each?)


Task Graphs in Action
• Uintah task graph scheduler
– C-SAFE: Center for Simulation of Accidental Fires and Explosions, University of Utah
– Large-granularity tasks (figure: task graph for a PDE solver)
• PLASMA
– DAG-based parallel linear algebra (figure: DAG of QR for a 4 × 4 tile matrix on a 2 × 2 grid of processors)
– DAGuE: a generic distributed DAG engine for HPC



Bag o’ Tasks Model and Worker Pool
• Set of tasks to be performed
• How do we schedule them?
– Find independent tasks
– Assign tasks to available processors
• Bag o’ Tasks approach
– Tasks are stored in a bag waiting to run
– Once all of a task's dependencies are satisfied, it is moved to a ready-to-run queue
– The scheduler assigns a task to a free processor
• Dynamic approach that is effective for load balancing
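A minimal bag-of-tasks scheduler can be sketched with threads and a shared queue (the task here, squaring a number, is a stand-in; names are our own):

```python
import threading
import queue

def run_worker_pool(tasks, n_workers=4):
    """Bag-of-tasks sketch: independent tasks wait in a queue and any
    free worker pulls the next one (dynamic load balancing)."""
    bag = queue.Queue()
    for t in tasks:
        bag.put(t)
    results, lock = [], threading.Lock()

    def worker():
        while True:
            try:
                item = bag.get_nowait()   # grab a ready task, if any remain
            except queue.Empty:
                return                    # bag is empty: this worker retires
            r = item * item               # "perform" the task
            with lock:
                results.append(r)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results

print(sorted(run_worker_pool(range(6))))  # [0, 1, 4, 9, 16, 25]
```

Note that result order depends on which worker finishes first; only the multiset of results is deterministic, which is typical of dynamically scheduled task bags.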
Master-Worker Parallelism
• One or more master processes generate work
• Masters allocate work to worker processes
• Workers idle if they have nothing to do
• Workers are mostly stupid and must be told what to do
– Execute independently
– May need to synchronize, but must be told to do so
• Master may become the bottleneck if not careful
• What are the performance factors and expected performance behavior?
– Consider task granularity and asynchrony
– How do they interact?



Master-Worker Execution Model (Li Li)

Li Li, “Model-based Automatic Performance Diagnosis of Parallel Computations,” Ph.D. thesis, 2007.



M-W Execution Trace (Li Li)



Search-Based (Exploratory) Decomposition
• 15-puzzle problem
• 15 tiles numbered 1 through 15 placed in 4x4 grid
– Blank tile located somewhere in grid
– Initial configuration is out of order
– Find shortest sequence of moves to put in order

• Sequential search across space of solutions


– May involve some heuristics
Parallelizing the 15-Puzzle Problem
• Enumerate move choices at each stage
• Assign to processors
• May do pruning
• Wasted work



Divide-and-Conquer Parallelism
• Break problem up in orderly manner into
smaller, more manageable chunks and solve
• Quicksort example



Dense Matrix Algorithms
• Great deal of activity in algorithms and software for
solving linear algebra problems
– Solution of linear systems ( Ax = b )
– Least-squares solution of over- or under-determined systems ( min ||Ax − b|| )
– Computation of eigenvalues and eigenvectors ( Ax = λx )
– Driven by numerical problem solving in scientific computation
• Solutions involve various forms of matrix computations
• Focus on high-performance matrix algorithms
– Key insight is to maximize the computation-to-communication ratio



Solving a System of Linear Equations
• Ax = b
a0,0x0 + a0,1x1 + … + a0,n-1xn-1 = b0
a1,0x0 + a1,1x1 + … + a1,n-1xn-1 = b1
⋮
an-1,0x0 + an-1,1x1 + … + an-1,n-1xn-1 = bn-1

• Gaussian elimination (classic algorithm)


– Forward elimination to Ux=y (U is upper triangular)
• without or with partial pivoting
– Back substitution to solve for x
– Parallel algorithms based on partitioning of A



Sequential Gaussian Elimination
1. procedure GAUSSIAN ELIMINATION (A, b, y)
2. Begin
3. for k := 0 to n - 1 do /* Outer loop */
4. begin
5. for j := k + 1 to n - 1 do
6. A[k, j] := A[k, j ]/A[k, k]; /* Division step */
7. y[k] := b[k]/A[k, k];
8. A[k, k] := 1;
9. for i := k + 1 to n - 1 do
10. begin
11. for j := k + 1 to n - 1 do
12. A[i, j] := A[i, j] - A[i, k] × A[k, j]; /* Elimination step */
13. b[i] := b[i] - A[i, k] × y[k];
14. A[i, k] := 0;
15. endfor; /*Line9*/
16. endfor; /*Line3*/
17. end GAUSSIAN ELIMINATION
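The pseudocode above (and the back-substitution procedure on a later slide) translates directly into Python; this is a sketch without pivoting, so it assumes no zero pivots:

```python
def gaussian_eliminate(A, b):
    """Forward elimination (no pivoting), following the pseudocode:
    reduces A in place to unit upper-triangular U and computes y."""
    n = len(A)
    y = [0.0] * n
    for k in range(n):
        for j in range(k + 1, n):
            A[k][j] /= A[k][k]               # division step
        y[k] = b[k] / A[k][k]
        A[k][k] = 1.0
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]  # elimination step
            b[i] -= A[i][k] * y[k]
            A[i][k] = 0.0
    return A, y

def back_substitute(U, y):
    """Solve Ux = y for unit upper-triangular U."""
    n = len(U)
    x = y[:]
    for k in range(n - 1, -1, -1):
        for i in range(k - 1, -1, -1):
            x[i] -= x[k] * U[i][k]
    return x

# The 2x2 worked example from the slides: 5x + 3y = 22, 8x + 2y = 13.
U, yv = gaussian_eliminate([[5.0, 3.0], [8.0, 2.0]], [22.0, 13.0])
x = back_substitute(U, yv)
print(x)  # approximately [-5/14, 111/14]
```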
Computation Step in Gaussian Elimination

5x + 3y = 22  ⇒  x = (22 − 3y) / 5

8x + 2y = 13  ⇒  8(22 − 3y)/5 + 2y = 13  ⇒  y = (13 − 176/5) / (2 − 24/5) = 111/14


Rowwise Partitioning on Eight Processes



Rowwise Partitioning on Eight Processes



2D Mesh Partitioning on 64 Processes



Back Substitution to Find Solution
1. procedure BACK SUBSTITUTION (U, x, y)
2. begin
3. for k := n - 1 downto 0 do /* Main loop */
4. begin
5. x[k] := y[k];
6. for i := k - 1 downto 0 do
7. y[i] := y[i] - x[k] × U[i, k];
8. endfor;
9. end BACK SUBSTITUTION



Dense Linear Algebra (www.netlib.org)
• Basic Linear Algebra Subroutines (BLAS)
– Level 1 (vector-vector): vectorization
– Level 2 (matrix-vector): vectorization, parallelization
– Level 3 (matrix-matrix): parallelization
• LINPACK (Fortran)
– Linear equations and linear least-squares
• EISPACK (Fortran)
– Eigenvalues and eigenvectors for matrix classes
• LAPACK (Fortran, C) (LINPACK + EISPACK)
– Use BLAS internally
• ScaLAPACK (Fortran, C, MPI) (scalable LAPACK)
Numerical Libraries
• PETSc
– Data structures / routines for partial differential equations
– MPI based
• SuperLU
– Large sparse nonsymmetric linear systems
• Hypre
– Large sparse linear systems
• TAO
– Toolkit for Advanced Optimization
• DOE ACTS
– Advanced CompuTational Software
Sorting Algorithms
• Task of arranging unordered collection into order
• Permutation of a sequence of elements
• Internal versus external sorting
– External sorting uses auxiliary storage
• Comparison-based
– Compare pairs of elements and exchange
– O(n log n)
• Noncomparison-based
– Use known properties of elements
– O(n)



Sorting on Parallel Computers
• Where are the elements stored?
– Need to be distributed across processes
– Sorted order will be with respect to process order
• How are comparisons performed?
– One element per process
• compare-exchange
• interprocess communication will dominate execution time
– More than one element per process
• compare-split
• Sorting networks
– Based on comparison network model
• Contrast with shared memory sorting algorithms
Single- vs. Multi-Element Comparison
• One element per processor

• Multiple elements per processor



Sorting Networks
• Networks to sort n elements in less than O(n log n)
• Key component in network is a comparator
– Increasing or decreasing comparator

• Comparators connected in parallel and permute elements



Sorting Network Design
• Multiple comparator stages (# stages, # comparators)
• Connected together by interconnection network
• Output of last stage is the sorted list
• O(log² n) sorting time
• Can convert any sorting network to a sequential algorithm



Bitonic Sort
• Create a bitonic sequence then sort the sequence
• Bitonic sequence
– sequence of elements <a0, a1, …, an-1>
– <a0, a1, …, ai> is monotonically increasing
– <ai, ai+1, …, an-1> is monotonically decreasing
• Sorting using bitonic splits is called bitonic merge
• Bitonic merge network is a network of comparators
– Implement bitonic merge
• Bitonic sequence is formed from unordered sequence
– Bitonic sort creates a bitonic sequence
– Start with sequence of size two (default bitonic)



Bitonic Sort Network
Unordered sequence Bitonic sequence

decrease

increase



Bitonic Merge Network
Bitonic sequence Sorted sequence



Parallel Bitonic Sort on a Hypercube
1. procedure BITONIC SORT(label, d)
2. begin
3. for i := 0 to d - 1 do
4. for j := i downto 0 do
5. if (i + 1)st bit of label = j th bit of label then
6. comp exchange max(j);
7. else
8. comp exchange min(j);
9. end BITONIC SORT
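The slide gives the hypercube formulation; a sequential sketch of the same idea (build a bitonic sequence from two oppositely sorted halves, then bitonic-merge) looks like this, assuming a power-of-two input length:

```python
def _bitonic_merge(a, up):
    # Bitonic split: compare-exchange across the two halves, then recurse.
    if len(a) <= 1:
        return a
    k = len(a) // 2
    for i in range(k):
        if (a[i] > a[i + k]) == up:
            a[i], a[i + k] = a[i + k], a[i]
    return _bitonic_merge(a[:k], up) + _bitonic_merge(a[k:], up)

def bitonic_sort(a, up=True):
    """Sequential bitonic sort for power-of-two lengths: sort one half
    ascending and the other descending (a bitonic sequence by default),
    then bitonic-merge the whole."""
    if len(a) <= 1:
        return a
    k = len(a) // 2
    first = bitonic_sort(a[:k], True)     # ascending half
    second = bitonic_sort(a[k:], False)   # descending half -> bitonic
    return _bitonic_merge(first + second, up)

print(bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

On a hypercube, each compare-exchange in `_bitonic_merge` becomes a comp-exchange between processes whose labels differ in one bit, which is what the pseudocode's bit tests select.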
Parallel Bitonic Sort on a Hypercube

Last stage



Bubble Sort and Variants
• Can easily parallelize sorting algorithms of O(n²)
• Bubble sort compares and exchanges adjacent
elements
– O(n) each pass
– O(n) passes
– Available parallelism?
• Odd-even transposition sort
– Compares and exchanges odd and even pairs
– After n phases, elements are sorted
– Available parallelism?
Odd-Even Transposition Sort



Parallel Odd-Even Transposition Sort
1. procedure ODD-EVEN PAR(n)
2. begin
3. id := process’s label
4. for i := 1 to n do
5. begin
6. if i is odd then
7. if id is odd then
8. compare-exchange min(id + 1);
9. else
10. compare-exchange max(id - 1);
11. if i is even then
12. if id is even then
13. compare-exchange min(id + 1);
14. else
15. compare-exchange max(id - 1);
16. end for
17. end ODD-EVEN PAR
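A sequential simulation of the algorithm makes the phase structure concrete; every compare-exchange within one phase touches a disjoint pair, so all of them could run in parallel with one element per process:

```python
def odd_even_transposition_sort(a):
    """Odd-even transposition sort: alternate phases compare-exchange
    the odd-indexed and even-indexed adjacent pairs; n phases suffice."""
    a = list(a)
    n = len(a)
    for phase in range(n):
        start = 1 if phase % 2 == 0 else 0   # odd pairs, then even pairs
        for i in range(start, n - 1, 2):     # disjoint pairs: parallelizable
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

print(odd_even_transposition_sort([5, 2, 9, 1, 7, 3]))  # [1, 2, 3, 5, 7, 9]
```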
Quicksort
• Quicksort has average complexity of O(n log n)
• Divide-and-conquer algorithm
– Divide into subsequences where every element in
first is less than or equal to every element in the
second
– Pivot is used to split the sequence
– Conquer step recursively applies quicksort
algorithm
• Available parallelism?
Sequential Quicksort
1. procedure QUICKSORT (A, q, r )
2. begin
3. if q < r then
4. begin
5. x := A[q];
6. s := q;
7. for i := q + 1 to r do
8. if A[i] ≤ x then
9. begin
10. s := s + 1;
11. swap(A[s], A[i ]);
12. end if
13. swap(A[q], A[s]);
14. QUICKSORT (A, q, s);
15. QUICKSORT (A, s + 1, r );
16. end if
17. end QUICKSORT
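The partition scheme of the pseudocode carries over directly to Python; the sketch below uses the conventional `s - 1` bound on the first recursive call so the pivot, fixed in its final position by the last swap, is excluded from both halves:

```python
def quicksort(A, q=0, r=None):
    """In-place quicksort following the pseudocode: A[q] is the pivot,
    s tracks the boundary of the <= partition."""
    if r is None:
        r = len(A) - 1
    if q < r:
        x = A[q]
        s = q
        for i in range(q + 1, r + 1):
            if A[i] <= x:
                s += 1
                A[s], A[i] = A[i], A[s]
        A[q], A[s] = A[s], A[q]      # pivot moves to its final slot s
        quicksort(A, q, s - 1)       # conquer: both halves are independent,
        quicksort(A, s + 1, r)       # so they can be sorted in parallel
    return A

print(quicksort([3, 6, 1, 8, 2, 7]))  # [1, 2, 3, 6, 7, 8]
```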
Parallel Shared Address Space Quicksort



Parallel Shared Address Space Quicksort



Bucket Sort and Sample Sort
• Bucket sort is popular when elements (values) are
uniformly distributed over an interval
– Create m buckets and place elements in appropriate bucket
– O(n log(n/m))
– If m=n, can use value as index to achieve O(n) time
• Sample sort is used when the uniform-distribution assumption does not hold
– Draw a sample of size s
– Sort the sample and choose m − 1 elements to be splitters
– Split the data into m buckets using the splitters and proceed as in bucket sort, sorting each bucket with quicksort
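The sample-sort steps can be sketched as follows; the parameter names (`m`, `oversample`) are our own, and Python's built-in sort stands in for the per-bucket quicksort:

```python
import random
import bisect

def sample_sort(data, m=4, oversample=3):
    """Sample sort sketch: sample, pick m-1 splitters, bucket by
    splitter, then sort each bucket independently."""
    sample = sorted(random.sample(data, min(len(data), m * oversample)))
    step = max(1, len(sample) // m)
    splitters = sample[step::step][:m - 1]          # m - 1 splitters
    buckets = [[] for _ in range(len(splitters) + 1)]
    for x in data:
        # bisect finds which splitter interval x falls into.
        buckets[bisect.bisect_right(splitters, x)].append(x)
    out = []
    for b in buckets:          # buckets hold disjoint value ranges, so
        out.extend(sorted(b))  # sorted buckets concatenate into a sorted list
    return out

random.seed(1)
data = [random.randint(0, 999) for _ in range(50)]
print(sample_sort(data) == sorted(data))  # True
```

In the parallel version each process sorts one bucket; the sampling step is what keeps the buckets (and hence the per-process work) roughly balanced even for skewed inputs.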



Parallel Sample Sort



Graph Algorithms
• Graph theory important in computer science
• Many complex problems are graph problems

• G = (V, E)
– V is a finite set of points called vertices
– E is a finite set of edges
– Each e ∈ E is a pair (u, v), where u, v ∈ V
– Pairs may be unordered (undirected graph) or ordered (directed graph)



Graph Terminology
• Vertices u and v are adjacent if (u, v) is an edge
• A path from u to v is an edge sequence starting at u and ending at v
• If there exists a path, v is reachable from u
• A graph is connected if all pairs of vertices are connected by a path
• A weighted graph associates weights with each edge
• The adjacency matrix is an n x n array A such that
– Ai,j = 1 if (vi, vj) ∈ E; 0 otherwise
– Can be modified for weighted graphs (∞ means no edge)
– Can alternatively represent the graph as adjacency lists



Graph Representations
• Adjacency matrix

• Adjacency list



Minimum Spanning Tree
• A spanning tree of an undirected graph G is a subgraph of
G that is a tree containing all the vertices of G
• The minimum spanning tree (MST) for a weighted
undirected graph is a spanning tree with minimum weight
• Prim’s algorithm can be used
– Greedy algorithm
– Selects an arbitrary starting vertex
– Chooses new vertex guaranteed to be in MST
– O(n²)
– Prim’s algorithm is iterative



Prim’s Minimum Spanning Tree Algorithm
1. procedure PRIM MST(V, E, w, r)
2. begin
3. VT := {r};
4. d[r] := 0;
5. for all v ∈ (V - VT) do
6. if edge (r, v) exists set d[v] := w(r, v);
7. else set d[v] := ∞;
8. while VT ≠ V do
9. begin
10. find a vertex u such that d[u] := min{d[v] | v ∈ (V - VT)};
11. VT := VT ∪ {u};
12. for all v ∈ (V - VT) do
13. d[v] := min{d[v], w(u, v)};
14. endwhile
15. end PRIM MST
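A direct Python transcription of the pseudocode, on a small hypothetical 4-vertex graph (adjacency matrix with ∞ for missing edges):

```python
INF = float("inf")

def prim_mst_weight(w, r=0):
    """Prim's algorithm as in the pseudocode: w is an adjacency matrix
    with INF for missing edges; returns the total MST weight."""
    n = len(w)
    in_tree = [False] * n
    d = [INF] * n
    d[r] = 0
    total = 0
    for _ in range(n):
        # Find the cheapest vertex not yet in the tree (line 10).
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: d[v])
        in_tree[u] = True
        total += d[u]
        for v in range(n):               # update step (lines 12-13)
            if not in_tree[v]:
                d[v] = min(d[v], w[u][v])
    return total

# Hypothetical 4-vertex graph.
w = [[INF, 1, 4, INF],
     [1, INF, 2, 6],
     [4, 2, INF, 3],
     [INF, 6, 3, INF]]
print(prim_mst_weight(w))  # 6  (edges 0-1, 1-2, 2-3)
```

The inner `min` over `d` is exactly the step the next slide parallelizes: each process scans its own vertex subset, and an all-to-one reduction finds the global minimum.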



Example: Prim’s MST Algorithm



Example: Prim’s MST Algorithm



Parallel Formulation of Prim’s Algorithm
• Difficult to perform different iterations of the while loop in parallel because d[v] may change each time
• Can parallelize each iteration though
• Partition vertices into p subsets Vi, i = 0, …, p-1
• Each process Pi computes di[u] = min{di[v] | v ∈ (V - VT) ∩ Vi}
• Global minimum is obtained using all-to-one reduction
• New vertex is added to VT and broadcast to all processes
• New values of d[v] are computed for local vertices
• O(n²/p) + O(n log p) (computation + communication)



Partitioning in Prim’s Algorithm



Single-Source Shortest Paths
• Find shortest path from a vertex v to all other vertices
• In a weighted graph, the shortest path is the path whose edge weights sum to the minimum
• Weights may represent time, cost, loss, or any other
quantity that accumulates additively along a path
• Dijkstra’s algorithm finds shortest paths from vertex s
– Similar to Prim’s MST algorithm
• MST with vertex v as starting vertex
– Incrementally finds shortest paths in greedy manner
– Keep track of minimum cost to reach a vertex from s
– O(n²)



Dijkstra’s Single-Source Shortest Path

1. procedure DIJKSTRA SINGLE SOURCE SP(V, E, w, s)
2. begin
3. VT := {s};
4. for all v ∈ (V - VT) do
5. if (s, v) exists set l[v] := w(s, v);
6. else set l[v] := ∞;
7. while VT ≠ V do
8. begin
9. find a vertex u such that l[u] := min{l[v] | v ∈ (V - VT)};
10. VT := VT ∪ {u};
11. for all v ∈ (V - VT) do
12. l[v] := min{l[v], l[u] + w(u, v)};
13. endwhile
14. end DIJKSTRA SINGLE SOURCE SP
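As with Prim's algorithm, the pseudocode maps directly onto an adjacency-matrix implementation; only the update rule (line 12 adds l[u] instead of taking the raw edge weight) differs. A sketch on a hypothetical 4-vertex graph:

```python
INF = float("inf")

def dijkstra(w, s=0):
    """Dijkstra's single-source shortest paths, following the pseudocode:
    w is an adjacency matrix with INF for missing edges."""
    n = len(w)
    done = [False] * n
    l = [INF] * n
    l[s] = 0
    for _ in range(n):
        # Closest unfinished vertex (line 9).
        u = min((v for v in range(n) if not done[v]), key=lambda v: l[v])
        done[u] = True
        for v in range(n):               # relax edges out of u (line 12)
            if not done[v]:
                l[v] = min(l[v], l[u] + w[u][v])
    return l

w = [[INF, 7, 3, INF],
     [7, INF, 1, 2],
     [3, 1, INF, 8],
     [INF, 2, 8, INF]]
print(dijkstra(w))  # [0, 4, 3, 6]
```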



Parallel Formulation of Dijkstra’s Algorithm

• Very similar to Prim’s MST parallel formulation


• Use 1D block mapping as before
• All processes perform computation and
communication similar to that performed in Prim’s
algorithm
• Parallel performance is the same
– O(n²/p) + O(n log p)
– Scalability
• O(n²) is the sequential time
• Speedup = O(n²) / [O(n²/p) + O(n log p)]



All Pairs Shortest Path
• Find the shortest path between all pairs of vertices
• Outcome is an n x n matrix D = {di,j} such that di,j is the cost of the shortest path from vertex vi to vertex vj
• Dijkstra’s algorithm
– Execute the single-source algorithm from each vertex
– O(n³)
– Source-partitioned formulation (use sequential algorithm)
– Source-parallel formulation (use parallel algorithm)
• Floyd’s algorithm
– Builds up distance matrix from the bottom up



Floyd’s All-Pairs Shortest Paths Algorithm

1. procedure FLOYD ALL PAIRS SP(A)
2. begin
3. D(0) := A;
4. for k := 1 to n do
5. for i := 1 to n do
6. for j := 1 to n do
7. d(k)i,j := min{ d(k-1)i,j , d(k-1)i,k + d(k-1)k,j };
8. end FLOYD ALL PAIRS SP
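In code, a single matrix can replace the D(k) sequence, because iteration k never changes row k or column k; a sketch on a hypothetical 4-vertex graph:

```python
INF = float("inf")

def floyd_all_pairs(A):
    """Floyd's all-pairs shortest paths: A is an adjacency matrix with 0
    on the diagonal and INF where there is no edge."""
    n = len(A)
    D = [row[:] for row in A]     # work on a copy of A
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Allow paths through intermediate vertex k.
                D[i][j] = min(D[i][j], D[i][k] + D[k][j])
    return D

A = [[0, 5, INF, 2],
     [5, 0, 1, INF],
     [INF, 1, 0, 4],
     [2, INF, 4, 0]]
D = floyd_all_pairs(A)
print(D[0][2])  # 6: both 0 -> 3 -> 2 and 0 -> 1 -> 2 cost 6
```

The parallel version on the next slide assigns each (i, j) entry of the inner double loop to a process, with row k and column k broadcast at each step k.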
Parallel Floyd’s Algorithm
1. procedure FLOYD ALL PAIRS PARALLEL(A)
2. begin
3. D(0) := A;
4. for k := 1 to n do
5. forall Pi,j, where i, j ≤ n, do in parallel
6. d(k)i,j := min{ d(k-1)i,j , d(k-1)i,k + d(k-1)k,j };
7. end FLOYD ALL PAIRS PARALLEL
Parallel Graph Algorithm Library – Boost

• Parallel Boost Graph Library


– Andrew Lumsdaine, Indiana University
– Generic C++ library for high-performance parallel
and distributed graph computation
– Builds on the Boost Graph Library (BGL)
• offers similar data structures, algorithms, and syntax
– Targets very large graphs (millions of nodes)
– Distributed-memory parallel processing on
clusters



Original BGL: Algorithms
• Searches (breadth-first, depth-first, A*)
• Single-source shortest paths (Dijkstra, Bellman-Ford, DAG)
• All-pairs shortest paths (Johnson, Floyd-Warshall)
• Minimum spanning tree (Kruskal, Prim)
• Components (connected, strongly connected, biconnected)
• Maximum cardinality matching
• Max-flow (Edmonds-Karp, push-relabel)
• Sparse matrix ordering (Cuthill-McKee, King, Sloan, minimum degree)
• Layout (Kamada-Kawai, Fruchterman-Reingold, Gursoy-Atun)
• Betweenness centrality
• PageRank
• Isomorphism
• Vertex coloring
• Transitive closure
• Dominator tree



Original BGL Summary
• Original BGL is large, stable, efficient
– Lots of algorithms, graph types
– Peer-reviewed code with many users, nightly
regression testing, and so on
– Performance comparable to FORTRAN.
• Who should use the BGL?
– Programmers comfortable with C++
– Users with graph sizes from tens of vertices to
millions of vertices
Parallel BGL
• A version of C++ BGL for computational clusters
– Distributed memory for huge graphs
– Parallel processing for improved performance
• An active research project
• Closely related to the original BGL
– Parallelizing BGL programs should be “easy”
A simple, directed graph… distributed across 3 processors



Parallel Graph Algorithms
• Breadth-first search
• Eager Dijkstra’s single-source shortest paths
• Crauser et al. single-source shortest paths
• Depth-first search
• Minimum spanning tree (Boruvka, Dehne & Götz)
• Connected components
• Strongly connected components
• Biconnected components
• PageRank
• Graph coloring
• Fruchterman-Reingold layout
• Max-flow (Dinic’s)



Big-Data and Map-Reduce
• Big data deals with processing large data sets
• The nature of the data processing problem makes it amenable to parallelism
– Looking for features in the data
– Extracting certain characteristics
– Analyzing properties with complex data mining algorithms
• The data size lends itself to partitioning into a large number of subsets that can be processed in parallel
• We need new algorithms, data structures, and programming models to deal with these problems
A Simple Big-Data Problem
• Consider a large data collection of text documents
• Suppose we want to find how often a particular word occurs and determine a probability distribution for all word occurrences
• Sequential algorithm: get the next document, find and count its words, update the statistics; when no documents remain, generate the probability distributions
(Figure: sequential word-count flow producing a frequency table, e.g. web 2, weed 1, green 2, sun 1, moon 1, land 1, part 1)
Parallelization Approach
• Map: partition the data collection into subsets of documents and process each subset in parallel
• Reduce: assemble the partial frequency tables to derive the final probability distribution
(Figure: each parallel worker counts words in its own subset; the partial frequency tables are then merged)




Actually, It Is Not Easy to Parallelize…
• Different programming models: message passing, shared memory
• Fundamental issues: scheduling, data distribution, synchronization, inter-process communication, robustness, fault tolerance, …
• Architectural issues: Flynn’s taxonomy (SIMD, MIMD, etc.), network topology, bisection bandwidth, cache coherence, …
• Different programming constructs: mutexes, condition variables, barriers, …; masters/slaves, producers/consumers, work queues, …
• Common problems: livelock, deadlock, data starvation, priority inversion, …; dining philosophers, sleeping barbers, cigarette smokers, …

Actually, a programmer’s nightmare…



Map-Reduce Parallel Programming
• Has become an important distributed parallel programming paradigm for large-scale applications
– Also applies to shared-memory parallelism
– It is one of the core technologies powering big IT companies, like Google, IBM, Yahoo, and Facebook
• The framework runs on a cluster of machines, automatically partitions jobs into a number of small tasks, and processes them in parallel
• Can be captured by combining the Map and Reduce parallel patterns
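The word-count example that follows can be sketched in a few lines; the map phase emits per-word counts for each split (which could run in parallel), and the reduce phase merges the partial tables key by key. The split contents here are illustrative:

```python
from collections import Counter
from functools import reduce

def map_phase(document):
    # MAP: emit a <word, 1> pair per word occurrence; Counter aggregates
    # them into a partial frequency table for this split.
    return Counter(document.split())

def reduce_phase(partials):
    # REDUCE: merge the partial frequency tables key by key.
    return reduce(lambda a, b: a + b, partials, Counter())

splits = ["web weed green", "sun moon land part", "web green"]
partials = [map_phase(s) for s in splits]   # independent: could run in parallel
totals = reduce_phase(partials)
print(totals["web"], totals["green"])  # 2 2
```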
Map-Reduce Example
• MAP: input data → <key, value> pairs
• The data collection is split (split 1, split 2, …, split n) and supplied to multiple processors
• Each Map task emits one <word, 1> pair per word occurrence, e.g. <web, 1>, <weed, 1>, <green, 1>, <sun, 1>, <moon, 1>, <land, 1>, <part, 1>, …
MapReduce
• MAP: input data → <key, value> pairs
• REDUCE: <key, value> pairs → <result>
• The data collection is split and supplied to multiple processors; each split is processed by a Map task, and Reduce tasks combine the Map outputs into the final results

