Pda 1
Algorithms
Course for 3rd-year Undergraduate Students
(Major in Computer Science)
Instructor:
Mihai L. Mocanu, Ph.D., Professor
E-mail: mihai.mocanu@edu.ucv.ro
[Diagram: Numerical and Symbolic Computation, Programming Languages, Human-Computer Interaction – ACM & IEEE curriculum recommendations]
Course objectives
To provide you with an introduction to and overview of the
computational aspects of parallel and distributed computing
To introduce several important parallel computing models
To study the typical models for distributed computing
To make basic concepts of P & D computing both accessible and
ready for immediate use
Broadly understand P&D architectures
Become familiar with typical software/ programming approaches
Be able to quickly adapt to any programming environment
Study the main classes of P&D algorithms
Learn the jargon … understand what people are talking about
Be able to apply this knowledge
Course objectives (cont.)
More specifically, with the aim of drastically flattening the learning
curve in a parallel and/or distributed environment:
Study the development methods of efficient P&D algorithms using:
Data partitioning and functional partitioning
Parallelism exploitation for different process topologies
Efficient overcoming of communication latencies,
Load balancing
Termination detection
Failure tolerance etc.
Learn the conceptual models for the specification and verification
of P&D algorithms
Understand the complexity models of P&D algorithms
Learning Outcomes
On successfully completing this course you should understand a
number of different models of P&D computing and the basic
techniques for designing algorithms in these models.
Development techniques for parallel algorithms - examples from
the main classes: prefix algorithms, search, sort, selection, matrix
processing, graph algorithms, lists, trees
Main development techniques of classes of distributed algorithms:
wave algorithms, mutual exclusion, topology establishment,
termination, fault tolerance, leader election, genetic algorithms
Textbooks and other references
Vipin Kumar, Ananth Grama, Anshul Gupta, George Karypis – Introduction to Parallel
Computing, Pearson Education, 2003 (2nd edition), or Benjamin/Cummings, 1994 (1st edition)
http://srmcse.weebly.com/uploads/8/9/0/9/8909020/introduction_to_parallel_computing_second_edition-ananth_grama..pdf , or https://docs.google.com/...
Behrooz Parhami - Introduction to Parallel Processing: Algorithms and Architectures,
Kluwer Academic Publ, 2002
Dan Grigoras – Parallel Computing. From Systems to Applications, Computer Libris
Agora, 2000, ISBN 973-97534-6-9
Mihai Mocanu – Algorithms and Languages for Parallel Processing, Publ. University of
Craiova, updated yearly for 20+ years on my home page
George Coulouris, Jean Dollimore, Tim Kindberg and Gordon Blair – Distributed Systems:
Concepts and Design (5th ed.), Addison Wesley, 2011
https://repository.dinus.ac.id/docs/ajar/George-Coulouris-Distributed-Systems-Concepts-and-Design-5th-Edition.pdf
Why?
It is better to be aware of the physical and economic constraints and tradeoffs
of the parallel system you are designing for than to be sorry later.
Topics (continued)
Why?
These are fundamental issues that appear/apply to every parallel program.
You really should learn this stuff by heart.
Topics (cont.)
Why?
These are fundamental primitives you will use often, and you should know
them well: not only what they do, but also how much they cost and when and
how to use them.
Going through the details of implementation lets us see how the principles
from the previous topic are applied to relatively simple problems.
Topics (cont.)
Why?
Parallel programming is done to increase performance.
Debugging and profiling are extremely difficult in a parallel setting, so it is better
to understand from the beginning what performance to expect from a given
parallel program and, more generally, how to design parallel programs with
low execution time. It is also important to know the limits of what can and
cannot be done.
Topics (cont.)
6. Parallel Dense Matrix Algorithms
matrix-vector multiplication
matrix-matrix multiplication
solving systems of linear equations
7. Parallel Graph Algorithms
minimum spanning tree
single-source shortest paths
all-pairs shortest paths
connected components
algorithms for sparse graphs
Why?
Classical problems with lots of applications, many interesting and useful
techniques exposed.
Topics (cont.)
8. Parallel Sorting
odd-even transpositions sort
sorting networks, bitonic sort
parallel quicksort, bucket and sample sort
9. Search Algorithms for Discrete Optimization Problems
search overhead factor, speedup anomalies
parallel depth-first search
sorting and searching on PRAMs
10. Geometric Algorithms
convex hulls, closest pair of points
Why?
As before; in addition, this topic shows many examples of hard-to-parallelize problems.
Topics (cont.)
11. Concepts of distributed computation
termination detection
failure tolerance
distributed network topologies
12. Formal Models
state transitions and executions
asynchronous and synchronous models
causality constraints and the computation theorem
logical clocks (Lamport) and vector clocks (Mattern-Fidge)
Why?
It is important and useful to understand the social importance of distributed
systems for the Internet, the WWW and small devices (mobiles, sensors), as well
as their technical importance: improving scalability and reliability through inherent distribution
Topics (cont.)
13. Distributed abstractions
event-based component model
specification of services
safety and liveness
node and channel failure models and links
timing assumptions
14. Failure detectors
classes of detectors
leader elections and reductions
quorums; Byzantine quorums and leader election
Why?
Making use of abstractions, like timing assumptions and ways to encapsulate
them (failure detectors), proves to be very useful.
Topics (cont.)
15. Broadcast abstractions
(lazy/ eager/ uniform/ fail-silent) reliable broadcast
causal broadcast, vector-clock algorithms
performance of algorithms
16. Consistency
replicated shared memory
linearizable registers
linearizable single-writer multiple-reader registers
multiple-writer registers
17. Consensus
control-oriented vs. event-based, uniform consensus
Paxos and multi-Paxos consensus
Topics (cont.)
18. Reconfigurable Replicated State Machines
total order broadcasting
robust distributed networks
19. Random Walks
introduction to Markov processes
random walks (hitting time, cover time, (s,t)-connectivity)
20. Time and Clocks in Distributed Systems
time lease
clock drift and the complete algorithm
…
Grading (tentative)
20% practical laboratory assignments/ projects (L)
20% periodic evaluation through homeworks/exercises (H)
20% final test quiz (T)
40% final exam (E)
You have to get at least 50% on the final laboratory evaluation (L) in
order to be allowed to sit the final exam in the next session.
You have to get at least 50% on the final exam (E) to pass and to
obtain a mark greater than 5. All the grades obtained contribute, with the
specified weights, to the final mark.
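A purely illustrative computation (the marks are hypothetical): with L = 7, H = 8, T = 6 and E = 9, the final mark would be 0.2·7 + 0.2·8 + 0.2·6 + 0.4·9 = 7.8, assuming the 50% thresholds on L and E have been met.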
If you have problems setting up your working environment and/or
running your programs, ask the TA for help and advice. That is what they are
there for. Make use of their help, but do not overload them with routine programming bugs
Assignments and evaluations
A few core problems in Distributed Computing
Background
Parallel Computer: a computer with many processors that
are closely connected
frequently, all processors share the same memory
they also communicate by accessing this shared memory
Examples of parallel computers include the multicore
processors found in many computers today (even cheap
ones), as well as many graphics processing units (GPUs)
Parallel Computing: “using more than one computer, or a
computer with more than one processor, to solve a task ”
The parallel programming paradigm is not new: it has been
around for more than 50 years! Motives: faster computation,
larger amount of memory available, etc.
“... There is therefore nothing new in the idea of parallel
programming, but its application to computers. The author
cannot believe that there will be any insuperable difficulty in
extending it to computers. It is not to be expected that the
necessary programming techniques will be worked out
overnight. Much experimenting remains to be done. After all,
the techniques that are commonly used in programming today
were only won at the cost of considerable toil several years
ago. In fact the advent of parallel programming may do
something to revive the pioneering spirit in programming
which seems at the present to be degenerating into a rather
dull and routine occupation ...”
Gill, S. (1958), “Parallel Programming,” The Computer Journal, vol. 1, April 1958, pp. 2-10.
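As a minimal, purely illustrative sketch of "using more than one processor to solve a task" (Python is used only for illustration; the names partial_sum, n_workers etc. are ours, not from the course materials), the snippet below splits a large summation across several worker processes, one per core:

```python
# Minimal data-parallel sketch: sum a large array by splitting the work
# across worker processes running on different cores.
from multiprocessing import Pool

def partial_sum(chunk):
    # each worker computes the sum of its own slice of the data
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    n_workers = 4                                   # hypothetical core count
    size = len(data) // n_workers
    chunks = [data[i * size:(i + 1) * size] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        total = sum(pool.map(partial_sum, chunks))  # combine partial results
    print(total)                                    # same value as sum(data)
```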
Background
Distributed Computer: one in which the processors are less
strongly connected –“a set of nodes connected by a network,
which appear to its users as a single coherent system”
A typical distributed system consists of many independent
computers, attached via network connections
Such an arrangement is called a cluster
In a distributed system, each processor has its own memory.
This precludes using shared memory for communicating
Processors instead communicate by sending messages
Although message passing is slower than shared memory, it
scales better for many processors, and it is cheaper
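A minimal sketch of the message-passing style just described (illustrative only; the two processes below run on a single machine, whereas a real distributed system would exchange messages over a network):

```python
# Two processes that share no memory and communicate only by messages.
from multiprocessing import Process, Pipe

def worker(conn):
    task = conn.recv()        # receive a message from the other process
    conn.send(task * task)    # reply with the computed result
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = Pipe()
    p = Process(target=worker, args=(child_end,))
    p.start()
    parent_end.send(12)       # send a task; no shared variables are involved
    print(parent_end.recv())  # prints 144
    p.join()
```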
PARALLEL vs. Distributed Computing
Maximum Speedup
The maximum speedup is usually p with p processors (linear speedup)
The speedup factor is given by:
S(p) = t_s / (f·t_s + (1 − f)·t_s/p) = p / (1 + (p − 1)·f)
where t_s is the sequential execution time and f is the fraction of the computation that cannot be parallelized
This equation is known as Amdahl’s law
Remark: It is possible, but unusual, to get superlinear speedup
(greater than p); when it happens it is due to a specific reason such as:
Extra memory in multiprocessor system
Nondeterministic algorithm
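A small numerical sketch of Amdahl's law (the serial fraction f = 0.05 is chosen purely for illustration): no matter how many processors are added, the speedup can never exceed 1/f.

```python
# Amdahl's law: S(p) = p / (1 + (p - 1) * f), where f is the serial fraction.
def speedup(p, f):
    return p / (1 + (p - 1) * f)

f = 0.05                        # illustrative serial fraction (5%)
for p in (1, 4, 16, 64, 256, 1024):
    print(p, round(speedup(p, f), 2))
# The values climb towards 1 / f = 20 (e.g. S(1024) ≈ 19.6), so adding
# processors beyond a point yields diminishing returns.
```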
Maximum Speedup
Amdahl's law
[Figure: (a) one processor – serial section f·t_s followed by the parallelizable section (1 − f)·t_s, total t_s; (b) p processors – the parallelizable section takes (1 − f)·t_s/p, so t_p = f·t_s + (1 − f)·t_s/p]
Speedup against number of processors
[Plot: speedup S(p) against the number of processors p, for several values of the serial fraction f]
Superlinear speedup example, searching sub-spaces:
[Figure: (a) searching the sub-spaces sequentially – starting at time 0, each of the p sub-spaces takes t_s/p; the solution is found after x·t_s/p plus a short time Δt, with x indeterminate; (b) searching each sub-space in parallel – the solution is found after Δt]
Speedup is given by:
S(p) = (x·t_s/p + Δt) / Δt
Worst case for sequential search: the solution is found in the last
sub-space searched. Then the parallel version offers the greatest
benefit, i.e.
S(p) = ((p − 1)·t_s/p + Δt) / Δt, which grows without bound as Δt tends to zero
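A small numerical sketch of the search example above (the values of p, t_s and Δt are purely illustrative): when the sequential search only finds the solution in the last sub-space, the speedup far exceeds p.

```python
# Superlinear speedup in sub-space search (worst case for sequential search).
def search_speedup(p, ts, dt):
    # sequential: scans p - 1 empty sub-spaces, then finds the solution after dt
    sequential_time = (p - 1) * (ts / p) + dt
    parallel_time = dt                # all sub-spaces are searched at once
    return sequential_time / parallel_time

p, ts, dt = 8, 100.0, 0.5             # illustrative values
print(search_speedup(p, ts, dt))      # 176.0, far greater than p = 8
```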
Measures of Performance
OR
A moderate to very large number of simpler processors
So:
When combined with a high-bandwidth, but logically simple,
Data analysis
Examples:
Knowing the sequence of nucleotides in DNA is, by
Execution Speed
With a >10^2 times increase of (floating point) execution speed every
computing platforms
1985 – 1990: in spite of an average 20x increase in processor
performance, communication speed remained constant
Parallel Computing – How execution time decreased
[Plot: time (s) per floating-point instruction, by year]
Motto: “I think there is a world market for maybe five
computers” (Thomas Watson, IBM Chairman, 1943)
Towards Parallel Computing – The 5 ERAs
More on the historical perspective
Parallel computing has been here since the early days of computing.
Traditionally: custom HW, custom SW, high prices
The “doom” of Moore’s law:
- custom HW has a hard time catching up with commodity processors
Current trend: use commodity HW components, standardize SW
Parallelism sneaking into commodity computers:
• Instruction Level Parallelism – wide issue, pipelining, out-of-order execution (OOO)
• Data Level Parallelism – 3DNow!, AltiVec
• Thread Level Parallelism – Hyper-Threading in the Pentium 4
Transistor budgets allow for multiple processor cores on a chip.
More on historical perspective (cont.)
Most applications would benefit from being parallelized and
executed on a parallel computer.
• even PC applications, especially the most demanding ones – games,
multimedia
Chicken & Egg Problem:
1. Why build parallel computers when the applications are sequential?
2. Why parallelize applications when there are no parallel commodity
computers?
Answers:
1. What else to do with all those transistors?
2. Applications already are a bit parallel (wide issue, multimedia
instructions, hyper-threading), and this bit is growing.
Core Problems in Distributed Computing
• What types of problems are there?
• An example: the Two Generals’ Problem
Two generals need to coordinate an attack
1. Must agree on time to attack
2. They’ll win only if they attack simultaneously
3. Communicate through messengers
4. Messengers may be killed on their way
Let's try to solve it for generals g1 and g2
step 1: g1 sends the time of attack to g2
Problem: how to ensure g2 received the msg?
step 2 (Solution): let g2 ack receipt of the msg
Problem: how to ensure g1 received the ack?
step 3 (Solution): let g1 ack the receipt of the ack…
…
This problem seems (is!) really impossible to solve!
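A small simulation sketch of why this is so (not from the course materials; the loss probability and the number of rounds are arbitrary): every extra acknowledgement just adds one more message that may be lost, so the chance that the whole chain survives shrinks, and the sender of the last message can never know whether it arrived.

```python
# Two Generals: a naive "ack the ack" protocol over a lossy channel.
import random

def delivered(loss_probability=0.3):
    """A message gets through with probability 1 - loss_probability."""
    return random.random() >= loss_probability

def naive_protocol(rounds):
    # g1 and g2 alternately send message / ack / ack-of-ack ...; the run only
    # "succeeds" if every message in the chain happens to get through.
    return all(delivered() for _ in range(rounds))

random.seed(1)
trials = 10_000
for rounds in (1, 3, 5, 10):
    ok = sum(naive_protocol(rounds) for _ in range(trials))
    print(rounds, ok / trials)   # success rate shrinks as the chain grows,
                                 # and no finite chain removes the uncertainty
```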
Consensus
Two nodes need to agree on a value
Communicate by messages using an unreliable channel
Agreement is a core problem… agreeing on a number = consensus
The Consensus problem
All nodes propose a value
Some nodes might crash & stop responding
The algorithm must ensure:
All correct nodes eventually decide
Every node decides the same value
Nodes only decide on proposed values
Broadcast
Atomic Broadcast
A node broadcasts a message
If sender correct, all correct nodes deliver msg
All correct nodes deliver same messages
Messages delivered in the same order
Given Atomic broadcast
Can use it to solve Consensus
Every node broadcasts its proposal
Decide on the first delivered proposal
Messages are delivered in the same order, so
all nodes will decide the same value (see the sketch below)
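A minimal sketch of this reduction (the fixed delivery order below is only a stand-in for whatever total order a real atomic broadcast would provide; node names and values are hypothetical):

```python
# Consensus from atomic broadcast: every node broadcasts its proposal, the
# broadcast delivers all proposals in one total order, and every node decides
# on the first proposal delivered.
def atomic_broadcast(proposals):
    # Stand-in for a real atomic broadcast: the total order used here is
    # simply the order of node ids; any order shared by all nodes would do.
    return [proposals[node] for node in sorted(proposals)]

def consensus(proposals):
    delivery_order = atomic_broadcast(proposals)
    return {node: delivery_order[0] for node in proposals}

proposals = {"n1": 7, "n2": 3, "n3": 7}   # hypothetical proposed values
print(consensus(proposals))               # every node decides 7 (n1's proposal)
```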
Given Consensus
Can use it to solve Atomic broadcast
Hence, Atomic Broadcast is equivalent to Consensus