Lecture 08

The lecture on 'Introduction to Cloud Computing' covers the principles of parallel computing, including design considerations, limits, performance analysis, and examples of problems solvable by parallel computing. Key topics include automatic vs. manual parallelization, communication between tasks, synchronization methods, data dependencies, load balancing, and granularity of tasks. The lecture emphasizes the importance of understanding these concepts to optimize parallel computing applications.


Carnegie Mellon

Introduction to Cloud Computing


Parallel Processing II
15‐319, Spring 2010
8th Lecture, Feb 4th

Majd F. Sakr

15-319 Introduction to Cloud Computing Spring 2010 ©



Lecture Motivation
 Limits and performance considerations
 Which parallel architecture/model is best for your
application?


Lecture Outline
 Parallel Computing Design Considerations
 Limits and Costs of Parallel Computing
 Parallel Computing Performance Analysis
 Examples of Problems Solved By Parallel Computing


How to Parallelize
 Automatic vs. Manual Parallelization

 Design Considerations
1. Can the Problem be parallelized?
2. Program’s hotspots & bottlenecks?
3. Partitioning
4. Communications
5. Synchronization
6. Data Dependencies
7. Load Balancing
8. Granularity
9. Input/Output


Automatic vs. Manual Parallelism


 Mostly, developing parallel programs has been manual. This is a
complex, time‐consuming, and error‐prone process.

 A parallelizing compiler or pre‐processor can be used to parallelize
serial code. The compiler usually works in one of two ways:
 Fully Automatic:
The compiler analyzes the source code and identifies parts that can be
parallelized.

 Programmer Directed:
The programmer uses compiler flags to explicitly tell the compiler how to
parallelize the code.


(1) Can the Problem be Parallelized?

 Parallelism Inhibitors:
 Control vs. data dependencies
 Examples:
 Parallelizable Problem: Multiply each element
of the array by 2
 Non‐parallelizable Problem: Fibonacci sequence
 Handling Data Dependencies

 Parallelism Slow‐down:
 Communications bottleneck
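To make the contrast concrete, here is a minimal Python sketch (illustrative, not from the slides): doubling array elements maps onto independent tasks, while each Fibonacci step depends on the previous results.

# Illustrative sketch: data-parallel work vs. an inherently sequential recurrence.
from multiprocessing import Pool

def double(x):
    return 2 * x                 # each element is independent of the others

def fibonacci(n):
    a, b = 0, 1
    for _ in range(n):           # every step depends on the two previous values
        a, b = b, a + b
    return a

if __name__ == "__main__":
    with Pool(4) as pool:
        print(pool.map(double, range(16)))   # parallelizable: no data dependencies
    print(fibonacci(10))                     # not parallelizable this way: 55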


(2) Hotspots & Bottlenecks


 Hotspots:
 What are they? They account for most of the CPU usage.
 How to find them in the program? Profiling &
performance analysis.
 Parallelism focus should be on these spots.

 Bottlenecks:
 What are they? Slow areas of the program.
 Can we redesign the algorithm to reduce
/eliminate bottlenecks?


(3) Partitioning/Decomposition
 Dividing the problem into chunks/parts of work that can
be distributed to multiple tasks.

 The best partitioning happens where there is minimum I/O and
communication.

 Ways to Partition?
 Domain Decomposition
 Functional Decomposition


(3) Partitioning/Decomposition
 Functional Decomposition
 The focus is on the computation to be
performed, not on the data given to the problem.
 The problem is partitioned based on the work that must
be done. Each task computes a part of the
overall work.

[Figure: the problem's instruction set (loops, matrix multiplications, functions F1–F5) is split by function across Tasks 1–5.]


(3) Partitioning/Decomposition
 Domain Decomposition
 The data given to the problem is divided into chunks.
 Each chunk is given to a task that performs the operation on it.
Tasks run in parallel.
[Figure: an array A of 50 elements; the problem data set is split into five chunks of ten elements (A0–A9, A10–A19, A20–A29, A30–A39, A40–A49), one chunk per task (Tasks 1–5).]
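A minimal Python sketch of the same idea (the chunk size and the per-element operation are illustrative assumptions):

# Split the data set into chunks and hand one chunk to each task.
from multiprocessing import Pool

def process_chunk(chunk):
    return [x * x for x in chunk]      # stand-in for the real per-chunk operation

if __name__ == "__main__":
    A = list(range(50))                                # array of 50 elements
    chunks = [A[i:i + 10] for i in range(0, 50, 10)]   # five chunks of ten, one per task
    with Pool(5) as pool:
        results = pool.map(process_chunk, chunks)      # the five tasks run in parallel
    print([r[:3] for r in results])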


(4) Communication
 Do we need inter‐task communication?

 Inter‐task Communication Considerations:


1. Costs
2. Visibility
3. Synchronous vs. Asynchronous
4. Scope
5. Efficiency


(4) Communication
 When is task communication needed?

 Embarrassingly parallel problems


The problem is simple enough that it can be partitioned into tasks with no need
for the tasks to share any data.
 Loosely coupled

 Complex problems
The problem is complex; it can be partitioned into tasks, but the tasks need
to share data with each other to complete the computations.
 Tightly coupled


(4) Communication: Considerations


 Inter‐task Communication Considerations:

1. Costs:

 Time Cost:
– Time waiting to synchronize
– Bandwidth saturation
– Bandwidth vs. latency

 Resource Costs:
– Machine cycles and resources

 Both (Time & Resources Cost):


– Overhead & Complexity


(4) Communication: Considerations


1) Costs
 Time Cost
 Time waiting to synchronize
Communication requires synchronization between tasks. Tasks
spend some of their time waiting for others instead of working.

 Bandwidth saturation
Competing communication traffic can saturate the network bandwidth.

 Bandwidth vs. Latency
 Latency is the time it takes to send a minimum-size message.
 Bandwidth is the amount of data that can be transferred per unit of time.
 Small messages suffer mostly from latency; it is better to package multiple
small messages into one larger message.


(4) Communication: Considerations


1) Costs
 Resource Cost
 Machine cycles and resources
Machine cycles and resources are used to package and
transmit data; these could otherwise be used for
computation.

 Time and Resources Cost


 Overhead & Complexity
Inter‐task communication involves overhead in terms of time spent and
resources consumed.


(4) Communication: Considerations

2) Visibility
 Task communication is more visible in
some models (ex: Message Passing
Model), and is under the programmer’s
control.

 In other models (ex: Data Parallel Model),
communication is often done
transparently, outside the programmer’s
control.


(4) Communication: Considerations


3) Synchronous vs. Asynchronous Communications

Synchronous communications:
 Require some type of "handshaking" between tasks that are sharing data.
This could be done implicitly or explicitly.
 Blocking communications: some work must be held until the
communications are done.

Asynchronous communications:
 Allow tasks to communicate data independently from the work they are doing.
 Non‐blocking communications.
 Advantage: interleaving computation with communication.
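The blocking vs. non-blocking distinction can be sketched in Python with a queue (the message contents and timing are illustrative assumptions):

# Blocking (synchronous-style) receive vs. non-blocking (asynchronous-style) receive.
import queue
import threading
import time

q = queue.Queue()

def producer():
    time.sleep(0.5)              # the sender is still busy computing
    q.put("border values")

threading.Thread(target=producer).start()

try:
    msg = q.get_nowait()         # non-blocking: returns immediately or raises Empty
except queue.Empty:
    print("nothing yet -- keep computing while the data is in flight")

msg = q.get()                    # blocking: wait here until the data arrives
print("received:", msg)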


(4) Communication: Considerations


4) Scope
 Communication Scope:
Which tasks need to communicate
with each other.
 Point‐to‐point : two tasks are
communicating, one acts as the
sender/producer of data, the other acts as
the receiver/consumer.

 Collective: Data is communicated between


more than two tasks. They are members in
a common group, or collective.


(4) Communication: Considerations

5) Efficiency
 Efficiency varies based on the application’s requirements and
the programmer’s decisions. The programmer must decide:
 Which factors should have more impact on task communication.
 Which implementation to use for the chosen model.
 What type of communication needs to be used.
 Network(s) type and number.


(5) Synchronization (1/4)


 Process Synchronization:
Reaching an agreement between simultaneous
processes/tasks regarding a sequence of in‐order steps to
complete an action
 Barriers
 Locks/ Semaphores
 Synchronous Communication Operations


(5) Synchronization (2/4)


 Barriers
 A point at which a task must stop and cannot proceed until all tasks are synchronized.
 Usually, all tasks are involved.
 Each task keeps performing its work until it reaches the barrier point. Then it stops and
waits for the last task to reach the barrier.
 When the last task reaches the barrier, all tasks are synchronized.
 From this point, tasks continue their work.

[Figure: Tasks A–D reach the barrier at different times; once the last task arrives, all tasks continue from the barrier point.]
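A minimal Python sketch of this behaviour using threading.Barrier (the task names and sleep times are illustrative):

# Four tasks reach the barrier at different times, then continue together.
import threading
import time

barrier = threading.Barrier(4)       # all four tasks must arrive before any continues

def task(name, work_time):
    time.sleep(work_time)            # tasks finish their work at different times
    print(name, "reached the barrier")
    barrier.wait()                   # stop and wait for the last task
    print(name, "continues past the barrier")

threads = [threading.Thread(target=task, args=(f"Task {c}", 0.1 * i))
           for i, c in enumerate("ABCD")]
for t in threads:
    t.start()
for t in threads:
    t.join()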

(5) Synchronization (3/4)


 Locks/Semaphores
 Any number of tasks can be involved.
 Used to protect access to global data or a section of code. Only one task at a time
may access the lock/semaphore.
 The first task to access the lock sets it to locked and releases it when it is done
with it.
 When other tasks try to acquire the lock, they fail until the task that owns the lock
releases it.
 Can be blocking or non‐blocking.

[Figure: Processes A, B, and C share memory holding a, b, c, d, f, l, m, x, y, z.
Process A computes x=a+b, y=c+d, z=x+y; Process B computes x=a*2, f=5, z=x-f;
Process C computes l=b-a, m=2+d. While x is locked by Process A, Process B cannot
access it until Process A releases the lock.]

(5) Synchronization (4/4)


 Synchronous communication operations
 Involve only the tasks executing a communication operation.
 Coordination is required between the task performing an operation and the other
tasks participating in the communication.
 Require some type of "handshaking" between tasks that are sharing data. This could be
done implicitly or explicitly.
 Blocking communications:
Some work must be held until the communications are done.

[Figure: Task A and Task B establish communication via a handshake before exchanging data: a three‐way handshake (Syn / Syn+Ack / Ack) or a two‐way handshake (Syn / Ack).]


(6) Data Dependencies


 The order of program statement execution affects the results of the
program. For example, the statements B=3, C=B, A=C+1, B=7 yield a
different value of A depending on whether B=7 executes before or after C=B.
 Multiple tasks may use data stored in the same location.
 Data dependencies are one of the primary inhibitors to parallelism.

[Figure: two CPUs share memory holding a, b, c, d, f, l, m, x, y, z; one executes
x=a+b, y=c+d, z=x+y while the other executes l=b-a, m=2+d.]

(6) Data Dependencies


 Handling Data Dependencies:
 Distributed memory architectures:
Required data can be transferred at synchronization points.

 Shared memory architectures:


Operations of reading from/writing to the memory can be
synchronized among tasks.


(7) Load Balancing


 How to distribute work among all tasks so they are all kept busy all of the
time?

 When barrier synchronization is used, the slowest task determines the


performance.
[Figure: run serially, Functions 1–4 execute one after another and the total time is the sum of all four; with load balancing using parallelism, Functions 1–4 run as tasks T1–T4 at the same time and the total time is that of the longest task.]


(7) Load Balancing


 Ways to achieve Load Balancing
 Equally partitioning the work each task receives.
 Suitable for operations that perform similar work on all data elements
(ex: array, matrix, loop, …).
 Dynamic work assignment:
 Using a scheduler/task‐pool.
 Developing an algorithm that detects and handles imbalances.
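A small Python sketch of dynamic work assignment through a task pool (the work items and per-item cost are illustrative assumptions):

# Workers pull items from a shared pool as they become free, balancing uneven work.
from multiprocessing import Pool
import time

def work(item):
    time.sleep(0.01 * item)          # deliberately uneven task sizes
    return item * item

if __name__ == "__main__":
    items = list(range(20))
    with Pool(4) as pool:
        # chunksize=1 hands out one item at a time, so faster workers take more items
        results = pool.map(work, items, chunksize=1)
    print(results)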


(8) Granularity
 Computation : Communication ratio

 Fine‐grain Parallelism:
 Small amounts of computation are done between communication
events.
 Makes load balancing easier.
 High communication overhead.
 Small opportunity to enhance performance.
 If it is too fine, communication overhead can take longer than the computation.

 Coarse‐grain Parallelism:
 Large amounts of computation are done between communication
events.
 Large opportunity to enhance performance.
 Harder to load balance efficiently.

 Which is better?
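Neither is always better. A back-of-envelope comparison (the latency and compute costs below are assumed, illustrative numbers) shows how the computation-to-communication ratio decides:

# Estimated time for the same total work at two granularities.
LATENCY = 1e-6       # assumed cost of sending one message, in seconds
COMPUTE = 1e-9       # assumed cost of processing one element, in seconds
N = 1_000_000        # total number of elements

def total_time(elements_per_message):
    messages = N // elements_per_message
    return N * COMPUTE + messages * LATENCY

print("fine-grain   (1 element per message):  ", total_time(1))       # ~1.001 s, mostly communication
print("coarse-grain (10,000 elements/message):", total_time(10_000))  # ~0.0011 s, mostly computation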

Lecture Outline
 Parallel Computing Design Considerations
 Limits and Costs of Parallel Computing
 Parallel Computing Performance Analysis
 Examples of Problems Solved By Parallel Computing


Limits and costs of Parallel Computing (1/5)


 Complexity
 Scalability
 Portability
 Resource Requirements


Limits and costs of Parallel Computing (2/5)


 Complexity

 Parallel programs are much more complex


than corresponding serial ones because they
have several tasks running at the same time
and data flowing between them.

 Costs of complexity are measured in all


aspects of software development: Design,
Coding, Debugging, Tuning, Maintenance.

 When designing parallel programs, a programmer should stick
to good software development practices to keep the program
complexity to a minimum.


Limits and costs of Parallel Computing (3/5)


 Scalability
 What makes parallel programs scale?
 Adding more processors?
 Other factors
 Software factors
– Most algorithms have limits to scalability. They reach a point
where adding more resources causes performance to decrease.
– Supporting subsystems and libraries can limit scalability.
 Hardware factors
– Memory‐CPU bus bandwidth on an SMP machine.
– Communications network bandwidth.
– Amount of memory on machine(s).
– Processor clock speed.


Limits and costs of Parallel Computing (4/5)

 Portability
 Due to standardization, portability issues in
parallel programs are not as serious as they
used to be.

 However…

 All of the usual portability issues in serial programs apply to parallel ones.
 Even with standardized APIs, implementation differences that require
code modifications exist.
 Operating systems cause many portability issues.
 Hardware architectures are highly variable, which affects portability.


Limits and costs of Parallel Computing (5/5)


 Resource Requirements
 Parallel computing is used to decrease execution time, but to achieve this, more total CPU
time is required.
 Ex: a parallel code that takes 1 hour using 8 CPUs consumes 8 hours of CPU
time, roughly what the same code would take when run serially.

 The memory required for parallel code is often much more than for the corresponding serial code.
 Data replication.
 Overheads caused by using support libraries and subsystems.
 Communication overhead

 For short‐running parallel programs, performance can
actually decrease.
 Overhead from setting up the parallel environment, creating
tasks, communication, and terminating tasks can dominate
the total execution time.

Lecture Outline
 Parallel Computing Design Considerations
 Limits and Costs of Parallel Computing
 Parallel Computing Performance Analysis
 Examples of Problems Solved By Parallel Computing


Parallel Computing Performance Analysis


Many ways are used to measure the performance of parallel computing
programs, such as:

 Amdahl’s Law: helps decide whether a program merits parallelization.

 Gustafson’s Law: a way to evaluate the performance of a parallel


program.

 Karp‐Flatt Metric: helps decide whether the principal barrier to the
program speedup is the amount of inherently sequential code or the
parallel overhead.

 The Isoefficiency Metric: a way to evaluate the scalability of a parallel
algorithm executing on a parallel computer.


Amdahl’s Law
 Used to find the maximum expected improvement in an
entire application when only one part of it is improved.

 An application could be bounded by one of the following


main factors:
 Computation Time
 Memory Access Time
 Disk Access Time
 Network Access Time


Amdahl’s Law: Example


Consider an application that spends 70% of its time on computation, 10%
accessing memory, 10% accessing disk, and 10% accessing network.

Type of work:   Memory Access   Computation   Disk Access   Network Access
Time:           10%             70%           10%           10%

 What is the bounding factor to this application?

 What is the expected improvement percentage in its performance if:

 The memory access speed is doubled?

 The computational speed is doubled?


Amdahl’s Law: Example


Consider an application that spends 70% of its time on computation, 10%
accessing memory, 10% accessing disk, and 10% accessing network.

Type of work:   Memory Access   Computation   Disk Access   Network Access
Time:           10%             70%           10%           10%

 What is the bounding factor to this application? Computation

 What is the expected improvement percentage in its performance if:

 The memory access speed is doubled? 5%

 The computational speed is doubled? 35%
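These answers can be verified with a short Python sketch of the underlying calculation (the time fractions come from the table above):

# Fraction of total runtime saved when one component is sped up by a factor k.
def improvement(fraction, k):
    new_time = (1 - fraction) + fraction / k   # the rest is unchanged; one part is k times faster
    return 1 - new_time

print(improvement(0.10, 2))   # memory access doubled -> 0.05, i.e. a 5% improvement
print(improvement(0.70, 2))   # computation doubled   -> 0.35, i.e. a 35% improvement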


Using Amdahl’s Law in Analyzing Performance of


Parallel Computing

Source: https://computing.llnl.gov/tutorials/parallel_comp/
 Amdahl's Law is used to predict the
maximum improvement in the speedup
when using multiple processors in
parallel.

 Speedup: how much a parallel program


is faster than the corresponding serial
one.

 If P = 0 (none of the code is parallelized), speedup = 1 (no speedup).
 If P = 1 (all of the code is parallelized), speedup = ∞ (theoretically).
 If 1/2 of the code is parallelized, the maximum speedup = 2 (the code can run at most twice as
fast), since with unlimited processors speedup = 1 / (1 − P).


Using Amdahl’s Law in Analyzing Performance of


Parallel Computing
 When using multiple processors in
parallel, a program’s speedup is limited
by the time required by the sequential
part of the program.
Source: https://computing.llnl.gov/tutorials/parallel_comp/

 P = parallel fraction
N = number of processors
S = serial fraction

 Scalability Limitation:
Problems in which the parallel fraction increases as the problem size increases are more
scalable than those with a fixed parallel fraction.
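The formula behind this slide is the standard Amdahl's Law, speedup = 1 / (S + P/N); the original figure did not survive the transcript, so the Python sketch below simply evaluates it:

# Speedup is bounded by the serial fraction S = 1 - P, no matter how large N grows.
def amdahl_speedup(P, N):
    S = 1 - P
    return 1 / (S + P / N)

for n in (2, 8, 64, 10**6):
    print(n, "processors:", round(amdahl_speedup(0.90, n), 2))
# With P = 0.90 the speedup approaches 1/S = 10, which matches the example that follows.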

Using Amdahl’s in Parallel Computing: Example


Consider an application that has 4 different functions: F1 taking 4% of the
running time, F2: 10%, F3: 80%, and F4: 6%.

Function:   F1    F2     F3     F4
Time:       4%    10%    80%    6%

 Parallelizing which part of the application would mostly improve the performance?

 Assume that parts F1, F3, and F4 can all be parallelized, but F2 must be done serially.
What is the best performance speedup that could be reached by parallelizing those
parts?


Using Amdahl’s in Parallel Computing: Example


Consider an application that has 4 different functions: F1 taking 4% of the
running time, F2: 10%, F3: 80%, and F4: 6%.

Function:   F1    F2     F3     F4
Time:       4%    10%    80%    6%

 Parallelizing which part of the application would mostly improve the performance?
F3

 Assume that parts F1, F3, and F4 can all be parallelized, but F2 must be done serially.
What is the best performance speedup that could be reached by parallelizing those
parts? 10 times faster


Using Amdahl’s in Parallel Computing: Example

Function:   F1      F2      F3      F4
Time:       4%      10%     80%     6%
            2 ms    5 ms    40 ms   3 ms

Suppose that running the whole application requires 50 ms. From the last question,
what is the best running time that we can reach?

 Can the application run any faster than this? Why?


Using Amdahl’s in Parallel Computing: Example

Function:   F1      F2      F3      F4
Time:       4%      10%     80%     6%
            2 ms    5 ms    40 ms   3 ms

Suppose that running the whole application requires 50 ms. From the last question,
what is the best running time that we can reach?
Max speedup is 10, best running time is 5 ms

 Can the application run any faster than this? Why?


No, because no matter how fast the parallel part runs (fastest is t = 0), we cannot
decrease the time required to run the serial part (5 ms).


Lecture Outline
 Parallel Computing Design Considerations
 Limits and Costs of Parallel Computing
 Parallel Computing Performance Analysis
 Examples of Problems Solved By Parallel Computing


Parallelization Examples
 Array Processing
 Apply the same operation or function to each element of the
array.
 Elements are independent; no communication is needed.
 Embarrassingly parallel!
 Divide the array into smaller chunks and distribute them over
processors so the operations can be done in parallel.

 PI Calculation
 Discussed previously in the Demo!

 Can you think of more examples?



Example: Array Processing


 Serial Code:

do j = 1, n
  do i = 1, n
    a(i,j) = fcn(i,j)
  end do
end do

But, this is computationally intensive!


Example: Array Processing


 Parallel Code:
 The array is divided into chunks; each processor owns a chunk and executes the
portion of the loop corresponding to it.

find out if I am MASTER or WORKER

if I am MASTER
  initialize the array
  send each WORKER info on part of array it owns
  send each WORKER its portion of initial array
  receive from each WORKER results

else if I am WORKER
  receive from MASTER info on part of array I own
  receive from MASTER my portion of initial array

  // calculate my portion of array
  do j = my first column, my last column
    do i = 1, n
      a(i,j) = fcn(i,j)
    end do
  end do
  send MASTER results
endif
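The same master/worker pattern can be sketched in runnable Python using multiprocessing instead of explicit message passing (fcn, the array size, and the column split are illustrative assumptions):

# The main process acts as MASTER (partition, gather); pool workers fill their columns.
from multiprocessing import Pool

N = 8

def fcn(i, j):
    return i * N + j                 # stand-in for the real per-element function

def fill_columns(col_range):
    first, last = col_range          # each worker computes only the columns it owns
    return [[fcn(i, j) for i in range(N)] for j in range(first, last)]

if __name__ == "__main__":
    ranges = [(0, 2), (2, 4), (4, 6), (6, 8)]       # MASTER: partition the columns
    with Pool(4) as pool:
        blocks = pool.map(fill_columns, ranges)     # WORKERS: compute their portions
    columns = [col for block in blocks for col in block]   # MASTER: gather the results
    print(len(columns), "columns computed")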

Parallelization Examples: Simple Heat Equation

 The Heat Equation describes the change in


temperature in a given region over time, given
the initial temperature distribution and the
boundary conditions.

 To solve the equation on a 2D region, a finite
differencing scheme is used. A 2D array
represents the temperature distribution, and
the initial array is used to calculate the array
representing the change in the distribution.

 The initial temperature is zero on the boundaries and high in the middle. Boundary
temperature is held at zero.

 Calculating an element depends on neighbor element values. So, communication is


required!

Parallelization Examples: Simple Heat Equation

 To calculate one element U(x,y), the update uses its four neighbors:
  u2(x,y) = u1(x,y) + cx * (u1(x+1,y) + u1(x-1,y) - 2*u1(x,y))
                    + cy * (u1(x,y+1) + u1(x,y-1) - 2*u1(x,y))

 Serial Code:
do iy = 2, ny - 1
  do ix = 2, nx - 1
    u2(ix, iy) = u1(ix, iy)
      + cx * (u1(ix+1,iy) + u1(ix-1,iy) - 2.*u1(ix,iy))
      + cy * (u1(ix,iy+1) + u1(ix,iy-1) - 2.*u1(ix,iy))
  end do
end do
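For comparison, one serial update step can be written compactly in Python/NumPy (an illustrative sketch; the grid size and the coefficients cx and cy are assumed values):

# One time step of the 2D heat equation on the interior points of the grid.
import numpy as np

nx = ny = 10
cx = cy = 0.1
u1 = np.zeros((nx, ny))
u1[nx // 2, ny // 2] = 100.0         # hot in the middle, zero on the boundaries

u2 = u1.copy()
u2[1:-1, 1:-1] = (u1[1:-1, 1:-1]
                  + cx * (u1[2:, 1:-1] + u1[:-2, 1:-1] - 2 * u1[1:-1, 1:-1])
                  + cy * (u1[1:-1, 2:] + u1[1:-1, :-2] - 2 * u1[1:-1, 1:-1]))
print(u2[nx // 2 - 1:nx // 2 + 2, ny // 2 - 1:ny // 2 + 2])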

Parallelization Examples: Simple Heat Equation


 Parallel Way:

 The array is divided into chunks; each chunk is handled
by a task.

 Data dependencies are determined; there
are two types of elements:
 interior elements belonging to a task are
independent of other tasks.
 border elements are dependent on the
neighbor elements, so communication is
required.

 Master process sends initial info to worker processes, checks whether convergence
is reached, and collects results.

 Worker processes calculate the solution and communicate with neighbor processes when
required.


Parallelization Examples: Simple Heat Equation


 Parallel Code:
find out if I am MASTER or WORKER

if I am MASTER
  initialize array
  send each WORKER starting info and subarray

  do until all WORKERS converge
    gather from all WORKERS convergence data
    broadcast to all WORKERS convergence signal
  end do

  receive results from each WORKER

else if I am WORKER
  receive from MASTER starting info and subarray

  do until solution converged
    update time
    send neighbors my border info
    receive from neighbors their border info
    update my portion of solution array
    determine if my solution has converged
    send MASTER convergence data
    receive from MASTER convergence signal
  end do

  send MASTER results
endif

References
 http://en.wikipedia.org/wiki/Parallel_programming_model
 http://www.buyya.com/cluster/v2chap1.pdf
 https://computing.llnl.gov/tutorials/parallel_comp/
 http://www.acm.org/crossroads/xrds8‐3/programming.html
 http://researchcomp.stanford.edu/hpc/archives/HPCparallel.pdf
 http://www.mcs.anl.gov/~itf/dbpp/text/node9.html
 http://en.wikipedia.org/wiki/Parallel_computing
 http://www.azalisaudi.com/para/Para‐Week2‐TypesOfPara.pdf
