ECE 1747H: Parallel Programming
Message Passing (MPI)
Explicit Parallelism
• Same idea as explicit multithreading for shared memory.
• Explicit parallelism is more common with message passing.
  – The user has explicit control over processes.
  – Good: this control can be used to improve performance.
  – Bad: the user has to manage it.
Distributed Memory - Message Passing

[Figure: processors proc1..procN, each with its own private memory mem1..memN, connected by a network.]
Distributed Memory - Message Passing
• A variable x, a pointer p, or an array a[] refer to different memory locations, depending on the processor.
• In this course, we discuss message passing as a programming model (it can run on any hardware).
What does the user have to do?
• This is what we said for shared memory:
  – Decide how to decompose the computation into parallel parts.
  – Create (and destroy) processes to support that decomposition.
  – Add synchronization to make sure dependences are covered.
• Is the same true for message passing?
Another Look at SOR Example

for some number of timesteps/iterations {
  for( i=0; i<n; i++ )
    for( j=0; j<n; j++ )
      temp[i][j] = 0.25 * ( grid[i-1][j] + grid[i+1][j] +
                            grid[i][j-1] + grid[i][j+1] );
  for( i=0; i<n; i++ )
    for( j=0; j<n; j++ )
      grid[i][j] = temp[i][j];
}
Shared Memory

[Figure: a single shared grid and a single shared temp, each divided into bands 1..N; proc1..procN each work on their own band.]

Message-Passing Data Distribution (only middle processes)

[Figure: proc2 and proc3 each hold only their own band of grid and temp in private memory.]
Is this going to work?

Same code as we used for shared memory:

for( i=from; i<to; i++ )
  for( j=0; j<n; j++ )
    temp[i][j] = 0.25*( grid[i-1][j] + grid[i+1][j]
                      + grid[i][j-1] + grid[i][j+1] );

No, we need extra boundary elements for grid.
Data Distribution (only middle processes)

[Figure: proc2 and proc3 each hold their band of grid and temp, now extended with boundary rows.]
Is this going to work?

Same code as we used for shared memory:

for( i=from; i<to; i++ )
  for( j=0; j<n; j++ )
    temp[i][j] = 0.25*( grid[i-1][j] + grid[i+1][j]
                      + grid[i][j-1] + grid[i][j+1] );

No, on the next iteration we need boundary elements from our neighbors.
Data Communication (only middle processes)

[Figure: proc2 and proc3 exchange boundary rows of grid with their neighbors.]
Is this now going to work?

Same code as we used for shared memory:

for( i=from; i<to; i++ )
  for( j=0; j<n; j++ )
    temp[i][j] = 0.25*( grid[i-1][j] + grid[i+1][j]
                      + grid[i][j-1] + grid[i][j+1] );

No, we need to translate the indices.
Index Translation

for( i=0; i<n/p; i++ )
  for( j=0; j<n; j++ )
    temp[i][j] = 0.25*( grid[i-1][j] + grid[i+1][j]
                      + grid[i][j-1] + grid[i][j+1] );

Remember, all variables are local.
Index Translation is Optional
• Allocate the full arrays on each processor.
• Leave indices alone.
• Higher memory use.
• Sometimes necessary (see later). A sketch of both options follows below.
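The following sketch is not from the slides; it illustrates the two options under the assumption of a row-wise distribution of an n x n grid over p processes, with two ghost rows assumed for the neighbors' boundary elements.

  /* Sketch only (assumed layout): the two allocation strategies. */
  #include <stdlib.h>

  void allocate_band(int n, int p, int myrank)
  {
      int from = (myrank * n) / p;
      int to   = ((myrank + 1) * n) / p;

      /* With index translation: allocate only the local band plus two
         assumed ghost rows, and loop over local indices 0..n/p-1.      */
      double (*local)[n] = malloc((size_t)(to - from + 2) * sizeof *local);

      /* Without index translation: allocate the full array on every
         process and keep the global indices from..to. Simpler indexing,
         higher memory use.                                             */
      double (*full)[n] = malloc((size_t)n * sizeof *full);

      free(local);
      free(full);
  }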
What does the user need to do?
• Divide the program into parallel parts.
• Create and destroy processes to do the above.
• Partition and distribute the data.
• Communicate data at the right time.
• (Sometimes) perform index translation.
• Still need to do synchronization?
  – Sometimes, but it often goes hand in hand with data communication.
Message Passing Systems
• Provide process creation and destruction.
• Provide message passing facilities (send and receive, in various flavors) to distribute and communicate data.
• Provide additional synchronization facilities.
MPI (Message Passing Interface)
• The de facto message passing standard.
• Available on virtually all platforms.
• Grew out of an earlier message passing system, PVM, which is now outdated.
MPI Process Creation/Destruction

MPI_Init( int *argc, char ***argv )
  Initiates a computation.

MPI_Finalize()
  Terminates a computation.
MPI Process Identification

MPI_Comm_size( comm, &size )
  Determines the number of processes.

MPI_Comm_rank( comm, &pid )
  pid is the process identifier (rank) of the caller.
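Putting these calls together, a minimal MPI program (an illustrative sketch, not slide code) looks like this:

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char *argv[])
  {
      int pid, size;

      MPI_Init(&argc, &argv);                  /* initiate the computation      */
      MPI_Comm_rank(MPI_COMM_WORLD, &pid);     /* which process am I?           */
      MPI_Comm_size(MPI_COMM_WORLD, &size);    /* how many processes are there? */

      printf("process %d of %d\n", pid, size);

      MPI_Finalize();                          /* terminate the computation     */
      return 0;
  }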
MPI Basic Send
MPI_Send(buf, count, datatype, dest, tag, comm)
buf: address of send buffer
count: number of elements
datatype: data type of send buffer elements
dest: process id of destination process
tag: message tag (ignore for now)
comm: communicator (ignore for now)
MPI Basic Receive
MPI_Recv(buf, count, datatype, source, tag, comm, &status)
buf: address of receive buffer
count: size of receive buffer in elements
datatype: data type of receive buffer elements
source: source process id or MPI_ANY_SOURCE
tag and comm: ignore for now
status: status object
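A minimal point-to-point sketch (assumed for illustration, not slide code; run with at least two processes): process 0 sends four integers to process 1.

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char *argv[])
  {
      int myrank, data[4] = {1, 2, 3, 4};
      int tag = 0;
      MPI_Status status;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

      if (myrank == 0)
          MPI_Send(data, 4, MPI_INT, 1, tag, MPI_COMM_WORLD);
      else if (myrank == 1) {
          MPI_Recv(data, 4, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
          printf("process 1 received %d %d %d %d\n",
                 data[0], data[1], data[2], data[3]);
      }

      MPI_Finalize();
      return 0;
  }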
MPI Matrix Multiply (w/o Index Translation)

(Speaker note, Willy Zwaenepoel: missing initialization of a and b.)
main(int argc, char *argv[])
{
  MPI_Init (&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
  MPI_Comm_size(MPI_COMM_WORLD, &p);
  from = (myrank * n)/p;
  to = ((myrank+1) * n)/p;
  /* Data distribution */ ...
  /* Computation */ ...
  /* Result gathering */ ...
  MPI_Finalize();
}
MPI Matrix Multiply (w/o Index Translation)

/* Data distribution */
if( myrank != 0 ) {
  MPI_Recv( &a[from], n*n/p, MPI_INT, 0, tag, MPI_COMM_WORLD, &status );
  MPI_Recv( &b, n*n, MPI_INT, 0, tag, MPI_COMM_WORLD, &status );
} else {
  for( i=1; i<p; i++ ) {
    MPI_Send( &a[(i*n)/p], n*n/p, MPI_INT, i, tag,   /* band destined for process i */
              MPI_COMM_WORLD );
    MPI_Send( &b, n*n, MPI_INT, i, tag, MPI_COMM_WORLD );
  }
}
MPI Matrix Multiply (w/o Index Translation)

/* Computation */
for( i=from; i<to; i++ )
  for( j=0; j<n; j++ ) {
    C[i][j] = 0;
    for( k=0; k<n; k++ )
      C[i][j] += A[i][k]*B[k][j];
  }
MPI Matrix Multiply (w/o Index Translation)

/* Result gathering */
if( myrank != 0 )
  MPI_Send( &c[from], n*n/p, MPI_INT, 0, tag, MPI_COMM_WORLD );
else
  for( i=1; i<p; i++ )
    MPI_Recv( &c[(i*n)/p], n*n/p, MPI_INT, i, tag,   /* band computed by process i */
              MPI_COMM_WORLD, &status );
MPI Matrix Multiply (with Index Translation)

(Speaker note, Willy Zwaenepoel: missing initialization of a and b.)

main(int argc, char *argv[])


{
MPI_Init (&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
MPI_Comm_size(MPI_COMM_WORLD, &p);
from = (myrank * n)/p;
to = ((myrank+1) * n)/p;
/* Data distribution */ ...
/* Computation */ ...
/* Result gathering */ ...
MPI_Finalize();
}
MPI Matrix Multiply (with Index Translation)

/* Data distribution */
if( myrank != 0 ) {
  MPI_Recv( &a, n*n/p, MPI_INT, 0, tag, MPI_COMM_WORLD, &status );
  MPI_Recv( &b, n*n, MPI_INT, 0, tag, MPI_COMM_WORLD, &status );
} else {
  for( i=1; i<p; i++ ) {
    MPI_Send( &a[(i*n)/p], n*n/p, MPI_INT, i, tag,   /* band destined for process i */
              MPI_COMM_WORLD );
    MPI_Send( &b, n*n, MPI_INT, i, tag, MPI_COMM_WORLD );
  }
}
MPI Matrix Multiply (with Index Translation)

/* Computation */
for( i=0; i<n/p; i++ )
  for( j=0; j<n; j++ ) {
    C[i][j] = 0;
    for( k=0; k<n; k++ )
      C[i][j] += A[i][k]*B[k][j];
  }
MPI Matrix Multiply (with Index Translation)

/* Result gathering */
if( myrank != 0 )
  MPI_Send( &c, n*n/p, MPI_INT, 0, tag, MPI_COMM_WORLD );
else
  for( i=1; i<p; i++ )
    MPI_Recv( &c[(i*n)/p], n*n/p, MPI_INT, i, tag,   /* band computed by process i */
              MPI_COMM_WORLD, &status );
Running an MPI Program
• mpirun <program_name> <arguments> (see the example below).
• Interacts with a daemon process on the hosts.
• Causes a Unix process to be run on each of the hosts.
• May only run in interactive mode (batch mode may be blocked by ssh).
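For example (an assumed invocation, not from the slides: the exact flags depend on the MPI installation, -np commonly sets the number of processes, and ./matmul and its argument are placeholders for your program and its arguments):

  mpirun -np 4 ./matmul 1024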
ECE1747 Parallel Programming
Message Passing (MPI)
Global Operations
What does the user need to do?
• Divide the program into parallel parts.
• Create and destroy processes to do the above.
• Partition and distribute the data.
• Communicate data at the right time.
• (Sometimes) perform index translation.
• Still need to do synchronization?
  – Sometimes, but it often goes hand in hand with data communication.
MPI Process Creation/Destruction

MPI_Init( int *argc, char ***argv )
  Initiates a computation.

MPI_Finalize()
  Finalizes a computation.
MPI Process Identification

MPI_Comm_size( comm, &size )
  Determines the number of processes.

MPI_Comm_rank( comm, &pid )
  pid is the process identifier (rank) of the caller.
MPI Basic Send
MPI_Send(buf, count, datatype, dest, tag, comm)
buf: address of send buffer
count: number of elements
datatype: data type of send buffer elements
dest: process id of destination process
tag: message tag (ignore for now)
comm: communicator (ignore for now)
MPI Basic Receive
MPI_Recv(buf, count, datatype, source, tag, comm, &status)
buf: address of receive buffer
count: size of receive buffer in elements
datatype: data type of receive buffer elements
source: source process id or MPI_ANY_SOURCE
tag and comm: ignore for now
status: status object
Global Operations (1 of 2)
• So far, we have only looked at point-to-point or one-to-one message passing facilities.
• Often, it is useful to have one-to-many or many-to-one message communication.
• This is what MPI’s global operations do.
Global Operations (2 of 2)
• MPI_Barrier
• MPI_Bcast
• MPI_Gather
• MPI_Scatter
• MPI_Reduce
• MPI_Allreduce
Barrier
MPI_Barrier(comm)
  Global barrier synchronization, as before: all processes wait until all have arrived.
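A small sketch of its use (assumed, not slide code): no process continues past the barrier until every process has reached it.

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char *argv[])
  {
      int myrank;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

      printf("process %d before the barrier\n", myrank);
      MPI_Barrier(MPI_COMM_WORLD);   /* everyone waits here until all arrive */
      printf("process %d after the barrier\n", myrank);

      MPI_Finalize();
      return 0;
  }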
Broadcast
MPI_Bcast(inbuf, incnt, intype, root, comm)
inbuf: address of input buffer (on root); address of output buffer (elsewhere)
incnt: number of elements
intype: type of elements
root: process id of root process
Before Broadcast

[Figure: only the root (proc0) has data in inbuf; the other processes' inbufs are empty.]

After Broadcast

[Figure: every process (proc0..proc3) holds a copy of the root's inbuf data.]
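A sketch of its use (assumed fragment, placed between MPI_Init and MPI_Finalize, with myrank and i declared as in the earlier examples):

  int vec[100];                      /* same declaration on every process */

  if( myrank == 0 )                  /* only the root has meaningful data */
    for( i=0; i<100; i++ )
      vec[i] = i;

  MPI_Bcast( vec, 100, MPI_INT, 0, MPI_COMM_WORLD );
  /* now every process has the root's 100 integers in vec */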
Scatter
MPI_Scatter(inbuf, incnt, intype, outbuf,
outcnt, outtype, root, comm)
inbuf: address of input buffer
incnt: number of input elements
intype: type of input elements
outbuf: address of output buffer
outcnt: number of output elements
outtype: type of output elements
root: process id of root process
Before Scatter

[Figure: the root (proc0) holds the full array in inbuf; all outbufs are empty.]

After Scatter

[Figure: each process's outbuf holds its own consecutive piece of the root's inbuf.]
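A sketch of its use (assumed fragment, same setup as above; n, p, myrank and i declared, n divisible by p, and MAXN an assumed bound on n):

  int inbuf[MAXN], outbuf[MAXN];

  if( myrank == 0 )                  /* only the root's inbuf matters */
    for( i=0; i<n; i++ )
      inbuf[i] = i;

  /* every process, including the root, receives n/p consecutive elements */
  MPI_Scatter( inbuf, n/p, MPI_INT, outbuf, n/p, MPI_INT, 0, MPI_COMM_WORLD );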
Gather
MPI_Gather(inbuf, incnt, intype, outbuf,
outcnt, outtype, root, comm)
inbuf: address of input buffer
incnt: number of input elements
intype: type of input elements
outbuf: address of output buffer
outcnt: number of output elements
outtype: type of output elements
root: process id of root process
Before Gather

[Figure: each process holds its own piece in inbuf; the root's outbuf is empty.]

After Gather

[Figure: the root's outbuf holds all the pieces, ordered by process rank.]
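The mirror image of scatter (assumed fragment, same declarations as above): each process contributes n/p elements and the root collects them in rank order.

  int piece[MAXN], all[MAXN];

  for( i=0; i<n/p; i++ )
    piece[i] = myrank;               /* each process fills its own piece */

  /* the output buffer (all) is only significant on the root */
  MPI_Gather( piece, n/p, MPI_INT, all, n/p, MPI_INT, 0, MPI_COMM_WORLD );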
Broadcast/Scatter/Gather
• Funny thing: these three primitives are sends and receives at the same time (a little confusing sometimes).
• Perhaps unintended consequence: they require global agreement on the layout of the array.
MPI Matrix Multiply (w/o Index Translation)

(Speaker note, Willy Zwaenepoel: missing initialization of a and b.)

main(int argc, char *argv[])


{
MPI_Init (&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
MPI_Comm_size(MPI_COMM_WORLD, &p);
for( i=0; i<p; i++ ) {
from[i] = (i * n)/p;
to[i] = ((i+1) * n)/p;
}
/* Data distribution */ ...
/* Computation */ ...
/* Result gathering */ ...
MPI_Finalize();
}
MPI Matrix Multiply (w/o Index Translation)

/* Data distribution */
if( myrank != 0 ) {
  MPI_Recv( &a[from[myrank]], n*n/p, MPI_INT, 0, tag, MPI_COMM_WORLD, &status );
  MPI_Recv( &b, n*n, MPI_INT, 0, tag, MPI_COMM_WORLD, &status );
} else {
  for( i=1; i<p; i++ ) {
    MPI_Send( &a[from[i]], n*n/p, MPI_INT, i, tag, MPI_COMM_WORLD );
    MPI_Send( &b, n*n, MPI_INT, i, tag, MPI_COMM_WORLD );
  }
}
MPI Matrix Multiply (w/o Index Translation)

/* Computation */
for( i=from[myrank]; i<to[myrank]; i++ )
  for( j=0; j<n; j++ ) {
    C[i][j] = 0;
    for( k=0; k<n; k++ )
      C[i][j] += A[i][k]*B[k][j];
  }
MPI Matrix Multiply (w/o Index Translation)

/* Result gathering */
if( myrank != 0 )
  MPI_Send( &c[from[myrank]], n*n/p, MPI_INT, 0, tag, MPI_COMM_WORLD );
else
  for( i=1; i<p; i++ )
    MPI_Recv( &c[from[i]], n*n/p, MPI_INT, i, tag, MPI_COMM_WORLD, &status );
MPI Matrix Multiply Revised (1 of 2)

(Speaker note, Willy Zwaenepoel: missing initialization of a and b.)

main(int argc, char *argv[])


{
MPI_Init (&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
MPI_Comm_size(MPI_COMM_WORLD, &p);
from = (myrank * n)/p;
to = ((myrank+1) * n)/p;
MPI_Scatter (a, n*n/p, MPI_INT, a, n*n/p, MPI_INT, 0,
MPI_COMM_WORLD);
MPI_Bcast (b,n*n, MPI_INT, 0, MPI_COMM_WORLD);
...
MPI Matrix Multiply Revised (2 of 2)

  ...
  for( i=from; i<to; i++ )
    for( j=0; j<n; j++ ) {
      C[i][j] = 0;
      for( k=0; k<n; k++ )
        C[i][j] += A[i][k]*B[k][j];
    }
  MPI_Gather (C[from], n*n/p, MPI_INT, c[from], n*n/p, MPI_INT, 0, MPI_COMM_WORLD);
  MPI_Finalize();
}
SOR Sequential Code

for some number of timesteps/iterations {
  for( i=0; i<n; i++ )
    for( j=0; j<n; j++ )
      temp[i][j] = 0.25 * ( grid[i-1][j] + grid[i+1][j] +
                            grid[i][j-1] + grid[i][j+1] );
  for( i=0; i<n; i++ )
    for( j=0; j<n; j++ )
      grid[i][j] = temp[i][j];
}
MPI SOR
• Allocate grid and temp arrays.
• Use MPI_Scatter to distribute initial values, if any (requires non-local allocation).
• Use MPI_Gather to return the results to process 0 (requires non-local allocation).
• Focusing only on communication within the computational part ...
Data Communication (only middle processes)

[Figure: proc2 and proc3 exchange boundary rows of grid with their neighbors.]
MPI SOR
for some number of timesteps/iterations {
  for( i=from; i<to; i++ )
    for( j=0; j<n; j++ )
      temp[i][j] = 0.25 * ( grid[i-1][j] + grid[i+1][j] +
                            grid[i][j-1] + grid[i][j+1] );
  for( i=from; i<to; i++ )
    for( j=0; j<n; j++ )
      grid[i][j] = temp[i][j];
  /* here comes communication */
}
MPI SOR Communication
if (myrank != 0) {
  /* exchange boundary rows with the upper neighbor */
  MPI_Send (grid[from], n, MPI_DOUBLE, myrank-1, tag, MPI_COMM_WORLD);
  MPI_Recv (grid[from-1], n, MPI_DOUBLE, myrank-1, tag, MPI_COMM_WORLD, &status);
}
if (myrank != p-1) {
  /* exchange boundary rows with the lower neighbor */
  MPI_Send (grid[to-1], n, MPI_DOUBLE, myrank+1, tag, MPI_COMM_WORLD);
  MPI_Recv (grid[to], n, MPI_DOUBLE, myrank+1, tag, MPI_COMM_WORLD, &status);
}
No Barrier Between Loop Nests?
• Not necessary.
• Anti-dependences do not need to be covered in message passing.
• Memory is private, so overwrite does not matter.
SOR: Terminating Condition
• Real versions of SOR do not run for some fixed number of iterations.
• Instead, they test for convergence.
• Possible convergence criterion: difference between two successive iterations is less than some delta.
SOR Sequential Code with Convergence

for( ; diff > delta; ) {
  for( i=0; i<n; i++ )
    for( j=0; j<n; j++ ) { … }
  diff = 0;
  for( i=0; i<n; i++ )
    for( j=0; j<n; j++ ) {
      diff = max(diff, fabs(grid[i][j] - temp[i][j]));
      grid[i][j] = temp[i][j];
    }
}
Reduction
MPI_Reduce(inbuf, outbuf, count, type, op, root, comm)
inbuf: address of input buffer
outbuf: address of output buffer
count: number of elements in input buffer
type: datatype of input buffer elements
op: operation (MPI_MIN, MPI_MAX, etc.)
root: process id of root process
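A sketch of its use (assumed fragment; myrank declared as before): every process contributes one double, and only the root receives the maximum, much like the convergence test coming up, except that the result is not available everywhere.

  double mydiff = 0.0;   /* in a real program, computed locally on each process */
  double diff;           /* only significant on the root                        */

  MPI_Reduce( &mydiff, &diff, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD );

  if( myrank == 0 )
    printf("global maximum difference: %f\n", diff);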
Global Reduction
MPI_Allreduce(inbuf, outbuf, count, type, op, comm)
inbuf: address of input buffer
outbuf: address of output buffer
count: number of elements in input buffer
type: datatype of input buffer elements
op: operation (MPI_MIN, MPI_MAX, etc.)
no root process
MPI SOR Code with Convergence
for( ; diff > delta; ) {
  for( i=from; i<to; i++ )
    for( j=0; j<n; j++ ) { … }
  mydiff = 0.0;
  for( i=from; i<to; i++ )
    for( j=0; j<n; j++ ) {
      mydiff = max(mydiff, fabs(grid[i][j] - temp[i][j]));
      grid[i][j] = temp[i][j];
    }
  MPI_Allreduce (&mydiff, &diff, 1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);
  ...
}
