ECE 1747H: Parallel Programming: Message Passing (MPI)
Explicit Parallelism
• Plays the same role as multithreading does for shared memory.
• Explicit parallelism is more common with message passing.
– The user has explicit control over processes.
– Good: the control can be used for performance benefit.
– Bad: the user has to deal with it.
Distributed Memory - Message Passing
[Figure: processors connected by a network]
Distributed Memory - Message Passing
[Figure: grid and temp arrays partitioned across four processes (e.g. proc2, proc3), each process holding its own grid and temp blocks]
Is this going to work?
Same code as we used for shared memory
[Figure: grid and temp partitions held by proc2 and proc3]
Is this going to work?
Same code as we used for shared memory
for( i=from; i<to; i++ )
    for( j=0; j<n; j++ )
        temp[i][j] = 0.25*( grid[i-1][j] + grid[i+1][j]
                          + grid[i][j-1] + grid[i][j+1] );
[Figure: grid partitions held by proc2 and proc3]
Is this now going to work?
Same code as we used for shared memory
for( i=from; i<to; i++ )
    for( j=0; j<n; j++ )
        temp[i][j] = 0.25*( grid[i-1][j] + grid[i+1][j]
                          + grid[i][j-1] + grid[i][j+1] );
/* Data distribution */
if( myrank != 0 ) {
    MPI_Recv( &a[from], n*n/p, MPI_INT, 0, tag,
              MPI_COMM_WORLD, &status );
    MPI_Recv( &b, n*n, MPI_INT, 0, tag,
              MPI_COMM_WORLD, &status );
} else {
    for( i=1; i<p; i++ ) {
        MPI_Send( &a[from], n*n/p, MPI_INT, i, tag,
                  MPI_COMM_WORLD );
        MPI_Send( &b, n*n, MPI_INT, i, tag,
                  MPI_COMM_WORLD );
    }
}
MPI Matrix Multiply (w/o Index Translation)
/* Computation */

/* Result gathering */
if( myrank != 0 )
    MPI_Send( &c[from], n*n/p, MPI_INT, 0, tag,
              MPI_COMM_WORLD );
else
    for( i=1; i<p; i++ )
        MPI_Recv( &c[from], n*n/p, MPI_INT, i, tag,
                  MPI_COMM_WORLD, &status );
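The /* Computation */ step is left as a placeholder on these slides. A minimal sketch of what each process's share might look like, assuming (none of this is on the slide) that a, b and c are n x n integer matrices, every process holds a full copy of b, and from/to bound the rows this process owns:

/* Illustrative only: local matrix multiply over this process's rows,
   assuming int a[n][n], b[n][n], c[n][n], b fully replicated, and
   from/to delimiting the rows owned by this process */
for( i=from; i<to; i++ )
    for( j=0; j<n; j++ ) {
        c[i][j] = 0;
        for( k=0; k<n; k++ )
            c[i][j] += a[i][k]*b[k][j];
    }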
MPI Matrix Multiply (with Index Translation)
[Willy Zwaenepoel: missing initialization of a and b]
/* Data distribution */
if( myrank != 0 ) {
    MPI_Recv( &a, n*n/p, MPI_INT, 0, tag,
              MPI_COMM_WORLD, &status );
    MPI_Recv( &b, n*n, MPI_INT, 0, tag,
              MPI_COMM_WORLD, &status );
} else {
    for( i=1; i<p; i++ ) {
        MPI_Send( &a[from], n*n/p, MPI_INT, i, tag,
                  MPI_COMM_WORLD );
        MPI_Send( &b, n*n, MPI_INT, i, tag,
                  MPI_COMM_WORLD );
    }
}
MPI Matrix Multiply (with Index Translation)
/* Computation */

/* Result gathering */
if( myrank != 0 )
    MPI_Send( &c, n*n/p, MPI_INT, 0, tag,
              MPI_COMM_WORLD );
else
    for( i=1; i<p; i++ )
        MPI_Recv( &c[from], n*n/p, MPI_INT, i, tag,
                  MPI_COMM_WORLD, &status );
Running an MPI Program
• mpirun <program_name> <arguments> (a minimal example is sketched below).
• Interacts with a daemon process on the hosts.
• Causes a Unix process to be run on each of the hosts.
• May only run in interactive mode (batch mode may be blocked by ssh).
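A minimal sketch of an MPI program and one way to launch it; the file name hello_mpi.c, the process count of 4, and the exact mpicc/mpirun options (e.g. -np) depend on the local MPI installation and are illustrative only.

#include <stdio.h>
#include <mpi.h>

int main( int argc, char *argv[] )
{
    int myrank, p;

    MPI_Init( &argc, &argv );                  /* start up MPI */
    MPI_Comm_rank( MPI_COMM_WORLD, &myrank );  /* this process's id */
    MPI_Comm_size( MPI_COMM_WORLD, &p );       /* total number of processes */

    printf( "Hello from process %d of %d\n", myrank, p );

    MPI_Finalize();                            /* shut down MPI */
    return 0;
}

Compiled and run, for example, as mpicc hello_mpi.c -o hello_mpi followed by mpirun -np 4 ./hello_mpi, which starts one Unix process per requested slot, each running the same executable.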
[Figure: After Broadcast (inbuf copied from the root to every process)]
Scatter
MPI_Scatter(inbuf, incnt, intype, outbuf, outcnt, outtype, root, comm)
inbuf: address of input buffer
incnt: number of input elements
intype: type of input elements
outbuf: address of output buffer
outcnt: number of output elements
outtype: type of output elements
root: process id of root process
[Figure: Before Scatter (inbuf at the root, outbuf at every process)]
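A minimal sketch of MPI_Scatter distributing equal chunks of an array from the root to every process; the array name data, the size N, and the assumption that N is divisible by the number of processes are all illustrative.

#include <stdio.h>
#include <mpi.h>

#define N 16              /* illustrative total size, assumed divisible by p */

int main( int argc, char *argv[] )
{
    int myrank, p, i;
    int data[N];          /* input buffer: only meaningful at the root */
    int chunk[N];         /* output buffer: receives N/p elements */

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &myrank );
    MPI_Comm_size( MPI_COMM_WORLD, &p );

    if( myrank == 0 )     /* root fills the input buffer */
        for( i=0; i<N; i++ )
            data[i] = i;

    /* every process, root included, makes the same call */
    MPI_Scatter( data, N/p, MPI_INT, chunk, N/p, MPI_INT,
                 0, MPI_COMM_WORLD );

    printf( "process %d received elements %d..%d\n",
            myrank, chunk[0], chunk[N/p - 1] );

    MPI_Finalize();
    return 0;
}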
Gather
MPI_Gather(inbuf, incnt, intype, outbuf, outcnt, outtype, root, comm)
inbuf: address of input buffer
incnt: number of input elements
intype: type of input elements
outbuf: address of output buffer
outcnt: number of output elements
outtype: type of output elements
root: process id of root process
[Figure: Before Gather (inbuf at every process, outbuf at the root)]
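A complementary sketch for MPI_Gather: every process contributes one value and the root collects them in rank order into its output buffer; the variable names myval and allvals are illustrative.

#include <stdio.h>
#include <mpi.h>

#define MAXP 64           /* illustrative upper bound on the number of processes */

int main( int argc, char *argv[] )
{
    int myrank, p, i;
    int myval;
    int allvals[MAXP];    /* output buffer: only filled at the root */

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &myrank );
    MPI_Comm_size( MPI_COMM_WORLD, &p );

    myval = myrank * myrank;    /* each process computes something locally */

    /* every process makes the same call; rank i's value lands in allvals[i] */
    MPI_Gather( &myval, 1, MPI_INT, allvals, 1, MPI_INT,
                0, MPI_COMM_WORLD );

    if( myrank == 0 )
        for( i=0; i<p; i++ )
            printf( "value from process %d: %d\n", i, allvals[i] );

    MPI_Finalize();
    return 0;
}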
Broadcast/Scatter/Gather
• Funny thing: these three primitives are sends and receives at the same time (a little confusing sometimes).
• Perhaps unintended consequence: they require global agreement on the layout of the array (see the sketch below).
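A minimal sketch of that point using MPI_Bcast, the broadcast primitive from the earlier figure: every process, root and non-root alike, issues the identical call with the same count, type and root, and the rank alone determines whether the buffer acts as the send side or the receive side.

#include <stdio.h>
#include <mpi.h>

int main( int argc, char *argv[] )
{
    int myrank;
    int n = 0;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &myrank );

    if( myrank == 0 )
        n = 42;           /* only the root holds the value initially */

    /* the same call everywhere: a "send" at the root, a "receive" elsewhere */
    MPI_Bcast( &n, 1, MPI_INT, 0, MPI_COMM_WORLD );

    printf( "process %d sees n = %d\n", myrank, n );

    MPI_Finalize();
    return 0;
}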
MPI Matrix Multiply (w/o Index Translation)
[Willy Zwaenepoel: missing initialization of a and b]
/* Data distribution */
if( myrank != 0 ) {
    MPI_Recv( &a[from[myrank]], n*n/p, MPI_INT, 0, tag,
              MPI_COMM_WORLD, &status );
    MPI_Recv( &b, n*n, MPI_INT, 0, tag,
              MPI_COMM_WORLD, &status );
} else {
    for( i=1; i<p; i++ ) {
        MPI_Send( &a[from[i]], n*n/p, MPI_INT, i, tag,
                  MPI_COMM_WORLD );
        MPI_Send( &b, n*n, MPI_INT, i, tag,
                  MPI_COMM_WORLD );
    }
}
MPI Matrix Multiply (w/o Index Translation)
/* Computation */

/* Result gathering */
if( myrank != 0 )
    MPI_Send( &c[from[myrank]], n*n/p, MPI_INT, 0, tag,
              MPI_COMM_WORLD );
else
    for( i=1; i<p; i++ )
        MPI_Recv( &c[from[i]], n*n/p, MPI_INT, i, tag,
                  MPI_COMM_WORLD, &status );
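The from[] array used above is not set up anywhere on these slides. A minimal sketch of one way it might be built, assuming (illustratively) that the matrices are n x n with n divisible by p, so that process i owns the block of n/p rows starting at row i*(n/p):

#include <stdio.h>

#define MAXP 64           /* illustrative upper bound on the number of processes */

int main( void )
{
    int n = 8, p = 4;     /* illustrative sizes, n divisible by p */
    int from[MAXP], to[MAXP], i;

    for( i=0; i<p; i++ ) {
        from[i] = i * ( n / p );          /* first row owned by process i */
        to[i]   = ( i + 1 ) * ( n / p );  /* one past its last row */
        printf( "process %d: rows %d..%d\n", i, from[i], to[i]-1 );
    }
    return 0;
}

With a declared as a[n][n], &a[from[i]] then points at the start of process i's block of n/p rows, i.e. n*n/p ints, which matches the counts in the sends and receives above.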
MPI Matrix Multiply Revised (1 of 2)
[Willy Zwaenepoel: missing initialization of a and b]
[Figure: grid partitions held by proc2 and proc3]
MPI SOR
for some number of timesteps/iterations {
    for( i=from; i<to; i++ )
        for( j=0; j<n; j++ )
            temp[i][j] = 0.25 * ( grid[i-1][j] + grid[i+1][j]
                                + grid[i][j-1] + grid[i][j+1] );
    for( i=from; i<to; i++ )
        for( j=0; j<n; j++ )
            grid[i][j] = temp[i][j];
    /* here comes communication */
}
MPI SOR Communication
if( myrank != 0 ) {
    MPI_Send( grid[from], n, MPI_DOUBLE, myrank-1,
              tag, MPI_COMM_WORLD );
    MPI_Recv( grid[from-1], n, MPI_DOUBLE, myrank-1,
              tag, MPI_COMM_WORLD, &status );
}
if( myrank != p-1 ) {
    MPI_Send( grid[to-1], n, MPI_DOUBLE, myrank+1,
              tag, MPI_COMM_WORLD );
    MPI_Recv( grid[to], n, MPI_DOUBLE, myrank+1,
              tag, MPI_COMM_WORLD, &status );
}
No Barrier Between Loop Nests?
• Not necessary.
• Anti-dependences do not need to be covered (synchronized) in message passing.
• Memory is private, so overwriting it cannot affect the other processes.
SOR: Terminating Condition
• Real versions of SOR do not run for some fixed number of iterations.
• Instead, they test for convergence.
• Possible convergence criterion: the difference between two successive iterations is less than some delta (see the sketch below).
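A minimal sketch of such a test, assuming the criterion is the largest absolute change between grid and the newly computed temp over one iteration, compared against a tolerance delta; the names diff and delta and the use of a per-process local maximum are illustrative.

/* Illustrative only: diff is the largest absolute change produced by
   one iteration over this process's rows; iterate until diff < delta */
double diff = 0.0, change;

for( i=from; i<to; i++ )
    for( j=0; j<n; j++ ) {
        change = temp[i][j] - grid[i][j];
        if( change < 0.0 )
            change = -change;
        if( change > diff )
            diff = change;
    }
/* stop when diff < delta; in the MPI version the processes would still
   have to agree on a global maximum of diff (e.g. via a reduction) */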
SOR Sequential Code with Convergence