Lecture 1
Andrea Mignone
Dipartimento di Fisica
Turin University, Torino (TO)
Course Requisites
▪ In order to follow these lecture notes and the course material you will need
to have some acquaintance with
• Linux shell
• C / C++ or Fortran compiler
• Basic knowledge of numerical methods
▪ Serial applications (codes) can be turned into parallel ones by fulfilling some
requirements, which are typically hardware-dependent.
▪ SIMD (Single Instruction, Multiple Data): processes execute the same instruction
(or operation) on different data elements.
• Example: an application where the same value is added to (or subtracted from) a large
number of data points (e.g. multimedia applications); see the short example after this list.
• Advantage: processing multiple data elements at the same time with a single
instruction can noticeably improve the performance.
• Employed by vector computers.
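▪ As an illustration (a minimal sketch, not taken from the original slides), a loop of this kind
is a natural candidate for SIMD execution, since the same addition is applied to every array element:

#include <stdio.h>

#define N 8

int main(void)
{
  float a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
  float c = 10.0f;
  int   i;

  /* The same operation (add c) is applied to every data element;
     a vectorizing compiler can map this loop onto SIMD instructions. */
  for (i = 0; i < N; i++) a[i] += c;

  for (i = 0; i < N; i++) printf ("%f\n", a[i]);
  return 0;
}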
SPMD Parallelism
▪ In the Single Program, Multiple Data (SPMD) parallelism, all tasks execute the same program.
[Diagram: processes Proc #0 … Proc #N, all running the same program]
▪ Processes can follow different control paths during the execution, depending
on the process ID (see the short example below).
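▪ For instance (a minimal sketch, anticipating the MPI calls introduced later in this lecture),
the same executable can branch on the process rank:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char ** argv)
{
  int rank;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  /* All processes run the same program, but follow
     different control paths depending on their rank. */
  if (rank == 0) printf ("Rank %d: I handle the I/O\n", rank);
  else           printf ("Rank %d: I do the computation\n", rank);

  MPI_Finalize();
  return 0;
}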
MPMD Parallelism
▪ In the Multiple Program, Multiple Data (MPMD) parallelism, each task can
execute a different program (see the launch example below):
[Diagram: processes Proc #0 … Proc #N, each running a different program]
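▪ For example, most MPI launchers can start an MPMD job by listing several executables
on the same command line (the program names below are only illustrative):

> mpirun -np 1 ./master : -np 3 ./worker   # 1 process runs ./master, 3 processes run ./worker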
Parallel Programming Models
▪ By far, SIMD and SPMD are the most dominant parallel programming models.
▪ Parallel programming models are commonly classified by the underlying memory architecture:
• Shared memory;
• Distributed memory.
▪ Each node has rapid access to its own local memory and access to the
memory of other nodes via some sort of communications network, usually a
proprietary high-speed communications network.
▪ The goal of MPI is to establish a portable, efficient, and flexible standard for
message passing that will be widely used for writing message passing
programs.
▪ MPI is not an IEEE or ISO standard, but it has, in fact, become the "industry
standard" for writing message passing programs on HPC platforms.
What MPI is NOT
▪ MPI is not a programming language; it is, rather, the realization of a computational
(message-passing) model.
▪ It is not a new way of parallel programming (rather, a realization of the old
message-passing paradigm, which existed before, e.g., in the form of POSIX sockets);
▪ It does not automatically parallelize code (rather, the programmer gets full
manual control over all communications);
Downloading & Installing MPI
Two common implementations of MPI are:
• MPICH
• Open MPI
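▪ Both can typically be obtained from your system's package manager or built from source;
for example (package names depend on your system):

> sudo apt-get install mpich      # Debian/Ubuntu (MPICH)
> brew install open-mpi           # macOS with Homebrew (Open MPI)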
▪ If you have multiple cores, each process will run on a separate core.
▪ If you ask for more processes than the available CPU cores, everything will
still run, but with lower efficiency: MPI creates "virtual" processes in this case.
▪ So, even if you have a single-CPU, single-core machine, you can still use MPI
(yes, you can run multi-process jobs on a single-CPU, single-core machine...).
Writing an MPI Program
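▪ A minimal first MPI program (a sketch consistent with the compile/run output shown below)
initializes the MPI environment, queries the rank of the calling process and the total number
of processes, prints a line, and finalizes:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char ** argv)
{
  int rank, size;

  MPI_Init(&argc, &argv);                 /* Initialize the MPI environment */
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* Rank of this process           */
  MPI_Comm_size(MPI_COMM_WORLD, &size);   /* Total number of processes      */

  printf ("I am rank # %d of %d\n", rank, size);

  MPI_Finalize();                         /* Shut down the MPI environment  */
  return 0;
}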
MPI Group & Communicators
▪ An MPI group is a fixed, ordered set of unique MPI processes. In other
words, an MPI group is a collection of processes, e.g.
[Diagram: the 8 ranks of MPI_COMM_WORLD (0-7) are split into two groups, GROUP_BLUE and
GROUP_GREEN, from which two communicators, COMM_BLUE and COMM_GREEN, are created]
▪ First, the group is created with the desired processes to be included; then the
group is used to create a communicator (a minimal sketch follows).
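▪ A sketch of this two-step procedure using the standard MPI group routines (the group
name and the list of ranks are illustrative; run with at least 4 processes):

#include <mpi.h>

int main(int argc, char ** argv)
{
  int       blue_ranks[4] = {0, 1, 2, 3};   /* Ranks to include (illustrative) */
  MPI_Group world_group, blue_group;
  MPI_Comm  blue_comm;

  MPI_Init(&argc, &argv);

  MPI_Comm_group (MPI_COMM_WORLD, &world_group);             /* Group of all processes          */
  MPI_Group_incl (world_group, 4, blue_ranks, &blue_group);  /* New group with the chosen ranks */
  MPI_Comm_create(MPI_COMM_WORLD, blue_group, &blue_comm);   /* Communicator built on the group */

  /* ... use blue_comm (it is MPI_COMM_NULL on ranks outside the group) ... */

  MPI_Group_free(&blue_group);
  MPI_Group_free(&world_group);
  MPI_Finalize();
  return 0;
}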
The Default Communicator
▪ The default communicator is called MPI_COMM_WORLD.
MPI_Init(&argc, &argv);                 /* After initialization, all processes belong to MPI_COMM_WORLD */
MPI_Comm_size(MPI_COMM_WORLD, &size);   /* Number of processes in the communicator                      */
> mpicc my_program.c -o my_program   # Compile the code using the MPI C compiler wrapper
> mpirun -np 4 ./my_program          # Run on 4 processes
I am rank # 0 of 4
I am rank # 1 of 4
I am rank # 3 of 4
I am rank # 2 of 4
Example #2: Multiplication Tables
▪ Suppose we now want different processors to do the multiplication table of
1, 2, 3, and 4 [mult_tables.c]:
▪ Note that each processor will create a different table, depending on its rank.
So rank #0 will print 1, 2, 3, … while rank #1 will do 2, 4, 6, 8, … and so on;
▪ The output may be chaotic and not predictable, since all processors try to
access the output stream (stdout) concurrently.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char ** argv)
{
  int i, rank, size;

  /* -- Initialize MPI environment -- */
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  /* -- Create multiplication table -- */
  printf ("[Rank # %d]\n", rank);
  for (i = 1; i <= 10; i++){
    printf (" %d\n", i*(rank+1));
  }

  MPI_Finalize();
  return 0;
}
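▪ As before, the code can be compiled with the MPI wrapper compiler and run with mpirun, e.g.:

> mpicc mult_tables.c -o mult_tables
> mpirun -np 4 ./mult_tables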
Example #3: solving an Initial Value ODE
▪ [multi_ode.c] We now wish to solve the pendulum 2nd-order ordinary differential
equation.
▪ Using 4 processes, we can have each task solve the same equation using a
different initial condition.
▪ A 2nd-order Runge-Kutta scheme with Δt = 0.1 can be used. The code will be the
same for all processes; however:
• Make sure the initial condition is assigned based on the rank;
• The output data file should be different for each process.
Example #3: solving an Initial Value ODE
▪ We cast the 2nd-order ODE as a system of two 1st-order ODEs, dY/dt = R.
▪ Here Y and R are 2-element arrays containing the unknowns and the right-hand
sides of the two 1st-order ODEs (a sketch is given below):
[Table: solution values stored at each step, with columns t_n, θ_n, ω_n]
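▪ A possible sketch of the per-process solver is given below. The pendulum right-hand side
(here taken as dθ/dt = ω, dω/dt = -sin θ, i.e. g/L = 1), the initial conditions and the output
file names are illustrative assumptions, not taken from the original slides: each rank sets its
own initial condition, advances the system with a 2nd-order Runge-Kutta step, and writes
t, θ, ω to its own file.

#include <mpi.h>
#include <stdio.h>
#include <math.h>

void rhs (double t, double *Y, double *R)
{
  /* Pendulum equation written as two 1st-order ODEs (assumed form):
     dtheta/dt = omega,  domega/dt = -sin(theta)   (g/L = 1)          */
  R[0] = Y[1];
  R[1] = -sin(Y[0]);
}

int main(int argc, char ** argv)
{
  int    rank, k;
  double t = 0.0, dt = 0.1, tstop = 20.0;
  double Y[2], Ys[2], R[2];
  char   fname[32];
  FILE  *fp;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  /* Rank-dependent initial condition (illustrative choice) */
  Y[0] = 0.5 + 0.5*rank;   /* theta(0) */
  Y[1] = 0.0;              /* omega(0) */

  /* One output file per process (hypothetical naming) */
  sprintf (fname, "pendulum_%d.dat", rank);
  fp = fopen (fname, "w");

  while (t < tstop){
    fprintf (fp, "%12.6e  %12.6e  %12.6e\n", t, Y[0], Y[1]);

    /* 2nd-order Runge-Kutta (midpoint) step */
    rhs (t, Y, R);
    for (k = 0; k < 2; k++) Ys[k] = Y[k] + 0.5*dt*R[k];
    rhs (t + 0.5*dt, Ys, R);
    for (k = 0; k < 2; k++) Y[k] += dt*R[k];

    t += dt;
  }
  fclose (fp);

  MPI_Finalize();
  return 0;
}

▪ It can be compiled and run in the usual way, e.g.:

> mpicc multi_ode.c -o multi_ode -lm
> mpirun -np 4 ./multi_ode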
Visualizing Data
▪ We can now plot the 4 solutions of the ODE using, e.g., gnuplot.