Class03 - MPI, Part 1, Intermediate
M. D. Jones, Ph.D.
Center for Computational Research, University at Buffalo, State University of New York
Why MPI?
A parallel calculation in which each process (out of a specified number of processes) works on a local copy of the data, with local variables. No process is allowed to directly access the memory (available data) of another process. The mechanism by which individual processes share information (data) is explicit sending (and receiving) of data between the processes. The general assumption is a one-to-one mapping of processes to processors, although this is not necessarily always the case.
Upside of MPI
Advantages:
- Very general model (message passing).
- Applicable to the widest variety of hardware platforms (SMPs, NOWs, etc.).
- Allows great control over data location and flow in a program.
- Programs can usually achieve a higher performance level (scalability).
Downside of MPI
Disadvantages:
- The programmer has to work hard(er) to implement it.
- The best performance gains can involve re-engineering the code.
- The MPI standard does not specify a mechanism for launching parallel tasks (a task launcher); this is implementation dependent, and it can be a bit of a pain.
MPI-2
MPI-3
MPI-1
- Point-to-point Communications
- Collective Operations
- Process Groups
- Communication Domains
- Process Topologies
- Environmental Management & Inquiry
- Profiling Interface
- FORTRAN and C Bindings
MPI-2
- Dynamic Process Management (pretty available)
- Input/Output (supporting hardware is hardest to find)
- One-sided Operations (hardest to find, but generally available)
- C++ Bindings (generally available, but deprecated!)
MPI References
- Using MPI: Portable Parallel Programming with the Message-Passing Interface, second edition, W. Gropp, E. Lusk, and A. Skjellum (MIT Press, Cambridge, 1999).
- MPI: The Complete Reference, Vol. 1, The MPI Core, M. Snir, S. Otto, S. Huss-Lederman, D. Walker, and J. Dongarra (MIT Press, Cambridge, 1998).
- MPI: The Complete Reference, Vol. 2, The MPI Extensions, W. Gropp, S. Huss-Lederman, A. Lumsdaine, E. Lusk, B. Nitzberg, W. Saphir, M. Snir, and J. Dongarra (MIT Press, Cambridge, 1998).
The first edition of MPI: The Complete Reference is also available as a PostScript file. A useful online reference to all of the routines and their bindings: http://www-unix.mcs.anl.gov/mpi/www/www3 Note that this is for MPICH 1.2, but it's quite handy.
Introduction
MPI is Large
MPI 1.2 has 128 functions. MPI 2.0 has 152 functions.
MPI is Small
Many programs need to use only about 6 MPI functions.
When to use MPI:
- When you need a portable parallel API
- When you are writing a parallel library
- When you have data processing that is not conducive to a data-parallel approach
- When you care about parallel performance
When not to use MPI:
- When you can just use a parallel library (which may itself be written in MPI)
- When you need only simple threading on data-parallel tasks
- When you don't need large (many-processor) parallel speedup
MPI Fundamentals
Message-passing codes run the same (usually serial) code on multiple processors, which communicate with one another via library calls that fall into a few general categories:
- Calls to initialize, manage, and terminate communications
- Calls to communicate between two individual processors (point-to-point)
- Calls to communicate among a group of processors (collective)
- Calls to create custom datatypes
I will briefly cover the first three, and present a few concrete examples.
In FORTRAN 77
          program main
          implicit none
          include 'mpif.h'
Fortran 90/95
    program main
      use MPI
      implicit none
Generally the MPI routines return an error code (the function return value in C), which can be tested against a predefined success value:
    int ierr;
    ...
    ierr = MPI_Init(&argc, &argv);
    if (ierr != MPI_SUCCESS) {
       /* ... exit with an error ... */
    }
    ...
and in FORTRAN the error code is passed back as the last argument in the MPI subroutine call:
    integer :: ierr
    ...
    call MPI_INIT(ierr)
    if (ierr /= MPI_SUCCESS) then
       ! ... exit with an error ...
    end if
MPI Handles
MPI defines its own data structures, which can be referenced by the user through handles. Handles can be returned by MPI routines and used as arguments to other MPI routines. Some examples:
- MPI_SUCCESS - used to test MPI error codes. An integer in both C and FORTRAN.
- MPI_COMM_WORLD - a (pre-defined) communicator consisting of all of the processes. An integer in FORTRAN, and an MPI_Comm object in C.
MPI Datatypes
MPI defines its own datatypes that correspond to typical datatypes in C and FORTRAN. This allows for automatic translation between different representations in a heterogeneous parallel environment. You can build your own datatypes from the basic MPI building blocks. The actual representation is implementation dependent. Convention: program variables are usually declared as normal C or FORTRAN types, and then calls to MPI routines use MPI type names as needed.
MPI Datatypes in C
In C, the basic datatypes (and their ISO C equivalents) are:
    MPI Datatype          C Type
    MPI_FLOAT             float
    MPI_DOUBLE            double
    MPI_LONG_DOUBLE       long double
    MPI_INT               signed int
    MPI_LONG              signed long int
    MPI_SHORT             signed short int
    MPI_UNSIGNED_SHORT    unsigned short int
    MPI_UNSIGNED_LONG     unsigned long int
    MPI_UNSIGNED          unsigned int
    MPI_CHAR              signed char
    MPI_UNSIGNED_CHAR     unsigned char
    MPI_BYTE              (none)
    MPI_PACKED            (none)
FORTRAN:
    integer ierr
    call MPI_INIT(ierr)
    ...
MPI Communicators
Definition (MPI Communicator): A communicator is a group of processors that can communicate with each other.
- There can be many communicators.
- A given processor can be a member of multiple communicators.
- Within a communicator, the rank of a processor is the number (starting at 0) uniquely identifying it within that communicator.
A processor's rank is used to specify the source and destination in message-passing calls. A processor's rank can be different in different communicators. MPI_COMM_WORLD is a pre-defined communicator encompassing all of the processes; additional communicators can be defined for subsets of this group.
FORTRAN:
    MPI_COMM_RANK(comm, rank, ierr)
    MPI_COMM_SIZE(comm, size, ierr)
where rank and size are integers returned with (obviously) the rank and the extent (ranks run from 0 to the number of processes minus 1).
We have already covered enough material to write the simplest of MPI programs: here is one in C:
    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
       int myid, numprocs;

       MPI_Init(&argc, &argv);
       MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
       MPI_Comm_rank(MPI_COMM_WORLD, &myid);
       printf("Hello World, I am Process %d of %d\n", myid, numprocs);
       MPI_Finalize();
       return 0;
    }
Many MPI codes can get away with using only the six most frequently used routines:
- MPI_INIT - initialization
- MPI_COMM_SIZE - size of communicator
- MPI_COMM_RANK - rank in communicator
- MPI_SEND - send message
- MPI_RECV - receive message
- MPI_FINALIZE - shut down MPI communications
Point-to-Point Communications

Basic features:
- In MPI 1.2, only two-sided communications are allowed, requiring an explicit send and receive. (MPI 2.0 allows for one-sided communications, i.e. get and put.)
- Point-to-point (or P2P) communication is explicitly two-sided, and the message will not be sent without the active participation of both processes.
- A message generically consists of an envelope (tags indicating source and destination) and a body (the data being transferred).
- Fundamental - almost all of the MPI communications are built around point-to-point operations.
Message Bodies
- buffer: the starting location in memory where the data is to be found. C: the actual address of an array element; FORTRAN: the name of the array element.
- datatype: the type of data to be sent. Commonly one of the predefined types, e.g. MPI_REAL. Can also be a user-defined datatype, allowing great flexibility in defining message content for more advanced applications.
- count: the number of items being sent.
MPI standardizes the elementary datatypes, so the developer does not have to worry about numerical representation.
Message Envelopes
MPI message wrappers (envelopes) have the following general attributes:
- communicator - the group of processes to which the sending and receiving processes belong
- source - the originating process
- destination - the receiving process
- tag - a message identifier, which allows a program to label classes of messages (e.g. one for name data, another for place data, status, etc.)
Communication routines come in blocking and nonblocking flavors:
- blocking - the routine does not return until the operation is complete. Blocking sends, for example, ensure that it is safe to overwrite the sent data; after a blocking receive, the data is here and ready for use.
- nonblocking - the routine returns immediately, with no information about completion. You can test later for the success/failure of the operation; in the interim, the process is free to go on to other tasks.
Send Modes
Point-to-point Semantics
For MPI sends, there are four available modes:
- standard - no guarantee that the receive has started.
- synchronous - complete when receipt has been acknowledged.
- buffered - complete when the data has been copied to a local buffer. No implication about receipt.
- ready - the user asserts that the matching receive has been posted (allows the user to gain performance).
MPI receives are easier - they are complete when the data has arrived and is ready for use.
Blocking Send
MPI_SEND(buff, count, datatype, dest, tag, comm)
- buff (IN), initial address of message buffer
- count (IN), number of entries to send (int)
- datatype (IN), datatype of each entry (handle)
- dest (IN), rank of destination (int)
- tag (IN), message tag (int)
- comm (IN), communicator (handle)
Blocking Receive
MPI_RECV(buff, count, datatype, source, tag, comm, status)
- buff (OUT), initial address of message buffer
- count (IN), maximum number of entries to receive (int)
- datatype (IN), datatype of each entry (handle)
- source (IN), rank of source (int)
- tag (IN), message tag (int)
- comm (IN), communicator (handle)
- status (OUT), return status (Status)
source, tag, and comm must match those of a pending message for the message to be received. Wildcards can be used for source and tag, but not for the communicator. An error is returned if the incoming message is larger than the receive buffer allows. It is the user's responsibility to ensure that the send and receive datatypes agree - if they do not, the results are undefined.
Status of a Receive
More information about message reception is available by examining the status returned by the call to MPI_RECV.

C: status is a structure of type MPI_Status that contains at minimum the three fields MPI_SOURCE, MPI_TAG, and MPI_ERROR (accessed as, e.g., status.MPI_SOURCE).

FORTRAN: status is an integer array of length MPI_STATUS_SIZE. MPI_SOURCE, MPI_TAG, and MPI_ERROR are the indices of the entries that store the source, tag, and error fields.
MPI_GET_COUNT
The routine MPI_GET_COUNT is an auxiliary routine that allows you to test the amount of data received:

MPI_GET_COUNT(status, datatype, count)
- status (IN), return status of receive (Status)
- datatype (IN), datatype of each receive buffer entry (handle)
- count (OUT), number of entries received (int)

MPI_UNDEFINED will be returned in the event of an error.
    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
       int i, ierr, rank, size, dest, source, from, to, count, tag;
       int stat_count, stat_source, stat_tag;
       float data[100];
       MPI_Status status;

       MPI_Init(&argc, &argv);
       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
       MPI_Comm_size(MPI_COMM_WORLD, &size);
       printf("I am process %d of %d\n", rank, size);

       dest   = size - 1;
       source = 0;

       if (rank == source) {          /* Initialize and Send Data */
          to    = dest;
          count = 100;
          tag   = 11;
          for (i = 0; i <= 99; i++) data[i] = i;
          /* MPI_FLOAT matches the C float buffer */
          ierr = MPI_Send(data, count, MPI_FLOAT, to, tag, MPI_COMM_WORLD);
       }
       else if (rank == dest) {       /* Receive & Check Data */
          tag   = MPI_ANY_TAG;        /* wildcard */
          count = 100;
          from  = MPI_ANY_SOURCE;     /* another wildcard */
          ierr = MPI_Recv(data, count, MPI_FLOAT, from, tag, MPI_COMM_WORLD,
                          &status);
          ierr = MPI_Get_count(&status, MPI_FLOAT, &stat_count);
          stat_source = status.MPI_SOURCE;
          stat_tag    = status.MPI_TAG;
          printf("Status of receive: dest=%d, source=%d, tag=%d, count=%d\n",
                 rank, stat_source, stat_tag, stat_count);
       }
       ierr = MPI_Finalize();
       return 0;
    }
For MPI_RECV, completion is easy - the data is here, and can now be used. It is a bit trickier for MPI_SEND - it completes when the data has been stored away such that the program is free to overwrite the send buffer. This can be non-local - the data could be copied directly to the receive buffer, or it could be stored in a local buffer, in which case the send could return before the receive is initiated (thereby allowing even a single-threaded sending process to continue).
Perils of Buffering
Message Buffering
- Decouples send/receive operations.
- Entails added memory-to-memory copying (additional overhead).
- The amount of buffering is application and implementation dependent:
  - applications can choose communication modes - and gain finer control (with additional hazards) over messaging behavior
  - the standard mode is implementation dependent
A properly coded program will not fail if the buffer throttles back the sends, thereby causing blocking (imagine an assembly line controlled by the rate at which the final inspector signs off on each item). An improperly coded program can deadlock ...
Deadlock
Safe MPI programs do not rely on system buffering for success. Any system will eventually run out of buffer space as message buffer sizes are increased. Users are free to take advantage of knowledge of an implementation's buffering policy to increase performance, but they do so by relaxing the margin of safety (as well as decreasing portability, of course).
Deadlock Examples
Buffering dependent:
    CALL MPI_COMM_RANK(comm, rank, ierr)
    IF (rank .eq. 0) THEN
       CALL MPI_SEND(sbuff, count, MPI_REAL, 1, tag, comm, ierr)
       CALL MPI_RECV(rbuff, count, MPI_REAL, 1, tag, comm, status, ierr)
    ELSE IF (rank .eq. 1) THEN
       CALL MPI_SEND(sbuff, count, MPI_REAL, 0, tag, comm, ierr)
       CALL MPI_RECV(rbuff, count, MPI_REAL, 0, tag, comm, status, ierr)
    END IF
For this last buffer-dependent example, one of the sends must buffer and return - if the buffer cannot hold count reals, deadlock occurs. Nonblocking communications can be used to avoid buffering, and possibly increase performance, as sketched below.
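As an illustration (a minimal sketch, not from the original slides), the same exchange can be rewritten with nonblocking calls so that neither process depends on system buffering. The buffer size, tag value, and the assumption of at least two processes are arbitrary choices for this example:

    #include <stdio.h>
    #include "mpi.h"

    #define COUNT 100000

    int main(int argc, char *argv[])
    {
       int i, rank, partner, tag = 99;       /* tag is an arbitrary choice   */
       float sbuff[COUNT], rbuff[COUNT];
       MPI_Request sreq, rreq;
       MPI_Status  sstat, rstat;

       MPI_Init(&argc, &argv);
       MPI_Comm_rank(MPI_COMM_WORLD, &rank);

       if (rank < 2) {                       /* only ranks 0 and 1 exchange;
                                                run with at least 2 processes */
          partner = 1 - rank;
          for (i = 0; i < COUNT; i++) sbuff[i] = (float) rank;

          /* post the receive and the send; neither call blocks */
          MPI_Irecv(rbuff, COUNT, MPI_FLOAT, partner, tag, MPI_COMM_WORLD, &rreq);
          MPI_Isend(sbuff, COUNT, MPI_FLOAT, partner, tag, MPI_COMM_WORLD, &sreq);

          /* ... other work could be done here ... */

          MPI_Wait(&rreq, &rstat);           /* rbuff is now safe to read  */
          MPI_Wait(&sreq, &sstat);           /* sbuff is now safe to reuse */
          printf("rank %d received data starting with %f\n", rank, rbuff[0]);
       }
       MPI_Finalize();
       return 0;
    }

Posting the receive before the send is a common habit: it guarantees the incoming data has somewhere to land regardless of how the implementation buffers.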
Nonblocking communications have their own tradeoffs.
Advantages:
1. Easier to write code that doesn't deadlock.
2. Can mask latency in high-latency environments by posting receives early (requires careful attention to detail).
Disadvantages:
1. Makes code quite a bit more complex.
2. Harder to debug and maintain the code.
MPI_ISEND(buff, count, datatype, dest, tag, comm, request)
- buff (IN), initial address of message buffer
- count (IN), number of entries to send (int)
- datatype (IN), datatype of each entry (handle)
- dest (IN), rank of destination (int)
- tag (IN), message tag (int)
- comm (IN), communicator (handle)
- request (OUT), request handle (handle)
MPI_IRECV(buff, count, datatype, source, tag, comm, request)
- buff (OUT), initial address of message buffer
- count (IN), maximum number of entries to receive (int)
- datatype (IN), datatype of each entry (handle)
- source (IN), rank of source (int)
- tag (IN), message tag (int)
- comm (IN), communicator (handle)
- request (OUT), request handle (handle)
The request handle is used to query the status of the communication or to wait for its completion. The user must not overwrite the send buffer until the send is complete, nor use elements of the receiving buffer before the receive is complete (intuitively obvious, but worth stating explicitly).
MPI_WAIT(request, status)
- request (INOUT), request handle (handle)
- status (OUT), status object (Status)

MPI_TEST(request, flag, status)
- request (INOUT), request handle (handle)
- flag (OUT), true if operation complete (logical)
- status (OUT), status object (Status)
The request handle should identify a previously posted send or receive. MPI_WAIT returns when the operation is complete, and the status is returned for a receive (for a send, it may contain a separate error code for the send operation). MPI_TEST returns immediately, with flag = true if the posted operation corresponding to the request handle is complete (and status output similar to MPI_WAIT).
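A minimal sketch (not from the slides) of using MPI_TEST to overlap computation with a pending receive; the message value, tag, and "work units" counter are arbitrary placeholders, and at least two processes are assumed:

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
       int rank, flag = 0, msg = 42, recv_msg = 0, work_units = 0;
       MPI_Request request;
       MPI_Status  status;

       MPI_Init(&argc, &argv);
       MPI_Comm_rank(MPI_COMM_WORLD, &rank);

       if (rank == 0) {
          MPI_Send(&msg, 1, MPI_INT, 1, 7, MPI_COMM_WORLD);
       } else if (rank == 1) {
          MPI_Irecv(&recv_msg, 1, MPI_INT, 0, 7, MPI_COMM_WORLD, &request);
          while (!flag) {
             work_units++;                   /* stand-in for useful computation */
             MPI_Test(&request, &flag, &status);
          }
          printf("received %d after %d units of work\n", recv_msg, work_units);
       }
       MPI_Finalize();
       return 0;
    }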
The four send modes in more detail:
- standard - used thus far; implementation-dependent choice of asynchronous buffered transfer or synchronous direct transfer. (Rationale: MPI makes a better low-level choice.)
- synchronous - synchronizes the sending and receiving processes; when a synchronous send completes, the user can assume that the receive has begun.
- ready - the matching receive must have already been posted, else the result is undefined. Can save time and overhead, but requires very precise knowledge of the algorithm and its execution.
- buffered - forces buffering; the user is also responsible for maintaining the buffer. The result is undefined if the buffer is insufficient (see MPI_BUFFER_ATTACH and MPI_BUFFER_DETACH, used in the sketch below).
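A hedged sketch of buffered mode (this helper routine is not from the slides; its name and the buffer sizing are illustrative assumptions rather than a prescription):

    #include <stdlib.h>
    #include "mpi.h"

    /* Sketch: send 'count' floats to 'dest' in buffered mode; the user
       supplies the buffer and reclaims it afterwards. */
    void buffered_send(float *data, int count, int dest, int tag)
    {
       int   size = count * (int) sizeof(float) + MPI_BSEND_OVERHEAD;
       void *buf  = malloc(size);

       MPI_Buffer_attach(buf, size);     /* hand the buffer over to MPI      */
       MPI_Bsend(data, count, MPI_FLOAT, dest, tag, MPI_COMM_WORLD);
       MPI_Buffer_detach(&buf, &size);   /* blocks until the buffered message
                                            has been transmitted             */
       free(buf);
    }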
Collective Communications
Barrier Synchronization
A very simple MPI routine provides the ability to block the calling process until all processes have called it:

MPI_BARRIER(comm)
- comm (IN), communicator (handle)

It returns only when all group members have entered the call.
Broadcast
MPI_BCAST(buffer, count, datatype, root, comm)
- buffer (INOUT), starting address of buffer (choice)
- count (IN), number of entries in buffer (int)
- datatype (IN), data type of buffer (handle)
- root (IN), rank of broadcasting process (int)
- comm (IN), communicator (handle)
Broadcast Details
- Broadcasts a message from the root process to all members of the group (including itself).
- root must have an identical value on all processes.
- comm must be the same intra-group domain on all processes.
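A minimal sketch (not from the slides) of a typical use: rank 0 owns an integer parameter and broadcasts it to every rank. The parameter name and value are illustrative assumptions.

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
       int rank, nsteps = 0;

       MPI_Init(&argc, &argv);
       MPI_Comm_rank(MPI_COMM_WORLD, &rank);

       if (rank == 0) nsteps = 1000;   /* e.g. read from an input file */

       /* every rank (including the root) calls MPI_Bcast with the same root */
       MPI_Bcast(&nsteps, 1, MPI_INT, 0, MPI_COMM_WORLD);

       printf("rank %d sees nsteps = %d\n", rank, nsteps);
       MPI_Finalize();
       return 0;
    }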
Gather
MPI_GATHER
MPI_GATHER(sendbuffer, sendcount, sendtype, recvbuffer, recvcount, recvtype, root, comm)
- sendbuffer (IN), starting address of send buffer (choice)
- sendcount (IN), number of entries in send buffer (int)
- sendtype (IN), data type of send buffer (handle)
- recvbuffer (OUT), starting address of receive buffer (choice)
- recvcount (IN), number of entries in any single receive (int)
- recvtype (IN), data type of receive buffer elements (handle)
- root (IN), rank of receiving process (int)
- comm (IN), communicator (handle)
Gather Details
- Each process sends the contents of its send buffer to the root.
- The root stores the received data in rank order (as if there were N posted receives of sends from each process).
Scatter
MPI_SCATTER
MPI_SCATTER(sendbuffer, sendcount, sendtype, recvbuffer, recvcount, recvtype, root, comm)
- sendbuffer (IN), starting address of send buffer (choice)
- sendcount (IN), number of entries sent to each process (int)
- sendtype (IN), data type of send buffer elements (handle)
- recvbuffer (OUT), starting address of receive buffer (choice)
- recvcount (IN), number of entries in any single receive (int)
- recvtype (IN), data type of receive buffer elements (handle)
- root (IN), rank of sending process (int)
- comm (IN), communicator (handle)
Scatter Details
- Basically the reverse operation to MPI_GATHER.
- A one-to-all operation in which each recipient gets a different chunk.
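A hedged sketch (mirroring the gather example that follows, with illustrative array sizes) of scattering a distinct 100-element block from the root to each rank:

    #include <stdlib.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
       int  i, myrank, nprocs, root = 0, iarray[100];
       int *sendbuf = NULL;

       MPI_Init(&argc, &argv);
       MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
       MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

       if (myrank == root) {              /* only the root needs the full array */
          sendbuf = (int *) malloc(nprocs * 100 * sizeof(int));
          for (i = 0; i < nprocs * 100; i++) sendbuf[i] = i;
       }
       /* each rank receives its own 100-element block in iarray */
       MPI_Scatter(sendbuf, 100, MPI_INT, iarray, 100, MPI_INT, root,
                   MPI_COMM_WORLD);

       if (myrank == root) free(sendbuf);
       MPI_Finalize();
       return 0;
    }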
Gather Example
    MPI_Comm comm;
    int myrank, nprocs, root, iarray[100];
    int *rbuff;
    ...
    MPI_Comm_rank(comm, &myrank);
    if (myrank == root) {
       MPI_Comm_size(comm, &nprocs);
       rbuff = (int *) malloc(nprocs * 100 * sizeof(int));
    }
    MPI_Gather(iarray, 100, MPI_INT, rbuff, 100, MPI_INT, root, comm);
    ...
Reduction
MPI_REDUCE(sendbuffer, recvbuffer, count, datatype, op, root, comm)
- sendbuffer (IN), starting address of send buffer (choice)
- recvbuffer (OUT), starting address of receive buffer (choice)
- count (IN), number of entries in buffer (int)
- datatype (IN), data type of buffer (handle)
- op (IN), reduce operation (handle)
- root (IN), rank of root (receiving) process (int)
- comm (IN), communicator (handle)
Reduce Details
Combines the elements provided in the sendbuffer of each process, using op, and returns the combined value in the recvbuffer of the root process.
The predefined reduction operations (the op argument):

    MPI_MAX      maximum
    MPI_MIN      minimum
    MPI_SUM      sum
    MPI_PROD     product
    MPI_LAND     logical and
    MPI_BAND     bit-wise and
    MPI_LOR      logical or
    MPI_BOR      bit-wise or
    MPI_LXOR     logical xor
    MPI_BXOR     bit-wise xor
    MPI_MINLOC   min value and location
    MPI_MAXLOC   max value and location
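For example, a minimal sketch (not from the slides) in which each rank contributes a local partial sum and MPI_SUM leaves the global total on the root only; the local value here is a stand-in for a real partial result:

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
       int rank, root = 0;
       double local_sum, global_sum = 0.0;

       MPI_Init(&argc, &argv);
       MPI_Comm_rank(MPI_COMM_WORLD, &rank);

       local_sum = (double)(rank + 1);   /* stand-in for a real partial sum */

       MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, root,
                  MPI_COMM_WORLD);

       if (rank == root)
          printf("global sum = %f\n", global_sum);

       MPI_Finalize();
       return 0;
    }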
Process Startup
- The single most confusing aspect of MPI for most new users.
- Implementation dependent! With many implementation-specific options, flags, etc.
- Consult the documentation for the MPI implementation that you are using.
Inquiry Routines
MPI_GET_VERSION(version, subversion)
- version (OUT), version number (int)
- subversion (OUT), subversion number (int)

Not exactly critical for programming, but a nice function for determining what version of MPI you are using (especially when the documentation for your machine is poor).
Where am I running?
MPI_GET_PROCESSOR_NAME(name, resultlen)
- name (OUT), a unique specifier for the actual node (string)
- resultlen (OUT), length (in printable characters) of the result in name (int)

Returns the name of the processor on which it was called at the moment of the call. name should have storage that is at least MPI_MAX_PROCESSOR_NAME characters long.
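A short sketch (not from the slides) of the usual calling pattern:

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
       int rank, resultlen;
       char name[MPI_MAX_PROCESSOR_NAME];

       MPI_Init(&argc, &argv);
       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
       MPI_Get_processor_name(name, &resultlen);    /* which node am I on? */
       printf("rank %d is running on %s\n", rank, name);
       MPI_Finalize();
       return 0;
    }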
MPI_WTIME()
Returns a double precision value representing the elapsed wall-clock time from some point in the past (the origin is guaranteed not to change during process execution). A portable timing function (try finding another!) - it can be high resolution, provided there is some hardware support.
Testing the resolution of MPI_WTIME:

MPI_WTICK()
Returns a double precision value which is the resolution of MPI_WTIME in seconds. Hardware dependent, of course - if a high-resolution timer is available, it should be accessible through MPI_WTIME.
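A minimal sketch (not from the slides) of the usual timing idiom; the barrier just gives all ranks a roughly common starting point, and the "work" loop is a placeholder:

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
       int rank, i;
       double t0, t1, dummy = 0.0;

       MPI_Init(&argc, &argv);
       MPI_Comm_rank(MPI_COMM_WORLD, &rank);

       MPI_Barrier(MPI_COMM_WORLD);      /* rough common starting point */
       t0 = MPI_Wtime();
       for (i = 0; i < 1000000; i++) dummy += i * 1.0e-6;   /* placeholder work */
       t1 = MPI_Wtime();

       if (rank == 0)
          printf("elapsed %f s (timer resolution %g s), dummy=%f\n",
                 t1 - t0, MPI_Wtick(), dummy);

       MPI_Finalize();
       return 0;
    }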
Profiling
The MPI profiling interface is designed for authors of profiling tools, such that they will not need access to a particular implementation's source code (which a vendor may not wish to release). Many profiling tools exist:
1. Vampir (Intel, formerly Pallas), now called Intel Trace Analyzer and Visualizer
2. HPMCount (IBM AIX)
3. jumpshot (MPICH)
4. SpeedShop, cvperf (SGI)
Advanced MPI topics not covered thus far:
- User-defined data types
- Communicators and Groups
- Process Topologies
- MPI-2 Features:
  - MPI-I/O
  - Dynamic process management (MPI_Spawn)
  - One-sided communications (get/put)