Lecture 12-MPI Collective Communication

The document covers the concepts of MPI (Message Passing Interface) blocking and nonblocking point-to-point communication, highlighting the differences between MPI_Send/MPI_Recv and their nonblocking counterparts MPI_Isend/MPI_Irecv. It discusses the importance of avoiding deadlocks, the use of request handles for nonblocking calls, and the rules for successful communication. Additionally, it introduces collective communication operations such as broadcast, scatter, gather, and reduction, along with their respective functions and usage in parallel programming.


Applied High-Performance Computing and Parallel Programming

Presenter: Liangqiong Qu

Assistant Professor

The University of Hong Kong


Review of Lecture 11: MPI Blocking Point-to-Point Communication
▪ MPI_Send
• MPI_Send blocks until the whole message has arrived at the receiver or the whole message has
been copied into a system buffer.
• After MPI_Send returns, the send buffer is safe for reuse.

▪ MPI_Recv
• MPI_Recv blocks until the message has been received into the buffer specified by the buffer
argument.
• If no message is available, the process remains blocked until a message becomes available.

[Figure: data path from the send buffer through the system buffer and network to the receive buffer]


Review of Lecture 11: MPI Blocking Point-to-Point Communication

▪ Both ranks wait for the matching receive to be called. A deadlock occurs when two or more
processes wait on the same set of resources and none can proceed.

▪ Possible solutions to avoid deadlock:
• Different ordering of send and receive, but this is not symmetric and does not scale
• Using nonblocking point-to-point communication
Review of Lecture 11: Nonblocking Point-to-Point Communication
▪ A nonblocking MPI call returns immediately to the next statement without waiting
for the operation to complete, whereas a blocking send returns only after the data has been
copied out of the sender's memory.
▪ Do not reuse the buffer before the nonblocking call has completed. Return of the call does not
imply communication completion; check for completion via MPI_Wait*/MPI_Test*.

▪ MPI_Request is a handle to a hidden request object that holds detailed information about
the transaction. The request handle can be used for subsequent Wait* and Test* calls.
▪ MPI_Irecv has no status argument.
Review of Lecture 11: Nonblocking Point-to-Point Communication
▪ All nonblocking calls in MPI return a request handle in lieu of a status variable.
▪ MPI provides two functions to complete a nonblocking communication call.
• MPI_Wait: Waiting forces the process into blocking mode. The sending
process simply waits for the request to finish. If your process waits right
after MPI_Isend, the send behaves the same as calling MPI_Send.

• MPI_Test: Testing checks whether the request can be completed. If it can, the request is
automatically completed and the data transferred.
Outline

▪ Point-to-Point and Collective Communication with MPI

▪ MPI Nonblocking Point-to-Point Communication

▪ Collective Communication
• Synchronization (barrier)
• Data movement (broadcast, scatter, gather, all to all)
• Global computation

▪ Examples
Example of Nonblocking Point-to-Point Communication

• Objective:
1) Do nonblocking point-to-point communication between rank 0 and rank 1.
2) Transfer the vector buffer in rank 0, with vector size BUFFER_COUNT, to rank 1.
3) The buffer in rank 0 is initialized as [0, 2, 4, …, 18].
Example of Nonblocking Point-to-Point Communication
Example of Nonblocking Point-to-Point Communication (Core Code)

• Rank 0 initiates the Isend. The Isend returns immediately and does not wait for
completion of the communication.
• The sending rank 0 starts working immediately upon the return of MPI_Isend.

• Work for 1 ms, then check whether the receiver is ready, using MPI_Test. If the returned
flag is nonzero, the request is completed.

• If the receiver is still not ready after rank 0 has worked for 6 ms, the program switches to
blocking mode using MPI_Wait.
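The core code itself appears only as a screenshot in the slides. Below is a minimal sketch of the pattern described above, assuming BUFFER_COUNT = 10 (to match the initialization [0, 2, 4, …, 18]) and using usleep(1000) as a stand-in for 1 ms of useful work; it is not the slide's exact code.

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>   /* usleep */

#define BUFFER_COUNT 10   /* assumed size; matches the initialization 0, 2, ..., 18 */

int main(int argc, char **argv) {
    int rank, buffer[BUFFER_COUNT];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        for (int i = 0; i < BUFFER_COUNT; i++) buffer[i] = 2 * i;

        MPI_Request request;
        MPI_Isend(buffer, BUFFER_COUNT, MPI_INT, 1, 0, MPI_COMM_WORLD, &request);

        /* Overlap communication with computation: work in 1 ms slices,
           checking after each slice whether the send has completed. */
        int flag = 0;
        for (int slice = 0; slice < 6 && !flag; slice++) {
            usleep(1000);                                 /* stand-in for 1 ms of real work */
            MPI_Test(&request, &flag, MPI_STATUS_IGNORE);
        }
        /* After 6 ms without completion, fall back to blocking mode. */
        if (!flag) MPI_Wait(&request, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        MPI_Recv(buffer, BUFFER_COUNT, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank 1 received first element %d, last element %d\n",
               buffer[0], buffer[BUFFER_COUNT - 1]);
    }

    MPI_Finalize();
    return 0;
}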
Example of Nonblocking Point-to-Point Communication

Submit the job to Slurm

• Batch script
• Output results of the submitted jobs
Summarization of Nonblocking Point-to-Point Communication
• Blocking vs. nonblocking: MPI_Send()/MPI_Recv() blocks until the data is received or
copied out to the system buffer; a nonblocking MPI call returns immediately to the
next statement without waiting for the communication to complete.
• The standard nonblocking send and receive are MPI_Isend() and MPI_Irecv()
• Return of the call does not imply completion of the communication
• Use MPI_Wait*() / MPI_Test*() to check for completion using request handles
• Potential benefits
• Enabling overlap between communication & computation
• Avoiding certain deadlocks
• Avoiding synchronization and idle times

• Caveat: Compiler does not know about asynchronous modification of data


Blocking and Nonblocking Point-to-Point Communication
▪ For a communication to succeed:
• The sender must specify a valid destination.
• The receiver must specify a valid source rank (or MPI_ANY_SOURCE).
• The communicator used by the sender and receiver must be the same (e.g.,
MPI_COMM_WORLD).
• The tags specified by the sender and receiver must match (or MPI_ANY_TAG
for receiver).
• The data types of the messages being sent and received must match.
• The receiver's buffer must be large enough to hold the received message.
Collective Communication
Review of Lecture 10---Parallel Execution in MPI
• Processes run throughout program execution

• MPI start mechanism (program startup):
• Launches tasks/processes
• Establishes communication context (“communicator”)

• MPI point-to-point communication:
• between pairs of tasks/processes
• MPI collective communication:
• between all processes or a subgroup

• Clean shutdown by MPI (program shutdown)


Collective Communication in MPI
• Collective communication allows you to exchange data among a group of
processes

• It must involve all processes in the scope of a communicator

[Figure: six ranks (0–5) within a communicator; rank 2 acts as the source of an MPI_Bcast to all other ranks]
Collective Communication in MPI
• Collective communication allows you to exchange data among a group
of processes

• It must involve all processes in the scope of a communicator

• It consists of:
• Blocking variants: The call blocks until the message has arrived at
the receiver or been copied into a system buffer. The buffer can be reused
after return.

• Nonblocking variants: A nonblocking call returns immediately to the
next statement without waiting for the operation to complete. The buffer can only
be reused after completion (MPI_Wait*, MPI_Test*).
Collective Communication in MPI
▪ Rules for all collectives
• Data type matching
• Do not use tags
• Count must be exact, i.e., there is only one message length, and the buffer
must be large enough

▪ Different types of communication:
• Synchronization (barrier)
• Data movement (broadcast, scatter, gather)
• Global computation (reduction, scan)
• Combinations of data movement and computation (reduction +
broadcast)
Review of Barrier in OpenMP

#pragma omp barrier

• Each thread blocks upon reaching the barrier
until all threads have reached the barrier
• All accessible shared variables are flushed to
the memory hierarchy
• barrier may not appear within a work-sharing
construct → potential for deadlock or
unanticipated results
Synchronization: Barrier

▪ Explicit synchronization of all ranks from the
specified communicator
• int MPI_Barrier(MPI_Comm comm)

▪ Any process calling it will be blocked until all the
processes within the group have called it. Once all
the processes in the communicator group have
reached the barrier, the function will return, and
all processes in the group can continue.
From: https://sites.cs.ucsb.edu/~tyang/class/240a17/slides
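A minimal usage sketch (not from the slides): each rank finishes some local setup, and no rank proceeds past MPI_Barrier until all ranks in the communicator have reached it.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    printf("Rank %d: finished local setup\n", rank);

    /* No rank passes this point until every rank in MPI_COMM_WORLD has reached it. */
    MPI_Barrier(MPI_COMM_WORLD);

    if (rank == 0) printf("All ranks reached the barrier; starting next phase\n");

    MPI_Finalize();
    return 0;
}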
Collective Data Movements
▪ MPI provides three categories of collective data-movement routines in which one
process either sends to or receives from all processes: broadcast, gather, and scatter.

[Figure: broadcast, scatter, and gather communication patterns]
Collective Data Movements: MPI_Bcast
▪ Broadcasting happens when one process wants to send the same
information to every other process. It sends buffer contents from one
rank (called the “root”) to all ranks in the communicator.

▪ No restrictions on which rank is root

[Figure: broadcast from the root rank to all ranks]

MPI_Bcast(void* buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm)

• buffer: [send/recv] the address of the buffer
• count: the number of elements sent to each process
• datatype: MPI datatype
• root: an integer indicating the rank of the broadcast root process
• comm: the communicator
Collective Data Movements: MPI_Bcast
▪ Broadcasting happens when one process wants to send the same
information to every other process. Send buffer contents from one
rank (“root”) to all ranks

▪ No restrictions on which rank is root (usually rank 0)


broadcast

MPI_Bcast(void* buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm)
Collective Data Movements: MPI_Bcast
▪ Broadcasting happens when one process wants to send the same
information to every other process. Send buffer contents from one
rank (“root”) to all ranks

▪ No restrictions on which rank is root (usually rank 0)

MPI_Bcast(void* buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm)

Common mistake: matching MPI_Bcast on the root with MPI_Recv on the other ranks.
MPI_Bcast is a collective call and must be made by every rank in the communicator;
it is not matched by point-to-point receives.

if (I am root) then
  MPI_Bcast (buff, …, 0, MPI_COMM_WORLD)
else
  MPI_Recv(buff, …, 0, MPI_COMM_WORLD)    ! WRONG: every rank must call MPI_Bcast
endif
Collective Data Movements: MPI_Bcast
• Defining variables: source indicates the
root rank that initiates the broadcast. For an
MPI collective function, no receiver
identity is required.

• Defining variables: We will send four
integers in the broadcast and need to
define a buffer for sending and a
buffer for receiving four integers (!)

• The MPI_Bcast() function broadcasts a
message from the process with rank root
to all processes of the group, itself
included (!)

• All processes should print out their
buffer, including those ranks that did not
initialize the buffer. Originally only
rank 0's buffer is initialized.
Collective Data Movements: MPI_Bcast
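The corresponding code is shown as a screenshot on the slide; the following is a minimal sketch of the described pattern, assuming rank 0 as the root ("source") and a four-integer buffer. It is not the slide's exact code.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, buffer[4] = {0, 0, 0, 0};
    const int source = 0;   /* root rank that initiates the broadcast */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Only the root initializes the buffer. */
    if (rank == source) {
        for (int i = 0; i < 4; i++) buffer[i] = i + 1;
    }

    /* Every rank, including the root, calls MPI_Bcast with the same arguments. */
    MPI_Bcast(buffer, 4, MPI_INT, source, MPI_COMM_WORLD);

    /* After the call, all ranks hold the root's data. */
    printf("Rank %d: buffer = [%d, %d, %d, %d]\n",
           rank, buffer[0], buffer[1], buffer[2], buffer[3]);

    MPI_Finalize();
    return 0;
}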
Collective Data Movements: MPI_Scatter
▪ Scatter: Distributes distinct messages from a single root rank to each
ranks in the communicator. Given communicator with n ranks,
distribute the data into n equal segments, where the i-th segment is
sent to the i-th process in the communicator

scatter

Example: the scattering operation distributes evenly a set of data over all the processes of a
communicator. (From: https://www.codingame.com/playgrounds/349/introduction-to-mpi)
Collective Data Movements: MPI_Scatter
▪ Scatter: Distributes distinct messages from a single root rank to each rank in the
communicator.

MPI_Scatter(void *sendbuf, int sendcount, MPI_Datatype sendtype,
            void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm)

• sendbuf is the address of the send buffer that ONE process will dispatch to all the
other processes
• recvbuf is the address of the receive buffer
• sendcount is the number of elements the root sends to each process
• root is the rank of the process that will be sending its data

▪ In general, sendcount = recvcount
• This is the length of the segment
• It is not the length of the message, but the length of each segment
Collective Data Movements: MPI_Scatter

MPI_Scatter(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, root, MPI_COMM_WORLD)

[Figure: sendbuf on the root holds one integer per rank (ranks 0–3); after the call, each rank's recvbuf holds its own segment]

Note the count here is not the length of the message, but the length of each segment
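A minimal sketch of this call pattern (not the slide's exact code), assuming four ranks and one integer per segment:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, recvbuf;
    int sendbuf[4] = {10, 20, 30, 40};   /* only meaningful on the root */
    const int root = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* run with 4 ranks for this example */

    /* Each rank receives one int: rank i gets sendbuf[i] from the root. */
    MPI_Scatter(sendbuf, 1, MPI_INT, &recvbuf, 1, MPI_INT, root, MPI_COMM_WORLD);

    printf("Rank %d of %d received %d\n", rank, size, recvbuf);

    MPI_Finalize();
    return 0;
}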
Collective Data Movements: MPI_Gather
▪ Receive a message from each rank and place i-th rank’s message at i-th position in
receive buffer

int MPI_Gather(void *sendbuf, int sendcount, MPI_Datatype sendtype,
               void *recvbuf, int recvcount, MPI_Datatype recvtype,
               int root, MPI_Comm comm)

• sendcount is the number of elements in the send buffer
• recvcount is the number of elements for any single receive
• root is the rank of the receiving process

▪ In general, sendcount = recvcount

▪ recvbuf is ignored on non-root ranks, because non-root ranks do not receive anything

[Figure: gather of per-rank segments into the root's receive buffer]
Collective Data Movements: MPI_Gather

MPI_Gather(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, root, MPI_COMM_WORLD)

[Figure: each rank's sendbuf holds one integer; after the call, the root's recvbuf holds the i-th rank's value at position i]
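A matching sketch for the gather direction (again not the slide's exact code), with each rank contributing one integer; the receive buffer here is sized for four ranks.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    int recvbuf[4];          /* only used on the root; sized for 4 ranks in this example */
    const int root = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int sendval = 100 * (rank + 1);   /* each rank contributes one value */

    /* Rank i's value ends up at position i of recvbuf on the root. */
    MPI_Gather(&sendval, 1, MPI_INT, recvbuf, 1, MPI_INT, root, MPI_COMM_WORLD);

    if (rank == root) {
        for (int i = 0; i < size; i++)
            printf("recvbuf[%d] = %d\n", i, recvbuf[i]);
    }

    MPI_Finalize();
    return 0;
}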
Example Usage of Collective Data Movements
▪ Matrix-vector multiplication task: Develop a parallel MPI program using collective
functions to perform matrix-vector multiplication on a 100x100 matrix A and a vector b
of length 100. The initial data for matrix A and vector b reside on processor P, and the
program should utilize four processors, including processor P, to execute the computation
in parallel.

Fig. 1 Matrix-vector multiplication.


Thank you for taking this course!
Please help complete the SSCC survey on Moodle.
Please complete the online survey by 16 March 2025 (23:55) via Moodle or use the following link:

https://hku.au1.qualtrics.com/jfe/form/SV_6sQBDuNUKN8Gbbg
A file containing the responses in Excel/PDF format will be sent to the Class Representative. These responses
will be analyzed and discussed during the Staff-Student Consultative Committee meeting on 26 March 2025
(Wednesday).
Example Usage of Collective Data Movements
▪ Matrix-vector multiplication task: Develop a parallel MPI program using collective
functions to perform matrix-vector multiplication on a 100x100 matrix A and a vector b
of length 100. The initial data for matrix A and vector b reside on processor P, and the
program should utilize four processors, including processor P, to execute the computation
in parallel.
Concept:
• Matrix is distributed by rows (i.e., row-major order)
• Product vector is needed in its entirety by every process
• MPI_Gather will be used to collect the product from the processes

• A: a matrix partitioned across rows and distributed to processes as Apart
• b: a vector present on all processes
• c: a partitioned vector updated by each process independently

Fig. 1 Matrix-vector multiplication.
Example Usage of Collective Data Movements: Code
▪ Matrix-vector multiplication task: Develop a parallel MPI program using collective
functions to perform matrix-vector multiplication on a 100x100 matrix A and a vector b
of length 100.
Example Usage of Collective Data Movements: Code-Continued
▪ Matrix-vector multiplication task: Develop a parallel MPI program using collective
functions to perform matrix-vector multiplication on a 100x100 matrix A and a vector b
of length 100.
Example Usage of Collective Data Movements: Code-Continued
▪ Matrix-vector multiplication task: Develop a parallel MPI program using collective
functions to perform matrix-vector multiplication on a 100x100 matrix A and a vector b
of length 100.
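The program itself is shown as screenshots across these slides. The following condensed sketch follows the concept outlined above (scatter the rows of A, broadcast b, compute local products, gather the result at rank 0); names such as Apart, cpart, and local_n are illustrative, and the row count is assumed to divide evenly among the processes.

#include <mpi.h>
#include <stdio.h>

#define N 100   /* matrix dimension from the task description */

int main(int argc, char **argv) {
    int rank, size;
    static double A[N][N], b[N], c[N];      /* full data lives on rank 0 (processor P) */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* assumed to divide N evenly, e.g. 4 */

    int local_n = N / size;                 /* number of rows per process */
    double Apart[local_n][N], cpart[local_n];

    if (rank == 0) {                        /* rank 0 owns the initial data */
        for (int i = 0; i < N; i++) {
            b[i] = 1.0;
            for (int j = 0; j < N; j++) A[i][j] = i + j;
        }
    }

    /* Distribute rows of A and replicate b on every process. */
    MPI_Scatter(A, local_n * N, MPI_DOUBLE, Apart, local_n * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(b, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Each process computes its block of the product c = A * b. */
    for (int i = 0; i < local_n; i++) {
        cpart[i] = 0.0;
        for (int j = 0; j < N; j++) cpart[i] += Apart[i][j] * b[j];
    }

    /* Collect the partial results back on rank 0. */
    MPI_Gather(cpart, local_n, MPI_DOUBLE, c, local_n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0) printf("c[0] = %f, c[%d] = %f\n", c[0], N - 1, c[N - 1]);

    MPI_Finalize();
    return 0;
}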
Example Usage of Collective Data Movements
▪ Matrix-vector multiplication task: Develop a parallel MPI program using collective
functions to perform matrix-vector multiplication on a 100x100 matrix A and a vector b
of length 100.
Global Computation
Global Computation: MPI_Reduce
▪ MPI_Reduce: Collective computation operation. Applies a reduction operation on all tasks
in communicator and places the result in root rank.

MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype,
           MPI_Op op, int root, MPI_Comm comm);

▪ MPI_Op op here indicates the reduction operation (MPI predefined or your own)
▪ count indicates the number of elements in the send buffer (integer)
▪ The result in recvbuf is only available on the root process
▪ The operation is performed on all count elements of the array
▪ If all ranks require the result, use MPI_Allreduce(), which does not take a root rank.
MPI_Allreduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype,
              MPI_Op op, MPI_Comm comm);
Global Computation: MPI_Reduce
▪ MPI_Reduce: Collective computation operation. Applies a reduction operation on all tasks
in communicator and places the result in root rank.
MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype,
           MPI_Op op, int root, MPI_Comm comm);

▪ MPI_Op op here indicates the reduction operation (MPI predefined or your own)
▪ count indicates the number of elements in the send buffer (integer)

[Figure: ranks 0–3 each contribute sendbuf; the reduction result is placed in recvbuf on rank 1]

MPI_Reduce(sendbuf, recvbuf, 1, MPI_INT, MPI_SUM, 1, MPI_COMM_WORLD);
Predefined operators in MPI

MPI OP          Operation
MPI_MAX         Maximum
MPI_MIN         Minimum
MPI_SUM         Sum
MPI_PROD        Product
MPI_LAND        Logical AND
MPI_BAND        Bitwise AND
MPI_LOR         Logical OR
MPI_BOR         Bitwise OR
MPI_LXOR        Logical exclusive OR
MPI_BXOR        Bitwise exclusive OR
MPI_MAXLOC      Max value and location
MPI_MINLOC      Min value and location

▪ If the 12 predefined ops are not enough, use MPI_Op_create/MPI_Op_free to
create your own ones.
Review of Lecture 11---Example 2. Parallel Integration in MPI
Task: calculate the integral ∫_a^b f(x) dx in parallel, using 4 processors with the (existing)
integrate(x, y). Let a = 0, b = 2, and f(x) = x².

• Prerequisite knowledge of Trapezoidal Rule in C language for integration


Review of Lecture 11 ---Example 2. Parallel Integration in MPI
Task: calculate ∫_a^b f(x) dx in parallel,
using 4 processors; let a = 0, b = 2
• Split up interval [a, b] into equal
disjoint chunks
• Compute partial results in parallel
• Collect global sum at rank 0
Parallel Integration in MPI
Task: calculate ∫_a^b f(x) dx in parallel,
using 4 processors; let a = 0, b = 2
• Now let’s simplify the
implementation using collective
communication
• Split up interval [a, b] into equal
disjoint chunks
• Compute partial results in parallel
• Collect global sum at rank 0 using
MPI_Reduce
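The slide's code is a screenshot; here is a minimal sketch of the collective version, with a hypothetical integrate(lo, hi) helper applying the trapezoidal rule to f(x) = x² on [lo, hi].

#include <mpi.h>
#include <stdio.h>

/* Hypothetical trapezoidal-rule helper for f(x) = x^2 on [lo, hi]. */
static double integrate(double lo, double hi) {
    const int n = 100000;
    double h = (hi - lo) / n, sum = 0.5 * (lo * lo + hi * hi);
    for (int i = 1; i < n; i++) {
        double x = lo + i * h;
        sum += x * x;
    }
    return sum * h;
}

int main(int argc, char **argv) {
    int rank, size;
    const double a = 0.0, b = 2.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* e.g. 4 processes */

    /* Split [a, b] into equal disjoint chunks, one per rank. */
    double chunk = (b - a) / size;
    double lo = a + rank * chunk;
    double partial = integrate(lo, lo + chunk);

    /* Collect the global sum at rank 0 with a single collective call. */
    double total = 0.0;
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) printf("Integral of x^2 on [0,2] = %f (exact value 8/3)\n", total);

    MPI_Finalize();
    return 0;
}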
Global Computation: MPI_Scan
▪ MPI_Scan: Performs a prefix reduction of the data stored in sendbuf at each process:
process i receives in its recvbuf the reduction of the values from processes 0, 1, …, i (inclusive scan).

MPI_Scan(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op,
MPI_Comm comm )
Global Computation: MPI_Scan
▪ MPI_Scan: Performs a prefix reduction of the data stored in sendbuf at each process:
process i receives in its recvbuf the reduction of the values from processes 0, 1, …, i.

MPI_Scan(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op,
MPI_Comm comm)

[Figure: element-wise prefix sums of four-element buffers across ranks]

MPI_Scan(sendbuf, recvbuf, 4, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
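A minimal sketch (not from the slides): each rank contributes its own rank number, and MPI_Scan with MPI_SUM produces inclusive prefix sums.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, prefix;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Inclusive prefix sum: rank i receives 0 + 1 + ... + i. */
    MPI_Scan(&rank, &prefix, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    printf("Rank %d: prefix sum = %d\n", rank, prefix);

    MPI_Finalize();
    return 0;
}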
Nonblocking Collective Communication
▪ A non-blocking call has the same syntax as its blocking counterpart, with two differences:
• The letter I (think of "initiate" or "immediate") appears in the name of the call,
immediately following the first underscore: e.g., MPI_Ibcast vs. MPI_Bcast
• The final argument is a handle to an opaque (or hidden) request object that holds
detailed information about the transaction. The request handle can be used for
subsequent Wait and Test calls.

MPI_Scatter(void *sendbuf, int sendcount, MPI_Datatype sendtype,
            void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm)

MPI_Iscatter(void *sendbuf, int sendcount, MPI_Datatype sendtype,
             void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm,
             MPI_Request *request)
Nonblocking Collective Communication

What happened with this code?
Nonblocking Collective Communication
• A bad example of what we should not do

• We have induced some chaotic behavior by the second invocation of strcpy
• Do not use the buffer before the call has completed; check completion with MPI_Wait/MPI_Test

• Every nonblocking call in MPI should be completed with a matching call to MPI_Wait or MPI_Test
Nonblocking Collective Communication
• Every nonblocking call in
MPI should be completed
with a matching call to
MPI_Wait or MPI_Test
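The faulty code is only shown as a screenshot; the sketch below illustrates the correct pattern under similar assumptions (a string buffer broadcast with MPI_Ibcast): the buffer is not touched again until MPI_Wait reports completion.

#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
    int rank;
    char buffer[64] = "";
    MPI_Request request;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) strcpy(buffer, "first message");

    /* Start the nonblocking broadcast; the call returns immediately. */
    MPI_Ibcast(buffer, 64, MPI_CHAR, 0, MPI_COMM_WORLD, &request);

    /* Safe: work that does not touch 'buffer' may overlap with the broadcast here. */

    /* Complete the collective before reusing the buffer. */
    MPI_Wait(&request, MPI_STATUS_IGNORE);
    printf("Rank %d received: %s\n", rank, buffer);

    /* Only now is it safe to overwrite the buffer (the bad example did this too early). */
    if (rank == 0) strcpy(buffer, "second message");

    MPI_Finalize();
    return 0;
}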
Summarization of MPI Collective Communication

▪ MPI collectives
• All ranks in the communicator must call the function
▪ Types:
• Synchronization (barrier)
• Data movement (broadcast, scatter, gather, all to all)
• Global computation (reduction, scan)
• Combinations of data movement and computation (reduction + broadcast)

[Figure: scatter, broadcast, and gather patterns]
Thank you very much for choosing this course!

Give us your feedback!

https://forms.gle/zDdrPGCkN7ef3UG5A
