MAP55611-1
Instructions to candidates:
You may not start this examination until you are instructed to do so by the Invigilator.
Page 1 of 6
MAP55611-1
1. (a) A computational task comprises three parts. The first part initialises data and takes
10 seconds to complete on a single core. The second part evaluates a function
on the data and takes 200 seconds on one core. The final part performs
post-processing and takes 10 seconds to complete on a single core. Only the algorithm
for the second part of the problem can be parallelised.
i. State Amdahl’s law and explain how it applies to estimating the value of
executing this composite task on more than one compute core. [3]
ii. Using Amdahl’s law, estimate how much faster this task can be executed on
a system with 8 cores and another system with a very large number of cores.
[4]
iii. Estimate how many cores would be needed to achieve at least 95% of the
maximum possible parallel performance for this task. [3]
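As a hedged sketch of the arithmetic these parts call for (assuming the 10 s initialisation and 10 s post-processing form the serial fraction of the 220 s single-core total):

```latex
% Amdahl's law: speedup on N cores when a fraction P of the
% single-core runtime is parallelisable.
S(N) = \frac{1}{(1-P) + P/N}, \qquad P = \frac{200}{220}
% 8 cores:  T_8 = 20 + 200/8 = 45\,\text{s}, so S(8) = 220/45 \approx 4.9.
% N \to \infty:  T_\infty \to 20\,\text{s}, so S_{\max} = 220/20 = 11.
% 95\% of S_{\max} is 10.45, requiring T \le 220/10.45 = 4400/209\,\text{s},
% i.e. 200/N \le 220/209\,\text{s}, so N \ge 200 \cdot 209/220 = 190 cores.
```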
(b) Explain which one of these two operations on arrays of length n can be parallelised
efficiently, and write an OpenMP function to perform the case where a simple
parallel implementation is possible:
g_k = √f_k + g_{k−1}   or   g_k = √f_k + f_{k−1},   for k = 1, …, n − 1
[8]
(c) Execution times for six runs of a parallel code, performed with two different numbers
of compute cores, n, and three problem sizes, p, are given in Table 1 below.
Comment on the strong and weak scaling behaviour of this software. [7]
Table 1: Run times for different problem sizes, p and numbers of compute cores, n.
© TRINITY COLLEGE DUBLIN, THE UNIVERSITY OF DUBLIN 2022
2. (a) Describe the output of the following OpenMP code fragment: [7]
(b) A function to sum the elements in an array is written to use the OpenMP library.
Describe what is likely to fail with this code, and write a corrected version.
[6]
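The summation code itself is not reproduced here, but the usual failure in this exercise is a data race on the shared accumulator. A hedged sketch of a corrected version using OpenMP's reduction clause (assuming that was the bug; the function name is illustrative):

```c
/* Summing into a shared variable from many threads without
 * protection is a data race: updates can be lost.  The reduction
 * clause gives each thread a private partial sum and combines the
 * partial sums correctly when the loop ends. */
double array_sum(const double *a, int n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```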
(c) Consider the serial loop

for (i = 0; i < 50; i++)
    do_task(i);

For each example given below, where a prediction can be made, state which thread
will execute the function for each value of the argument i = 0, 1, …, 49 when each
OpenMP directive is used to parallelise this loop. Explain your answer.
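The directives attached to each example are not reproduced here. As a hedged illustration: schedule(static) assigns iterations to threads in predictable contiguous blocks, so the executing thread can be stated for each i, while schedule(dynamic) hands chunks out first-come-first-served, so no prediction is possible. A small sketch that records which thread ran each iteration (do_task, run_static and the serial fallback are illustrative):

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback so the sketch also builds without OpenMP. */
static int omp_get_thread_num(void) { return 0; }
#endif

#define N 50
int ran_by[N]; /* which thread executed do_task(i) */

static void do_task(int i) { ran_by[i] = omp_get_thread_num(); }

void run_static(void)
{
    /* schedule(static): the 50 iterations are split into contiguous
     * blocks, one per thread, so the thread-to-iteration mapping is
     * deterministic.  With schedule(dynamic) the mapping depends on
     * runtime timing and cannot be predicted. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < N; i++)
        do_task(i);
}
```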
3. (a) Explain why the ordering of site visits matters when considering if this algorithm
can be parallelised. [5]
(b) Write a function that uses the OpenMP library to overwrite the array phi with the
next iteration of Gauss-Seidel. The function should return the modulus of the
difference in phi between the two iterations. [20]
4. (a) For a matched pair of MPI_Send() and MPI_Recv() on two processes in the same
MPI communicator, what information are the send and receive matched on? [3]
(b) Describe how using MPI_Send and MPI_Recv can lead to deadlock between
processes. Write a C snippet showing deadlock. Assume exactly two MPI processes.
[7]
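A sketch of the classic ordering deadlock this part asks about (the function name and buffers are illustrative; note that for small messages MPI's eager buffering may hide the hang, which is exactly why such code is dangerous):

```c
#include <mpi.h>

/* Both ranks call a blocking MPI_Send before either posts a
 * receive.  If the messages exceed MPI's internal buffering,
 * neither send can complete, no receive is ever posted, and both
 * ranks block forever: deadlock.  Exactly two processes assumed. */
void exchange_deadlock(double *out, double *in, int n, int rank)
{
    int other = 1 - rank;
    MPI_Send(out, n, MPI_DOUBLE, other, 0, MPI_COMM_WORLD);
    MPI_Recv(in, n, MPI_DOUBLE, other, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
}
```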
(c) Using the same assumptions as above, write a C snippet showing a version of this
code that will not deadlock under any circumstances. [5]
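One safe version (a sketch; non-blocking MPI_Isend/MPI_Irecv followed by MPI_Waitall would also work) pairs the two operations in a single MPI_Sendrecv call, which the library can progress without either rank waiting on the other's ordering:

```c
#include <mpi.h>

/* MPI_Sendrecv posts the send and the receive together, so the MPI
 * library can progress both at once and no ordering deadlock is
 * possible, regardless of message size.  Exactly two processes
 * assumed. */
void exchange_safe(double *out, double *in, int n, int rank)
{
    int other = 1 - rank;
    MPI_Sendrecv(out, n, MPI_DOUBLE, other, 0,
                 in,  n, MPI_DOUBLE, other, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
```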
(d) Explain the operation of MPI_Gather. Write a function my_gather() implementing
the effect of MPI_Gather using at least some of the basic MPI functions
MPI_Comm_rank, MPI_Comm_size, MPI_Send, MPI_Recv and MPI_Barrier. The
function my_gather() should take the same arguments as MPI_Gather. [8]
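A sketch of one way to build this from point-to-point calls, assuming matching send and receive types and counts (no MPI_Barrier is needed, since the blocking receives already order the root's work):

```c
#include <mpi.h>
#include <string.h>

/* Sketch of MPI_Gather: every non-root rank sends its buffer to
 * root; root copies its own contribution into place and receives
 * one message from each other rank.  The datatype extent is
 * queried so the byte offsets into recvbuf are correct. */
int my_gather(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
              void *recvbuf, int recvcount, MPI_Datatype recvtype,
              int root, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    if (rank != root)
        return MPI_Send(sendbuf, sendcount, sendtype, root, 99, comm);

    MPI_Aint lb, extent;
    MPI_Type_get_extent(recvtype, &lb, &extent);
    for (int src = 0; src < size; src++) {
        char *dst = (char *)recvbuf + (MPI_Aint)src * recvcount * extent;
        if (src == rank)
            memcpy(dst, sendbuf, (size_t)(sendcount * extent));
        else
            MPI_Recv(dst, recvcount, recvtype, src, 99, comm,
                     MPI_STATUS_IGNORE);
    }
    return MPI_SUCCESS;
}
```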
(e) What is the difference between MPI_Gather and MPI_Allgather? [2]