
Multicore & GPU Programming: An Integrated Approach

Shared-Memory Programming: OpenMP

By G. Barlas
Modifications by H. Weber

Objectives

! Learn how to use OpenMP compiler directives to introduce concurrency in a sequential program.
! Learn the most important OpenMP #pragma directives and associated clauses, for controlling the concurrent constructs generated by the compiler.
! Understand which loops can be parallelized with OpenMP directives.
! Address the dependency issues that OpenMP-generated threads face, using synchronization constructs.
! Learn how to use OpenMP to create function-parallel programs.
! Learn how to write thread-safe functions.
! Understand the issue of false sharing in caches and learn how to eliminate it.



Introduction

! The decomposition of a sequential program into components that can execute in parallel is a tedious enterprise.
! OpenMP has been designed to alleviate much of the effort involved, by accommodating the incremental conversion of sequential programs into parallel ones, with the assistance of the compiler.
! OpenMP relies on compiler directives for decorating portions of the code that the compiler will attempt to parallelize.


OpenMP History

! OpenMP (Open Multi-Processing) is an API for shared-memory programming.
! OpenMP was specifically designed for parallelizing existing sequential programs.
! It uses compiler directives and a library of functions to support its operation.
! OpenMP v1.0 was published in 1998.
! OpenMP v4.0 was published in 2013.
! The standard is controlled by the OpenMP Architecture Review Board (ARB).
! GNU C support:
− GCC 4.7 supports the OpenMP 3.1 specification.
− GCC 4.9 supports OpenMP 4.0.



OpenMP Paradigm

! OpenMP programs are Globally Sequential, Locally Parallel.
! Programs follow the fork-join paradigm:


OpenMP Essential Definitions

! Structured block: an executable statement or a compound block, with a single point of entry and a single point of exit.
! Construct: an OpenMP directive and the associated statement, for-loop or structured block that it controls.
! Region: all code encountered during the execution of a construct, including any called functions.
! Parallel region: a region executed simultaneously by multiple threads.
! A region is dynamic, but a construct is static.
! Master thread: the thread executing the sequential part of the program and spawning the child threads.
! Thread team: a set of threads that execute a parallel region.



„Hello World“ in OpenMP

! Can you match some of the previous definitions with parts of the program sketched below?
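The slide's listing is not reproduced in this transcript; the following is a minimal sketch of what it plausibly looks like, given that the next slide says the example uses a num_threads(numThr) clause. Reading numThr from the command line is an assumption.

#include <omp.h>
#include <cstdio>
#include <cstdlib>

int main (int argc, char *argv[])
{
    // Assumption: the team size numThr is taken from the command line.
    int numThr = (argc > 1) ? atoi (argv[1]) : 4;

    // Construct: this directive plus the structured block it controls.
    // At run time, the master thread forks a team of numThr threads here.
    #pragma omp parallel num_threads(numThr)
    {
        // Parallel region: executed simultaneously by every thread in the team.
        printf ("Hello from thread %i of %i\n",
                omp_get_thread_num (), omp_get_num_threads ());
    }
    // Implicit join: only the master thread continues past this point.
    return 0;
}

With GCC, OpenMP programs are compiled with the -fopenmp flag, e.g. g++ -fopenmp hello.cpp.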

„Hello World“ Sequence Diagram

! One of the possible execution sequences: (sequence diagram not reproduced here)



#pragma directives

! Pragma directives allow a programmer to access compiler-specific preprocessor extensions.
! For example, a common use of pragmas is in the management of include files, e.g.
#pragma once
! Pragma directives in OpenMP can have a number of optional clauses that modify their behavior. Several clauses can appear on a single directive, as sketched below.
! In the previous example the clause is num_threads(numThr).
! Compilers that do not support certain pragma directives ignore them.
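As an illustration (not from the slide), here is a complete toy program combining two standard OpenMP clauses on one directive; if() and num_threads() are both part of the OpenMP specification:

#include <omp.h>
#include <cstdio>

int main ()
{
    int n = 5000;
    // Two optional clauses on one directive: if() enables parallel
    // execution only when the condition holds; num_threads() requests
    // the team size.
    #pragma omp parallel if(n > 1000) num_threads(3)
    {
        printf ("thread %i of %i\n",
                omp_get_thread_num (), omp_get_num_threads ());
    }
    return 0;
}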


Thread Team Size Control

! Universally: via the OMP_NUM_THREADS environment variable:
$ echo ${OMP_NUM_THREADS} # to query the value
$ export OMP_NUM_THREADS=4 # to set it in BASH
! Program level: via the omp_set_num_threads function, called outside an OpenMP construct.
! Pragma level: via the num_threads clause.
! The omp_get_num_threads call returns the number of active threads in a parallel region. If it is called in a sequential part, it returns 1 (see the sketch below).
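A minimal sketch (not from the slides) exercising the program-level and pragma-level controls. The num_threads clause takes precedence over omp_set_num_threads, which in turn overrides OMP_NUM_THREADS:

#include <omp.h>
#include <cstdio>

int main ()
{
    // In a sequential part, omp_get_num_threads() returns 1.
    printf ("sequential part: %i thread(s)\n", omp_get_num_threads ());

    omp_set_num_threads (2);             // program-level control

    #pragma omp parallel                 // team of 2, from omp_set_num_threads
    {
        if (omp_get_thread_num () == 0)
            printf ("first region : %i threads\n", omp_get_num_threads ());
    }

    #pragma omp parallel num_threads(4)  // pragma-level: the clause wins
    {
        if (omp_get_thread_num () == 0)
            printf ("second region: %i threads\n", omp_get_num_threads ());
    }
    return 0;
}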



Variable Scope
! Outside the parallel regions, normal scope rules apply.
! OpenMP specifies the following types of variables:
− Shared: all variables declared outside a parallel region are shared by default. That does not mean that they are in any way "protected".
− Private: all variables declared inside a parallel region are allocated on the run-time stack of each thread, so there are as many copies of these variables as there are threads in the team. Private variables are destroyed upon the termination of a parallel region.
− Reduction: a reduction variable gets an individual copy for each thread running the corresponding parallel region. Upon the termination of the parallel region, an operation (e.g. summation) is applied to the individual copies to produce the value that will be stored in the shared variable.
! The default scope of variables can be modified by clauses in the pragma lines, as in the sketch below.
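A minimal sketch (not from the slides) showing the shared, private, and reduction clauses overriding the default scope rules:

#include <omp.h>
#include <cstdio>

int main ()
{
    int n = 100;       // shared by default (declared outside the region)
    int scratch = -1;  // made private below: each thread gets its own copy
    int sum = 0;       // reduction variable: per-thread copies, summed at the end

    #pragma omp parallel num_threads(4) shared(n) private(scratch) reduction(+:sum)
    {
        scratch = omp_get_thread_num ();  // the private copy; its initial value is undefined
        sum += scratch;                   // accumulates into this thread's private copy
    }
    // With a team of 4: sum == 0 + 1 + 2 + 3 == 6.
    // The original scratch is untouched by the private copies, so it is still -1.
    printf ("sum = %i, scratch = %i\n", sum, scratch);
    return 0;
}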

Parallel Function Integration

$$\int_{start}^{end} f(x)\,dx \;\approx\; \sum_{i=0}^{n-1} step \cdot \frac{f(x_i) + f(x_{i+1})}{2} \;=\; step \cdot \left( \frac{f(start) + f(end)}{2} + \sum_{i=1}^{n-1} f(x_i) \right)$$

where $x_0 = start$, $x_n = end$, and $step = (end - start)/n$.



Example: Function integrate()

! The sequential implementation (see integrate_seq.cpp):

double integrate (double st, double en, int div, double (*f) (double))
{
    double localRes = 0;
    double step = (en - st) / div;
    double x;

    x = st;
    localRes = f (st) + f (en);  // the two endpoints carry a weight of 1/2
    localRes /= 2;
    for (int i = 1; i < div; i++)
    {
        x += step;
        localRes += f (x);       // interior points carry a weight of 1
    }
    localRes *= step;

    return localRes;
}
//---------------------------------------
int main (int argc, char *argv[])
{
    . . .
    double finalRes = integrate (start, end, divisions, testf);

    cout << finalRes << endl;

OpenMP V.0: Manual partitioning

! Given the ID of each thread, we can calculate the sub-range each thread should integrate (see the sketch below).
! Race condition!
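The slide's listing is not reproduced; the following is an assumed reconstruction of what manual partitioning might look like, reusing integrate() from the previous slide (its definition is omitted here). Note the unsynchronized update of the shared accumulator, which is the race condition flagged above:

#include <omp.h>
#include <cstdio>

// Defined on the previous slide (integrate_seq.cpp).
double integrate (double st, double en, int div, double (*f) (double));
double testf (double x) { return x * x; }  // hypothetical test function

int main ()
{
    double start = 0, end = 1;
    int divisions = 1000000, numThr = 4;
    double finalRes = 0;  // shared accumulator

    #pragma omp parallel num_threads(numThr)
    {
        int id = omp_get_thread_num ();
        int nt = omp_get_num_threads ();
        int chunk = divisions / nt;  // assuming nt divides divisions evenly
        double step = (end - start) / divisions;
        double localStart = start + id * chunk * step;
        double localEnd = localStart + chunk * step;

        // RACE: concurrent read-modify-write of the shared finalRes.
        finalRes += integrate (localStart, localEnd, chunk, testf);
    }
    printf ("%f\n", finalRes);
    return 0;
}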
OpenMP V.1: Removing the race cond.

! Give each thread its own private storage; a sequential reduction is required afterwards (see the sketch below).

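Again an assumed reconstruction, not the slide's own listing: each thread writes its partial result to its own array slot, so no two threads touch the same location, and the master thread reduces the partial results sequentially after the implicit join.

#include <omp.h>
#include <cstdio>

// Defined on the integrate_seq.cpp slide.
double integrate (double st, double en, int div, double (*f) (double));
double testf (double x) { return x * x; }  // hypothetical test function

int main ()
{
    double start = 0, end = 1;
    int divisions = 1000000, numThr = 4;
    double partial[16] = {0};  // hypothetical upper bound on the team size

    #pragma omp parallel num_threads(numThr)
    {
        int id = omp_get_thread_num ();
        int nt = omp_get_num_threads ();
        int chunk = divisions / nt;  // assuming nt divides divisions evenly
        double step = (end - start) / divisions;
        double localStart = start + id * chunk * step;

        // Each thread has exclusive ownership of partial[id]: no race.
        partial[id] = integrate (localStart, localStart + chunk * step, chunk, testf);
    }

    // Sequential reduction: safe, since the thread team has already joined.
    double finalRes = 0;
    for (int i = 0; i < numThr; i++)
        finalRes += partial[i];
    printf ("%f\n", finalRes);
    return 0;
}

Note that adjacent partial[] slots typically sit on the same cache line, so this layout invites exactly the false sharing mentioned in the objectives.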
