OpenMP
Arash Bakhtiari
bakhtiar@in.tum.de
2012-12-18 Tue
Introduction

- Chip manufacturers are rapidly moving to multi-core CPUs.

Figure: Quad-core Intel Sandy Bridge processor
Shared Memory Model

- All processors can access all memory in a global address space.
- Threads model: a single process can have multiple, concurrent execution paths.
- On a multi-core system, the threads run at the same time, with each core running a particular thread or task.

Figure: Shared Memory Model [1]
What is OpenMP?

- An Application Program Interface (API)
- Used to explicitly direct multi-threaded, shared-memory parallelism
- Provides a portable, scalable model
- Supports C/C++ and Fortran on a wide variety of architectures
Fork-Join Model

- An OpenMP program starts as a single thread.
- Additional threads (the team) are created when the master thread hits a parallel region.
- When all threads have finished the parallel region, the extra threads are given back to the runtime or operating system.
- The master thread continues after the parallel region.
Fork-Join Model (cont.)

Figure: Fork-Join Model [1]
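To see the fork and join in code, here is a minimal sketch (not from the original slides) that prints the team size before, inside, and after a parallel region; the size inside depends on OMP_NUM_THREADS and the runtime:

#include <iostream>
#include <omp.h>

int main()
{
    // Serial part: only the master thread exists, so the team size is 1.
    std::cout << "before: " << omp_get_num_threads() << " thread(s)\n";

#pragma omp parallel
    {
        // Fork: the master plus the additional team threads run this block.
#pragma omp master
        std::cout << "inside: " << omp_get_num_threads() << " thread(s)\n";
    }

    // Join: the extra threads are released; only the master continues.
    std::cout << "after: " << omp_get_num_threads() << " thread(s)\n";
    return 0;
}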
OpenMP API

Primary API components:

- Compiler directives:
  #pragma omp parallel
- Run-time library routines:
  int omp_get_num_threads(void);
- Environment variables:
  export OMP_NUM_THREADS=2
Example

Listing 1: OpenMP Hello World!

#include <iostream>
#include <omp.h>

int main(int argc, char *argv[])
{
#pragma omp parallel
    {
        // Each thread in the team executes this block once.
        std::cout << "THREAD: " << omp_get_thread_num() << "\tHello, World!\n";
    }
    return 0;
}
Listing 2: Compiling

g++ -o hello hello.c -fopenmp
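Running the compiled program with, e.g., OMP_NUM_THREADS=2 prints one line per thread; the order is nondeterministic, for example:

THREAD: 0    Hello, World!
THREAD: 1    Hello, World!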
Classification of Variables

- private(var-list): Variables in var-list are private to each thread.
- shared(var-list): Variables in var-list are shared among all threads.
- default(private | shared | none): Sets the default for all variables in this region. (In C/C++, only default(shared) and default(none) are allowed; default(private) exists only in Fortran.)
Example

Listing 3: OpenMP Private Variable

#include <iostream>
#include <omp.h>

int main(int argc, char *argv[])
{
    int i, j;
    i = 1;
    j = 2;
    std::cout << "BEFORE: i, j = " << i << ", " << j << std::endl;
#pragma omp parallel private(i)
    {
        i = 3;  // assigns each thread's own private copy of i
        j = 5;  // j is shared: all threads write to the same variable
        std::cout << "INLOOP: i, j = " << i << ", " << j << std::endl;
    }
    std::cout << "AFTER: i, j = " << i << ", " << j << std::endl;
    return 0;
}
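Since i is private, each thread works on its own copy and the assignment inside the region does not touch the outer i; j is shared, so the change is visible afterwards, and AFTER prints i, j = 1, 5. (The concurrent writes to the shared j are harmless here only because every thread writes the same value; in general such unsynchronized writes are a data race.)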
Work-Sharing Constructs

Work-sharing constructs distribute the specified work to all threads within the current team.

Types (see the sketch after this list):
- Parallel loop
- Parallel section
- Master region
- Single region
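As a rough illustration (not from the original slides), the sketch below combines three of these constructs in one parallel region: each section is executed by some thread of the team, the single block by exactly one thread, and the master block only by thread 0:

#include <iostream>
#include <omp.h>

int main()
{
#pragma omp parallel
    {
#pragma omp sections
        {
#pragma omp section
            std::cout << "section A on thread " << omp_get_thread_num() << "\n";
#pragma omp section
            std::cout << "section B on thread " << omp_get_thread_num() << "\n";
        }  // implicit barrier at the end of the sections construct

#pragma omp single
        std::cout << "single on thread " << omp_get_thread_num() << "\n";
        // implicit barrier at the end of single

#pragma omp master
        std::cout << "master on thread " << omp_get_thread_num() << "\n";
        // no implicit barrier after master
    }
    return 0;
}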
Parallel Loop

Syntax:

#pragma omp for [clause ...]

- The iterations of the loop are distributed to the threads.
- Possible scheduling strategies for the loop iterations: static, dynamic, guided, and runtime.
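A minimal sketch (not from the slides) of the combined parallel for form; the loop variable of the associated loop is implicitly private:

#include <iostream>
#include <omp.h>

int main()
{
    const int n = 8;
    int a[n];

    // Fork a team and split the iterations among its threads.
#pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] = i * i;

    for (int i = 0; i < n; i++)
        std::cout << a[i] << " ";
    std::cout << std::endl;
    return 0;
}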
Scheduling Strategies

Schedule clause:

schedule(type [, size])

- static: Chunks of the specified size are assigned in a round-robin fashion to the threads.
- dynamic: The iterations are broken into chunks of the specified size. When a thread finishes the execution of a chunk, the next chunk is assigned to that thread.
- guided: Similar to dynamic, but the size of the chunks decreases exponentially. The size parameter specifies the smallest chunk; the initial chunk size is implementation dependent.
- runtime: The scheduling type and the chunk size are determined via environment variables.
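With schedule(runtime), the choice is deferred to program start and read from the standard OMP_SCHEDULE environment variable, for example:

export OMP_SCHEDULE="dynamic,4"
./hello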
Example

Listing 4: OpenMP Loop Scheduling

#include <iostream>
#include <cstdlib>
#include <ctime>
#include <omp.h>

#define CHUNKSIZE 100
#define N 1000

// Helper presumed defined elsewhere in the original slides; a minimal
// definition is supplied here so that the listing compiles.
double generate_random_double(double lo, double hi)
{
    return lo + (hi - lo) * rand() / RAND_MAX;
}

int main()
{
    int i, chunk;
    double a[N], b[N], c[N];

    srand(time(NULL));
    for (i = 0; i < N; i++) {
        a[i] = generate_random_double(0.0, 10.0);
        b[i] = generate_random_double(0.0, 10.0);
    }
    chunk = CHUNKSIZE;

#pragma omp parallel shared(a, b, c, chunk) private(i)
    {
#pragma omp for schedule(dynamic, chunk) nowait
        for (i = 0; i < N; i++)
            c[i] = a[i] + b[i];
    }
    return 0;
}
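The nowait clause removes the implicit barrier at the end of the for construct, so threads that finish their chunks early do not wait for the others; this is safe here because nothing else inside the parallel region reads c after the loop.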
References

[1] Blaise Barney, Lawrence Livermore National Laboratory,
    https://computing.llnl.gov/tutorials/openMP/