OpenMP 4


Index

• OpenMP
• Directives: parallel, for
• Clauses
• Schedule
  • Static
  • Dynamic
  • Guided
  • Runtime
• References
Course Outline

Course Plan: Theory:

Part A: Parallel Computer Architectures
Week 1,2,3: Introduction to parallel computer architecture: parallel computing, parallel architecture; bit-level, instruction-level, data-level and task-level parallelism. Instruction-level parallelism: pipelining (data and control instructions), scalar and superscalar processors, vector processors. Parallel computers and computation.
Week 4,5: Memory models: UMA, NUMA and COMA. Flynn's classification, cache coherence.
Week 6,7: Amdahl's law. Performance evaluation. Designing parallel algorithms: divide and conquer, load balancing, pipelining.
Week 8-11: Parallel programming techniques such as task parallelism using TBB, TL2, Cilk++ etc., and software transactional memory techniques.
Course Outline
Part B: OpenMP/MPI/CUDA
Week 1,2,3: Shared memory programming techniques: Introduction to OpenMP. Directives: parallel, for, sections, task, master, single, critical, barrier, taskwait, atomic. Clauses: private, shared, firstprivate, lastprivate, reduction, nowait, ordered, schedule, collapse, num_threads, if().
Week 4,5: Distributed memory programming techniques: MPI: blocking, non-blocking.
Week 6,7: CUDA: OpenCL, execution models, GPU memory, GPU libraries.
Week 10,11: Introduction to accelerator programming using CUDA/OpenCL and Xeon Phi. Concepts of heterogeneous programming techniques.
Practical:
Implementation of parallel programs using OpenMP/MPI/CUDA.
Assignment: Performance evaluation of parallel algorithms (in groups of 2 or 3 members).
1. OpenMP
FORK-JOIN Parallelism
• An OpenMP program begins as a single process: the master thread. The master thread executes sequentially until the first parallel region construct is encountered.
• When a parallel region is encountered, the master thread
  – creates a team of threads (FORK), and
  – becomes the master of this team, with thread id 0 within the team.
• The statements enclosed by the parallel region construct are then executed in parallel by these threads.
• JOIN: when the threads finish executing the statements in the parallel region construct, they synchronize and terminate, leaving only the master thread (see the minimal example below).
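A minimal sketch of the fork-join model (the thread count of 4 is chosen only for illustration):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    printf("master thread executing sequentially\n");

    /* FORK: the master thread creates a team of threads; every thread
     * in the team executes the enclosed structured block. */
    #pragma omp parallel num_threads(4)
    {
        printf("hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }   /* JOIN: the threads synchronize and terminate here */

    printf("only the master thread continues\n");
    return 0;
}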
2. OpenMP Programming: Directives: parallel, for

#pragma omp parallel [clause[,] clause ...] new-line
    structured-block
Clauses: if(scalar-expression), num_threads(integer-expression), default(shared|none), private(list), firstprivate(list), shared(list), copyin(list), reduction(operator:list)

#pragma omp for [clause[,] clause ...] new-line
    for-loops
Clauses: private(list), firstprivate(list), lastprivate(list), reduction(operator:list), schedule(kind[,chunk_size]), collapse(n), ordered, nowait
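For illustration, a small sketch combining the two directives with a few of the clauses listed above (the variable names and values are assumptions, not from the slides):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int n = 8;
    int sum = 0;
    int square;                       /* per-thread temporary */

    /* parallel directive with if, num_threads and default clauses;
     * for directive with private and reduction clauses.            */
    #pragma omp parallel if(n > 1) num_threads(4) default(shared)
    {
        #pragma omp for private(square) reduction(+:sum)
        for (int i = 0; i < n; i++) {
            square = i * i;
            sum += square;
        }
    }

    printf("sum of squares below %d = %d\n", n, sum);   /* prints 140 */
    return 0;
}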
2. OpenMP Programming: Clauses: schedule

schedule(kind[,chunk_size]) clause of #pragma omp for

• The schedule clause specifies how the iterations of the loop are divided into contiguous non-empty subsets, called chunks, and how these chunks are assigned among the threads of the team.
• kind can be one of the following:
  • static
  • dynamic
  • guided
  • runtime
2. OpenMP Programming: schedule(static, chunk_size)

schedule(static, chunk_size) clause of #pragma omp for

• Iterations are divided into chunks of size chunk_size.
• Chunks are statically assigned to threads in round-robin fashion, in order of thread number.
• The last chunk assigned may have fewer iterations.
• When no chunk_size is specified, the iterations are divided into chunks of approximately iterations/threads.
• Example: 28 iterations, 4 threads, schedule(static, 5)

  thread0   thread1   thread2   thread3   thread0   thread1
  0-4       5-9       10-14     15-19     20-24     25-27
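A minimal sketch reproducing this example (the printed order interleaves, but with static scheduling the iteration-to-thread mapping matches the table above):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* 28 iterations, 4 threads, chunks of 5: thread 0 gets 0-4,
     * thread 1 gets 5-9, ..., and thread 1 gets the short final
     * chunk 25-27.                                              */
    #pragma omp parallel for schedule(static, 5) num_threads(4)
    for (int i = 0; i < 28; i++)
        printf("iteration %2d -> thread %d\n", i, omp_get_thread_num());
    return 0;
}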
2. OpenMP Programming: schedule(dynamic, chunk_size)

schedule(dynamic, chunk_size) clause of #pragma omp for

• Iterations are assigned to threads in chunks of size chunk_size, as the threads request them.
• A thread executes its chunk of iterations and then requests another chunk, until all iterations are complete.
• Each chunk contains chunk_size iterations, except possibly the last chunk assigned.
• Example: 28 iterations, 4 threads, schedule(dynamic, 5)

  thread1   thread3   thread0   thread2   thread1   thread2
  0-4       5-9       10-14     15-19     20-24     25-27
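A sketch of the same loop with dynamic scheduling; the usleep call is an assumption added only to simulate uneven work per iteration (the situation where dynamic scheduling helps), and the chunk-to-thread mapping varies from run to run:

#include <stdio.h>
#include <unistd.h>
#include <omp.h>

int main(void)
{
    /* Whichever thread finishes its chunk first requests the next one. */
    #pragma omp parallel for schedule(dynamic, 5) num_threads(4)
    for (int i = 0; i < 28; i++) {
        usleep(1000 * (i % 7));       /* simulate uneven work per iteration */
        printf("iteration %2d -> thread %d\n", i, omp_get_thread_num());
    }
    return 0;
}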
2. OpenMP Programming: schedule(guided, chunk_size)

schedule(guided, chunk_size) clause of #pragma omp for

• Iterations are assigned to threads in chunks, as the threads request them.
• A thread executes its chunk of iterations and then requests another chunk, until all iterations are complete.
• Chunk size = remaining iterations / number of threads (rounded up).
• chunk_size determines the minimum size of a chunk, except for the last chunk. The default chunk_size is 1.
• Example: 28 iterations, 4 threads, schedule(guided, 3)
  • 28/4 = 7 [remaining = 28 - 7 = 21]
  • 21/4 = 5.25 => 6 [remaining = 21 - 6 = 15]
  • 15/4 = 3.75 => 4 [remaining = 15 - 4 = 11]
  • 11/4 = 2.75 => 3 [remaining = 11 - 3 = 8]
  • 8/4 = 2, below the minimum chunk size of 3, so assign 3 [remaining = 8 - 3 = 5]
  • 5/4 = 1.25, below the minimum of 3, so assign 3 [remaining = 2]
  • 2 <= 3, so the last chunk = 2

  thread2   thread1   thread0   thread3   thread2   thread2   thread2
  0-6 (7)   7-12 (6)  13-16 (4) 17-19 (3) 20-22 (3) 23-25 (3) 26-27 (2)
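The chunk sizes in this walkthrough can be reproduced with a small simulation; guided_chunks is a hypothetical helper written only to illustrate the arithmetic on this slide, not an OpenMP API:

#include <stdio.h>

/* Simulates the chunk sizes produced by schedule(guided, chunk_size):
 * chunk = ceil(remaining / nthreads), never smaller than chunk_size
 * except for the final chunk. */
static void guided_chunks(int iterations, int nthreads, int chunk_size)
{
    int remaining = iterations;
    while (remaining > 0) {
        int chunk = (remaining + nthreads - 1) / nthreads;  /* ceiling division */
        if (chunk < chunk_size)
            chunk = chunk_size;
        if (chunk > remaining)          /* the last chunk may be smaller */
            chunk = remaining;
        printf("%d ", chunk);
        remaining -= chunk;
    }
    printf("\n");
}

int main(void)
{
    guided_chunks(28, 4, 3);            /* prints: 7 6 4 3 3 3 2 */
    return 0;
}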
2. OpenMP Programming: schedule(runtime)

schedule(runtime) clause of #pragma omp for

• The scheduling decision is deferred until run time: the schedule kind and chunk size are taken from the run-sched-var internal control variable (typically set through the OMP_SCHEDULE environment variable).
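A minimal sketch of schedule(runtime); the kind and chunk size are then chosen at launch time, for example via OMP_SCHEDULE:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* The schedule is taken from the run-sched-var ICV, typically set
     * through the OMP_SCHEDULE environment variable before running:
     *     export OMP_SCHEDULE="dynamic,5"
     *     ./a.out                                                    */
    #pragma omp parallel for schedule(runtime) num_threads(4)
    for (int i = 0; i < 28; i++)
        printf("iteration %2d -> thread %d\n", i, omp_get_thread_num());
    return 0;
}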
2. OpenMP Programming: schedule(static, chunk_size)

  thread0   thread1   thread2   thread3   thread0   thread1
  0-4       5-9       10-14     15-19     20-24     25-27
2. OpenMP Programming: schedule(dynamic, chunk_size)

  thread2   thread1   thread0   thread3   thread1   thread3
  0-4       5-9       10-14     15-19     20-24     25-27
2. OpenMP Programming: schedule(guided, chunk_size)

  thread2   thread1   thread0   thread3   thread2   thread2   thread2
  0-6 (7)   7-12 (6)  13-16 (4) 17-19 (3) 20-22 (3) 23-25 (3) 26-27 (2)
2. OpenMP Programming: schedule(runtime)

  thread1   thread0   thread3   thread2   thread0
  0         2         3         4-6       7-27
Index

• OpenMP
• Directives: parallel, for
• Clauses
• Schedule
  • Static
  • Dynamic
  • Guided
  • Runtime
• References
Reference
Text Books and/or Reference Books:
1. John Cheng, Max Grossman, Ty McKercher, "Professional CUDA C Programming", 2014
2. B. Wilkinson, M. Allen, "Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers", Pearson Education, 1999
3. I. Foster, "Designing and Building Parallel Programs", 2003
4. Michael J. Quinn, "Parallel Programming in C with MPI and OpenMP", 2004
5. Peter S. Pacheco, "An Introduction to Parallel Programming", Morgan Kaufmann Publishers, 2011
6. Dezso Sima, Terence Fountain, Peter Kacsuk, "Advanced Computer Architectures: A Design Space Approach", 2002
7. David E. Culler, Jaswinder Pal Singh, Anoop Gupta, "Parallel Computer Architecture: A Hardware/Software Approach", 2011
8. Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar, "Introduction to Parallel Computing", Pearson, 2011
Reference
Acknowledgements
1. Introduction to OpenMP, https://www3.nd.edu/~zxu2/acms60212-40212/Lec-12-OpenMP.pdf
2. Introduction to parallel programming for shared memory machines, https://www.youtube.com/watch?v=LL3TAHpxOig
3. OpenMP Application Program Interface, Version 2.5, May 2005
4. OpenMP Application Program Interface, Version 5.0, November 2018
