
High Performance Computing

(HPC)
Lecture 4

By: Dr. Maha Dessokey


Programming with Shared Memory
(OpenMP Contd.)
Working with loops

Basic approach
 Find compute-intensive loops
 Make the loop iterations independent, so they can safely execute in any
order without loop-carried dependencies
 Example: incrementing j += 2 inside the loop makes each iteration depend
on the previous one; rewriting it as j = 5 + 2*(i+1), where i is the loop
index (0, 1, 2, ...), produces the same sequence j = 5+2, 7+2, 9+2, ...
with every iteration independent (see the sketch below)
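A minimal sketch of this transformation, assuming a hypothetical work
function big() and bound MAX (neither is in the original slide):

#include <omp.h>

#define MAX 1000

int big(int j) { return j * j; }   /* hypothetical stand-in for real work */

/* Before: j += 2 makes each iteration depend on the previous one. */
void sequential_version(int *A)
{
    int j = 5;
    for (int i = 0; i < MAX; i++) {
        j += 2;
        A[i] = big(j);
    }
}

/* After: j is computed from the loop index alone, so the iterations
   are independent and the loop can safely be parallelized. */
void parallel_version(int *A)
{
    #pragma omp parallel for
    for (int i = 0; i < MAX; i++) {
        int j = 5 + 2 * (i + 1);
        A[i] = big(j);
    }
}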
Nowait

 In all cases, there is an implicit barrier at the end of the construct
unless a nowait clause is included.
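A minimal sketch of nowait, assuming two independent arrays (so skipping
the barrier is safe):

#include <stdio.h>
#include <omp.h>

#define N 1000

int main(void)
{
    float a[N], b[N];
    for (int i = 0; i < N; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    #pragma omp parallel
    {
        /* nowait removes the implicit barrier: threads that finish
           this loop early move straight on to the next loop. */
        #pragma omp for nowait
        for (int i = 0; i < N; i++)
            a[i] *= 2.0f;

        /* This loop keeps its implicit barrier at the end. */
        #pragma omp for
        for (int i = 0; i < N; i++)
            b[i] += 1.0f;
    }
    printf("a[0]=%.1f b[0]=%.1f\n", a[0], b[0]);
    return 0;
}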
SPMD vs. worksharing

 A parallel construct by itself creates an SPMD or "Single Program
Multiple Data" program, i.e., each thread redundantly executes the
same code.
 How do you split up pathways through the code between threads
within a team? This is called worksharing (a short sketch contrasting
the two follows this list)
 Loop construct
 Sections/section constructs
 Single construct
 Task construct (out of our scope)
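A minimal sketch contrasting the two: the first region is SPMD (every
thread runs the whole block), while the second divides the iterations
among the threads:

#include <stdio.h>
#include <omp.h>

#define N 8

int main(void)
{
    /* SPMD: every thread redundantly executes the same statement. */
    #pragma omp parallel
    printf("hello from thread %d\n", omp_get_thread_num());

    /* Worksharing: the loop iterations are split across the team. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        printf("iteration %d done by thread %d\n", i, omp_get_thread_num());

    return 0;
}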
Sections worksharing Construct

 The sections worksharing construct gives a different structured block
to each thread.
 By default, there is a barrier at the end of the "omp sections".
Use the "nowait" clause to turn off the barrier.
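A minimal sketch: each section is handed to a different thread in the team.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            printf("section 1 run by thread %d\n", omp_get_thread_num());

            #pragma omp section
            printf("section 2 run by thread %d\n", omp_get_thread_num());
        }   /* implicit barrier here unless "nowait" is added */
    }
    return 0;
}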
Single worksharing Construct

 The single construct denotes a block of code that is executed by only
one thread (not necessarily the master thread).
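A minimal sketch: the single block runs once, on whichever thread reaches
it first; the other threads wait at its implicit barrier.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    {
        /* Executed by exactly one thread, not necessarily thread 0.
           There is an implicit barrier at the end of the block. */
        #pragma omp single
        printf("single block run by thread %d\n", omp_get_thread_num());

        printf("after single: thread %d\n", omp_get_thread_num());
    }
    return 0;
}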
Master Directive

 The master directive:

#pragma omp master
structured_block

 causes the master thread to execute the structured block.
 It differs from the worksharing constructs in that there is no implied
barrier at the end of the construct (nor at the beginning).
 Other threads encountering this directive will ignore it and the
associated structured block, and will move on.
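A minimal sketch: only thread 0 runs the block, and nobody waits.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    {
        /* Executed only by the master thread (thread 0); the other
           threads skip it, with no barrier before or after. */
        #pragma omp master
        printf("master block run by thread %d\n", omp_get_thread_num());
    }
    return 0;
}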
Runtime Library routines

 Modify/check the number of threads:
 omp_set_num_threads()
 omp_get_num_threads()
 omp_get_thread_num()
 omp_get_max_threads()
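A minimal sketch of the thread-count routines; note that
omp_get_num_threads() returns 1 outside a parallel region, so it is
usually queried inside one.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    omp_set_num_threads(4);   /* request 4 threads for the next region */
    printf("max threads: %d\n", omp_get_max_threads());

    #pragma omp parallel
    {
        if (omp_get_thread_num() == 0)
            printf("team size: %d\n", omp_get_num_threads());
    }
    return 0;
}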
Runtime Library routines

 Are we in an active parallel region?
 omp_in_parallel()
 Do you want the system to dynamically vary the number of threads
from one parallel construct to another?
 omp_set_dynamic(), omp_get_dynamic()
 How many processors in the system?
 omp_get_num_procs()
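A minimal sketch exercising these three queries:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    printf("in parallel? %d\n", omp_in_parallel());   /* 0 here */
    printf("processors:  %d\n", omp_get_num_procs());

    omp_set_dynamic(0);   /* ask the runtime not to vary team sizes */
    printf("dynamic adjustment: %d\n", omp_get_dynamic());

    #pragma omp parallel
    {
        #pragma omp single
        printf("in parallel? %d\n", omp_in_parallel());   /* 1 here */
    }
    return 0;
}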
Data environment
Default storage attributes

 Shared memory programming model:
 Most variables are shared by default
 Global variables are SHARED among threads
 But not everything is shared...
 When you declare a parallel region or a loop using OpenMP, automatic
variables declared within that region are treated as private to each thread.
 This prevents data races, as each thread has its own copy of the variable,
allowing threads to operate independently without interfering with each
other's data.
Data sharing: Examples

double A[10];

int main()
{
    int index[10];
    #pragma omp parallel
    work(index);
    printf("%d\n", index[0]);
}

void work(int *index)
{
    double temp[10];
    static int count;
    ...
}

 A, index, and count are shared by all threads.
 temp is local to each thread.
Changing storage attributes

 One can selectively change the storage attributes of constructs using
the following clauses:
 SHARED
 PRIVATE
 FIRSTPRIVATE
 THREADPRIVATE
 The value of a private variable inside a parallel loop can be transmitted
to a global value outside the loop with:
 LASTPRIVATE
 The default status can be modified with:
 DEFAULT (PRIVATE | SHARED | NONE)
Private Clause

 private(var) creates a new local copy of var for each thread.
 The value of the private copies is uninitialized.
 The value of the original variable is unchanged after the region.
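A minimal sketch: each thread gets its own (uninitialized) copy of tmp,
and the original tmp is untouched after the loop.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int tmp = 0;
    #pragma omp parallel for private(tmp)
    for (int i = 0; i < 4; i++)
        tmp = i;               /* writes the thread's private copy only */
    printf("tmp = %d\n", tmp); /* prints 0: the original is unchanged */
    return 0;
}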
Firstprivate Clause

 Each thread's private copy is initialized from the shared variable.
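A minimal sketch: every thread's copy of incr starts at the shared value 10.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int incr = 10;
    int a[4];
    /* Each private copy of incr is initialized to 10. */
    #pragma omp parallel for firstprivate(incr)
    for (int i = 0; i < 4; i++)
        a[i] = i + incr;
    printf("a[0]=%d a[3]=%d\n", a[0], a[3]);   /* 10 and 13 */
    return 0;
}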
Lastprivate Clause

 The shared variable is updated with the value from the last iteration.
 The thread that executes the sequentially last iteration updates the
shared variable.
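A minimal sketch: whichever thread runs iteration i == 9 copies its value
of x back to the shared variable.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int x = 0;
    #pragma omp parallel for lastprivate(x)
    for (int i = 0; i < 10; i++)
        x = i * i;
    /* x now holds the value from the sequentially last iteration. */
    printf("x = %d\n", x);   /* prints 81 */
    return 0;
}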
Data Environment Example

 Consider this example of PRIVATE and FIRSTPRIVATE:

variables: A = 1, B = 1, C = 1
#pragma omp parallel private(B) firstprivate(C)

 Are A, B, and C local to each thread or shared inside the parallel region?
 What are their initial values inside, and their values after, the
parallel region?

 Inside this parallel region:
 "A" is shared by all threads; equals 1
 "B" and "C" are local to each thread
 B's initial value is undefined
 C's initial value equals 1
 Following the parallel region:
 B and C revert to their original values of 1
 A is either 1 or the value it was set to inside the parallel region
Private vs. Firstprivate vs. Lastprivate

 Private, firstprivate and lastprivate clauses · OpenMP Little Book

Reference

 openmp.org
 https://hpc-tutorials.llnl.gov/openmp/
 openmp-examples-4.5.0.pdf
 The basic parallel construct · OpenMP Little Book
How to build a cluster
HPC Cluster
HPC cluster components

 Head or Login Node: This node validates users and may set up
specific software on the compute nodes.
 Compute Nodes: These perform the numerical computations. Their
persistent storage may be minimal, while their DRAM memory will be
large.
 Accelerator Nodes: Some nodes may include one or more
accelerators (for example, GPUs), while smaller HPC clusters,
purpose-built for a specific use, may be set up so that all nodes
contain an accelerator.
HPC cluster components

 Storage Nodes or Storage System: An efficient HPC cluster must
contain a high-performance parallel file system (PFS). A PFS allows
all nodes to communicate in parallel with the storage drives, letting
the compute nodes operate with minimal wait times.
 Network Fabric: In HPC clusters, low latency and high bandwidth
are typically required.
 Software: HPC cluster computing requires underlying software to
execute applications and control the underlying infrastructure.
Software is essential to the efficient management of the massive
amounts of I/O inherent to HPC applications.
HPC user environment

 Operating system: Linux (Red Hat/CentOS, Ubuntu, etc.), Unix
 Login: ssh (port 22)
 File transfer: secure ftp (scp), GridFTP (Globus)
 Job scheduler: Slurm, PBS, SGE
 Software management: module
 MPI implementations: OpenMPI, MPICH, MVAPICH, Intel MPI
 Debugging and profiling tools: TotalView, TAU, DDT, VTune
 Programming languages: C, C++, Fortran, Python, Perl, R, MATLAB, Julia
Cluster Management Tools

 A cluster management tool is really a toolkit to automate the
configuration, launching, and management of compute nodes from
the master node (or a node designated as master). In some cases, the
toolkit will even install the master node for you. A number of open
source cluster management tools are available:
 Warewulf
 xCAT
 OpenHPC
 TrinityX
 Qlustar
Cluster Job Scheduling Tools

 A job scheduler is a computer program that enables an enterprise to
schedule and, in some cases, monitor computer "batch" jobs (units of
work).
 A job scheduler controls the execution of various jobs or background
processes, also known as batch scheduling.
 To determine which job to run, schedulers might consider parameters
that include the following:
 Job priority
 Job timing
 Availability of computing resources
 Number of parallel jobs permitted for a user
What are the features of job schedulers?

 Job priority
 Computer resource availability
 License key, if the job uses licensed software
 Execution time allocated to the user
 Number of simultaneous jobs allowed for a user
 Estimated execution time
 Availability of peripheral devices
 Operator prompt dependency
 Some schedulers provide sophisticated features, such as real-time
scheduling in accordance with external events, automatic restart in
case of failures, and automated incident reporting.
Cluster Monitoring tools

 Ganglia, described as a "scalable distributed monitoring system for
high-performance computing systems such as clusters and Grids"
 Zabbix, which is both free and open source
 Netdata
 Datadog
 Munin
 Netler
Assignment 1

 Assignment 1 will be postponed until after the lab.

Assignment 2

 Choose one of the cluster management, job scheduler, and monitoring tools.
 Register your choice with Eng. Basant this section.
 A full report about this tool is to be delivered, together with a
5-slide presentation.
 Due date: 2 November 2024
