Introduction to High-Performance Computing @ University of Oldenburg
The new HPC cluster: hardware and usage
Stefan Harfst (Scientific Computing, V. School of Mathematics and Science)
Contents
• New HPC Cluster Overview
• User Environment
– File Systems
– Modules
• SLURM Job Submission
– GPUs
• Migration from old System
– Matlab
NEW HPC CLUSTER OVERVIEW
Delivery Day
• the new hardware was delivered on Aug 22nd
http://www.uni-oldenburg.de/fk5/wr/aktuelles/artikel/art/neue-hochleistungsrechner-fuer-die-universitaet-oldenburg-2380/
Overview New Hardware
• CARL
– multi-purpose cluster as a basic computing resource
– funded by the University/MWK and the DFG under grant number
INST 184/157-1 FUGG (major research instrumentation according to Art. 91b GG)
• EDDY
– CFD cluster for wind energy research
– funded by the BMWi under grant number 0324005
• used as a shared HPC cluster
– common infrastructure is shared (e.g. file systems, network)
– shared administration
http://wiki.hpcuser.uni-oldenburg.de/index.php?title=Acknowledging_the_HPC_facilities_2016
HPC Facilities @ University Oldenburg
• shared HPC cluster CARL/EDDY
– close to 600 compute nodes
– 4 login and 2 administration nodes
– Infiniband FDR interconnect for parallel computing
– 10/1 GbE network
– parallel file system (GPFS) with 900 TB capacity
– NFS mounted central storage
– Linux (RHEL) as OS
– many scientific applications and libraries available
– Job Scheduler (SLURM)
https://wiki.hpcuser.uni-oldenburg.de/index.php?title=HPC_Facilities_of_the_University_of_Oldenburg_2016
Schematic View of HPC Cluster
Schematic View of the GbE Network
Summary CARL & EDDY
Feature            CARL           EDDY           Total
Nodes              327            244            571
Cores              7,640          5,856          13,496
RAM                77 TB          21 TB          98 TB
GPFS               450 TB         450 TB         900 TB
Local disks        360 TB         -              360 TB
Rpeak (nominal)    271 Tflop/s    201 Tflop/s    482 Tflop/s
Rpeak (AVX2)       221 Tflop/s    164 Tflop/s    385 Tflop/s
Rmax                                             457.2 Tflop/s
Rank 363 in the Top500: https://www.top500.org/system/178942
Top500 Performance Development
CARL/EDDY
• Rmax 457.2 Tflop/s
• Rank 363
• 571 nodes
• 13,500 cores
• 100 TB RAM
HERO/FLOW
• Rmax 27.2 Tflop/s
• not ranked
• 350 nodes
• 4,000 cores
• 10 TB RAM
Improvement: 17x Rmax, 1.6x nodes, 3.3x cores, 10x RAM
Compute Nodes CARL
• 128/158x MPC-LOM/STD
– multiple nodes per chassis
– 2x Intel Xeon E5-2650 v4 (12 cores @ 2.2 GHz each)
– 128/256 GB RAM (8x 16/32 GB)
– 1 TB HDD
• 9x MPC-GPU
– same as MPC-STD
– plus one NVIDIA Tesla P100 GPU
Compute Nodes CARL
• 30x MPC-BIG
– 2x Intel Xeon E5-2667 v4 (8 cores @ 3.2 GHz each)
– 512 GB RAM (16x 32 GB)
• 2x MPC-PP
– 4x Intel Xeon E7-8891 v4 (10 cores @ 2.8 GHz each)
– 2048 GB RAM (64x 32 GB)
• both with
– 1x Intel P3700 2.0 TB NVMe flash adapter
with up to 2.8/2.0 GB/s read/write performance
– free PCIe slots for future expansion
Compute Nodes EDDY
• 160/81x CFD-LOM/HIM
– multiple nodes per chassis
– 2x Intel Xeon E5-2650 v4 (12 cores @ 2.2 GHz each)
– 64/128 GB RAM (8x 8/16 GB)
• 3x CFD-GPU
– same as MPC-STD
– plus one NVIDIA Tesla P100 GPU
Interconnect
• FDR Infiniband
– 54 Gbit/s data transfer (throughput)
– latency 0.7 µs
– message rate 137 Million msg/s
– switched fabric topology (fat tree)
– CARL blocking factor 8:1
– EDDY fully non-blocking
Parallel Filesystem
• GPFS Storage Server
– dual-server system
– GPFS Native RAID technology
– RAID 8+2P
– 232 HDDs, 900 TB net capacity
– fast RAID rebuild (< 1 hour)
• shared storage for both clusters
– min. 12,000 MB/s write performance
– min. 17,500 MB/s read performance
Racks in Server Room
USER ENVIRONMENT
Login to the HPC Cluster
http://wiki.hpcuser.uni-oldenburg.de/index.php?title=Login
• Linux
– use ssh as before with carl or eddy as login nodes
ssh -X abcd1234@carl.hpc.uni-oldenburg.de
• Windows
– use MobaXterm (recommended) or PuTTY
• login host names
hpcl00[1-4].hpc.uni-oldenburg.de
– can be used instead of carl or eddy (for login to specific node)
– no difference between carl and eddy as login
• from outside of the campus network use VPN connection
– see instructions at http://www.itdienste.uni-oldenburg.de/21240.html
File Systems
http://wiki.hpcuser.uni-oldenburg.de/index.php?title=File_system_and_Data_Management
• central ISILON storage
– used for home directories (as before)
– NFS mounted over 10Gb Ethernet
– full backup and snapshot functionality
– can be mounted on local workstation using CIFS
• shared parallel storage (GPFS)
– used for data and work directories
– data transfer over FDR Infiniband
– currently no backup
– can also be mounted on local workstation using CIFS
• local disks or SSDs for scratch
– CARL compute nodes have local storage (1-2TB per node)
– EDDY compute nodes have 1GB RAM disk (for compatibility)
– usable during job run time
New Directory Structure
• on every filesystem ($HOME, $DATA, $WORK) users will
have their own subdirectory
– e.g. for $HOME
drwx------ abcd1234 agsomegroup /user/abcd1234
– default permissions prevent other users from seeing the contents
of their directory
– users can give others permission to access files or subdirectories
as needed (the user's responsibility)
– file and directory access can be based on primary (the working
group) and secondary (e.g. the institute) Unix groups
– recommendation: keep access restricted on $HOME and if
needed share files/dirs. on $DATA or $WORK
https://wiki.hpcuser.uni-oldenburg.de/index.php?title=File_system_and_Data_Management#Managing_access_rights_of_your_folders
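As an illustration of the last point, a minimal sketch that makes a results directory on $DATA readable for your primary Unix group (the directory name and the group agsomegroup from the example above are placeholders):
$ chgrp -R agsomegroup $DATA/results     # hand the directory to your working group
$ chmod -R g+rX $DATA/results            # group may read files and enter subdirectories
$ chmod g+x $DATA                        # the parent directory must be searchable, too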
File Systems
• Home ($HOME, /user/abcd1234): critical data that cannot easily be
reproduced (program codes, initial conditions, results from data analysis)
• Data ($DATA, /gss/data/abcd1234): important data from simulations for
on-going analysis and long-term (project duration) storage
• Work ($WORK, /gss/work/abcd1234): data storage during simulation runtime,
pre- and post-processing, short-term (weeks) storage
• Scratch ($TMPDIR, /scratch/<job-dir>): temporary data storage during job
runtime
• home and data can be mounted on local workstations
• data may have some kind of backup in the future
• special quota rule for work
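A minimal sketch of how these environment variables might be used together in a job script (mysim and the directory names are placeholders):
cd $WORK                                   # run the simulation on the fast parallel storage
mkdir -p myrun && cd myrun
$HOME/bin/mysim > output.log               # hypothetical program kept in $HOME
mkdir -p $DATA/myrun_results
cp output.log $DATA/myrun_results/         # keep the important results on $DATA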
Quotas
https://wiki.hpcuser.uni-oldenburg.de/index.php?title=File_system_and_Data_Management#Quotas
• on every file system default quotas are in place
– home and data have 1TB and 2TB, respectively
– work has 10TB
– may be increased upon request
• special quota on work
– in addition to hard limit above, work also has soft quota of 2TB
– if usage is over soft quota a grace period of 30 days is triggered
– after grace period no data can be written to work by user
clean up your data on work regularly
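Standard tools are enough to find out what is taking up the space before cleaning up, for example:
$ du -sh $WORK/*     # size of each top-level directory below your work directory
$ du -sh $WORK       # total usage on work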
File System Shares
https://wiki.hpcuser.uni-oldenburg.de/index.php?title=Local_Mounting_of_File_Systems
• as before you can mount your $HOME directory on your
local workstation
• it is also possible to mount $DATA locally (work in
progress)
• the server addresses for mounting are
$HOME //daten.uni-oldenburg.de/hpchome
$DATA //daten.uni-oldenburg.de/hpcdata
(data does not work yet)
– for Windows connect a network drive
– for Linux add information in /etc/fstab
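A sketch of what such an /etc/fstab entry for $HOME could look like on a Linux workstation (mount point, local user name, and credentials file are assumptions; adjust them to your setup):
# example /etc/fstab line (all on one line) -- mount the HPC home share via CIFS
//daten.uni-oldenburg.de/hpchome  /mnt/hpchome  cifs  credentials=/home/localuser/.smbcredentials,uid=localuser,_netdev  0  0
afterwards the share can be mounted manually with
$ sudo mount /mnt/hpchome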
Modules
• user environment can be modified by loading/unloading
modules
– modules provide access to compiler, libraries, and scientific
applications
– typically modules for applications load all required dependencies
• changes with the new system
– Lua-based module system Lmod (rather than TCL-based)
– module commands are the same as before
– more features (caching, default modules, searching)
– module names may be different (capitalization and version suffix)
Module Commands
https://wiki.hpcuser.uni-oldenburg.de/index.php?title=User_environment_-_The_usage_of_module_2016
• find modules
module available [module-name]
module spider [module-name]
– list all modules [with given module name]
– spider is case-insensitive and understands reg-exp
• load/unload
module load <module-name>
module remove <module-name>
– to return to a default state
module restore
• information about modules
module list
module help <module-name>
module spider <module-name>
Examples: Module Commands
$ module list
1) hpc-uniol-env 2) slurm/current
$ module load GCC/4.9.4
$ module list
1) hpc-uniol-env 2) slurm/current 3) GCC/4.9.4
4) …
$ module swap GCC/4.9.4 GCC/5.4.0
$ module restore
$ module purge
$ module load hpc-uniol-env
Toolchains
https://wiki.hpcuser.uni-oldenburg.de/index.php?title=Toolchains
• toolchains provide an environment for building applications
(including your own codes)
• toolchains may include compilers, MPI library, numerical
libraries
– different toolchain versions exist to combine different versions of the components
• examples:
– goolf: GCC, OpenMPI, OpenBLAS, ScaLAPACK, FFTW
– gompi: GCC, OpenMPI
– intel: Intel compilers, MPI, MKL
• if needed, you may request additional toolchains from the list at
http://easybuild.readthedocs.io/en/latest/eb_list_toolchains.html
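For example, compiling your own MPI code with the gompi toolchain might look like this (the exact module version has to be taken from module spider gompi):
$ module load gompi          # pick a concrete version, e.g. as listed by module spider
$ mpicc -O2 -o hello_mpi hello_mpi.c
$ # with goolf, the numerical libraries (OpenBLAS, FFTW, ...) are available in the same environment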
SLURM JOB SUBMISSION
SGE vs SLURM Job Script
https://slurm.schedmd.com/rosetta.pdf
SGE: myscript.job
#!/bin/bash
#$ -cwd
#$ -l h_rt=2:00:0
#$ -l h_vmem=300M
#$ -N basic_test
#$ -j y
#$ -m ea
#$ -M your.name@uol.de
module load myExample/2017
./myExample

SLURM: myscript.job
#!/bin/bash
# -cwd not needed, running in the submit directory is the default
#SBATCH --partition=carl.p
#SBATCH --ntasks=1
#SBATCH --time=0-2:00
#SBATCH --mem-per-cpu=2G
#SBATCH --job-name=basic_test
# -j y not needed, a single output file is the default
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=your.name@uol.de
module load myExample/2017
./myExample
SLURM Job Control Commands
https://wiki.hpcuser.uni-oldenburg.de/index.php?title=SLURM_Job_Management_(Queueing)_System
• submitting a job script
sbatch myscript.job
• viewing the job queue
squeue
– shows all jobs from all users by default
– add -u $USER to show only your own jobs
• status of all queues
sinfo
• information about running and finished jobs
sacct -j <job-id>
– SGE command qacct only worked for finished jobs
• kill or cancel a job
scancel <job-id>
Options for SBATCH
https://slurm.schedmd.com/sbatch.html
Option                    Short Form       Description
--job-name=<name>         -J <name>        sets a name for the job, displayed in the queue
--partition=<partition>   -p <partition>   (comma-separated list of) partition(s) where the job should run, no default
--output=<filename>       -o <filename>    output files for STDOUT and STDERR;
--error=<filename>        -e <filename>    by default both are joined in slurm-%j.out
--ntasks=<n>              -n <n>           number of tasks (e.g. for MPI-parallel jobs)
--mem-per-cpu=<m>                          memory per core/task, optional
--mem=<m>                                  memory per node, exclusive with the above
--mail-type=<MT>                           mail settings (e.g. END,FAIL)
--mail-user=<address>                      mail address for notifications
Partitions
• partitions in SLURM are the equivalent of queues in SGE
– each node type has its own partition
– partitions define the available resources and set defaults
Partition   Node Type   CPUs   Default RunTime   Default Memory   Misc
mpcs.p      MPC-STD     24     2h                10375M
mpcl.p      MPC-LOM     24     2h                5000M
mpcb.p      MPC-BIG     16     2h                30G
mpcp.p      MPC-PP      40     2h                50G
mpcg.p      MPC-GPU     24     2h                10375M           1x Tesla P100 GPU
carl.p      combines mpcs.p, mpcl.p, and mpcg.p
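For example, a job that needs one of the big-memory nodes could request the corresponding partition explicitly (the memory value is only an illustration):
#SBATCH --partition=mpcb.p   # MPC-BIG nodes with 512 GB RAM
#SBATCH --mem=200G           # more than the 30G default of mpcb.p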
Parallel Jobs
• SLURM does not need parallel environments
• parallel jobs can be defined in different ways:
– number of tasks, SLURM will use as few nodes as possible
#SBATCH --ntasks=24
– number of nodes and number of tasks
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=12
– number of CPUs (cores) per task
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=24
– the first two examples can be used for MPI-parallel jobs; the last one
is for OpenMP or other thread-based parallelization (formerly PE smp in SGE), see the sketch below
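A minimal sketch for the thread-based case (myOpenMPprogram is only a placeholder):
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=24
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # use all requested cores for the threads
./myOpenMPprogram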
MPI
• applications using MPI are aware of the resources
allocated by SLURM for the job
– no need to specify the number of processes or hostnames
• starting parallel execution (within job script)
– load module for MPI library
– using MPI
mpirun ./myMPIprogram
– using SLURM
srun ./myMPIprogram
Example Job Script MPI
#!/bin/bash
#
#SBATCH -J TestJob # Jobname
#SBATCH -p cfdh_shrt.p # Partition: cfdh
#SBATCH -t 0-00:10 # time (D-HH:MM)
#SBATCH -o slurm.%N.%j.out # STDOUT
#SBATCH -e slurm.%N.%j.err # STDERR
#SBATCH --ntasks 192 # Number of Tasks
## alternatively (for better control of job distribution)
##SBATCH --nodes=8 # number of nodes
##SBATCH --ntasks-per-node=24 # tasks per node
module load impi # load Intel MPI
mpirun ./someParallelApplication
#srun ./someParallelApplication
SLURM Environment Variables
https://slurm.schedmd.com/sbatch.html#lbAG
SLURM_JOB_NAME          name of the job
SLURM_JOB_ID            the ID of the job allocation
SLURM_SUBMIT_DIR        the directory from which sbatch was invoked
SLURM_JOB_NODELIST      list of nodes allocated to the job
SLURM_JOB_NUM_NODES     total number of nodes in the job's resource allocation
SLURM_NTASKS_PER_NODE   number of tasks requested per node, only set if --ntasks-per-node is specified
SLURM_NTASKS            same as -n, --ntasks
SLURM_CPUS_PER_TASK     number of CPUs requested per task, only set if --cpus-per-task is specified
SLURM_TASKS_PER_NODE    number of tasks to be initiated on each node
SLURM_MEM_PER_CPU       same as --mem-per-cpu
SLURM_MEM_PER_NODE      same as --mem
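These variables can be used directly inside a job script, e.g. to record the allocation in the job output:
echo "running on nodes: $SLURM_JOB_NODELIST"
echo "using $SLURM_NTASKS tasks, submitted from $SLURM_SUBMIT_DIR"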
srun
• the srun command can be used to execute parallel commands
– not necessarily limited to MPI
– by default the command is executed ntasks times
• example
srun hostname | sort -u
– will write a list of nodes allocated to the job to STDOUT
• srun can also be used to allocate resources when invoked outside of
a batch job script
– using same options as for sbatch (as command-line options)
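For example, a short interactive session on a compute node could be requested like this (partition and limits are only an illustration):
$ srun --partition=carl.p --ntasks=1 --time=1:00:00 --pty bash
$ hostname        # now runs on the allocated compute node
$ exit            # release the allocation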
Job Arrays
• job arrays can be used to run identical job scripts with
multiple parameters
– should be strongly preferred over submitting many individual jobs
• additional SBATCH option in the job script, e.g.
#SBATCH --array=0-15:4%2
– array indices can be defined as ranges, with a step size (:), with a
limit on simultaneously running tasks (%), or as a comma-separated list
• additional environment variables
– most important is SLURM_ARRAY_TASK_ID to change
parameters for run
Job Array Example
. . .
#SBATCH --job-name=arrayJob
#SBATCH --output=arrayJob_%A_%a.out
#SBATCH --array=1-16
. . .
# Print this sub-job's task ID
echo "My SLURM_ARRAY_TASK_ID: " $SLURM_ARRAY_TASK_ID
# Do some work based on the SLURM_ARRAY_TASK_ID
# For example:
# ./my_process $SLURM_ARRAY_TASK_ID
# or
# ./my_process -i inputfile.$SLURM_ARRAY_TASK_ID.inp
Do not use sbatch in for-loops; this can always be transformed into a job array.
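One common pattern, sketched here under the assumption that each line of a file params.txt holds the arguments for one run, is to select the line matching the task ID:
# pick line number SLURM_ARRAY_TASK_ID from the (hypothetical) parameter file
PARAMS=$(sed -n "${SLURM_ARRAY_TASK_ID}p" params.txt)
./my_process $PARAMS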
Using $TMPDIR
• all MPC-nodes (and CFD-GPU) have local disks or SSDs
– for diskless nodes a RAM-disk (1G) is available for consistency
– mounted as /scratch
– during runtime a job-specific directory is created, the path is
stored in $TMPDIR
• SLURM is aware of local disks as Generic RESource
(GRES)
– you can request a GRES in your job script, e.g.
#SBATCH --gres=tmpdir:100G
– request multiple GRES as a comma-separated list (not with multiple --gres options)
Using $TMPDIR
• the local disks are intended for temporary files from I/O-intensive
applications
– $TMPDIR will be deleted after the job is finished
• example
. . .
#SBATCH --gres=tmpdir:500G
. . .
# change to TMPDIR
cd $TMPDIR
# create a lot of data
./mysim > largeDataFile
# post-processing reduces data file
./mypp < largeDataFile > $SLURM_SUBMIT_DIR/smallResultFile
Using GPUs
• a total of 12 identical GPU nodes is available
– 9x MPC-GPU (mpcg.p) and 3x CFD-GPU (cfdg.p)
– one Tesla P100 GPU per node
• modules for
– CUDA SDK (examples and more)
– cuBLAS and cuFFT
• requesting GPUs in job script
– as GRES: #SBATCH --gres=gpu:1
– with TMPDIR: #SBATCH --gres=gpu:1,tmpdir:100G
– without a GRES request you cannot use the GPUs, even if your job is
running on a GPU node
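A minimal GPU job sketch (the CUDA module name and the program myCUDAprogram are placeholders; check module spider CUDA for the installed versions):
#!/bin/bash
#SBATCH --partition=mpcg.p   # or cfdg.p on EDDY
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1         # without this request the GPU cannot be used
module load CUDA             # hypothetical module name, see module spider
./myCUDAprogram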
MIGRATION TO THE NEW SYSTEM
Migration to the new System
• new login nodes
• new HOME-directories
• new directory structure in file systems
• some or most modules renamed
• translate SGE scripts to SLURM
• all jobs may use parallel filesystem for I/O
• MPI-parallel jobs now use FDR Infiniband
Migration of Data
• the new cluster has a new $HOME directory
– central ISILON storage system
• links to the old $HOME and /data/work allow you to copy data
– read-only on the new cluster (as long as the old cluster is
accessible)
– please clean up your data before copying
$ pwd
/user/abcd1234
$ ls
… old_home_abcd1234 old_work_abcd1234 …
$ cp -ar old_home_abcd1234/somedir .
– eventually you should delete all data in the old_* directories
(on a time scale of about 6 months)
MATLAB
https://wiki.hpcuser.uni-oldenburg.de/index.php?title=Configuration_MDCS_2016
• on CARL/EDDY only Matlab R2016b is installed
– a new integration is required (after installing R2016b locally)
– an updated set of integration files will be uploaded in the next
few days
– updates will follow at least once a year (b-versions)
• transition
– apart from the new version and integration nothing has changed
– license transition will be step by step (eventually with 32 licenses
remaining on the old system)
HERO and FLOW
• the old systems will be shut down
– planned date is April 1st (at least for submission of new jobs)
– removal of the hardware is planned for mid May
Need Help? Software Missing?
• see the new wiki
– the main page will be changed soon
– the old pages will remain
• contact us
– e-mail: servicedesk@uni-oldenburg.de
(will reach everyone below and IT services)
– Stefan Harfst: stefan.harfst@uni-oldenburg.de
– Julius Brunken: julius.brunken@uni-oldenburg.de
– Wilke Trei: wilke.trei@forwind.de