Introduction to NCI
National Computational Infrastructure
Download training materials here:
http://nci.org.au/services-support/training/
Outline
Introduction
Accounting
Connecting
UNIX
Job Scheduling
Filesystems
Troubleshooting
What is the NCI?
- Peak Facility, Raijin, Cloud Service and Data management
- Specialised Support:
  - Climate system science
  - Astronomy
  - Earth Observation
  - Geophysics
  - Cloud Computing
Allocation Schemes
- National Computational Merit Allocation Scheme
  - NCMAS includes NCI (raijin), iVEC (magnus, epic, fornax), VLSCI (avoca), SF in Bioinformatics (barrine) and SF in Imaging and Visualisation (MASSIVE).
- Partner allocations
  - Major Partners: e.g. CSIRO, INTERSECT, GA, QCIF, BoM
  - University Partners: e.g. ANU, Monash, UNSW, UQ, USyd, Uni Adelaide, Deakin
- Flagship Projects
  - Astronomy/Astrophysics, CoE in Climate Systems Science, CoE Optics
- Startup allocation
- Director
Distributions of Allocations: 2014
Approximate distribution of allocations across all compute systems for 2014:
- NCMAS 15%
- CSIRO 21.4%
- BOM 18.9%
- ANU 17.7%
- Flagships 5.0% (including CoECSS, TERN, Astro, CoE Optics)
- INTERSECT 3.8%
- GA 3.4%
- Monash, UNSW, UQ, USyd, Uni Adelaide 1.7% each
- Director's share, QCIF, Deakin, MSI 6.3% in total
NCI HPC System
Integrated Infrastructure and Services
- RAIJIN Fujitsu Primergy information
- Lustre filesystems - raijin (/home and /short) and global (/g/data)
- Cloud - OpenStack cloud (hosting services, specialised virtual labs, new services, special interactive use)
- High-end visualisation services and support (Vizlab)
- Software Packages
Getting Information
- URL: http://nci.org.au/
- Detailed usage information
- Raijin Quick Reference Guide
- Detailed software information
- Raijin FAQs
- /g/data FAQs
- Message of the Day (/etc/motd)
- Emergency and Downtime Notices
- NCI help email: help@nci.org.au
New Petascale System
Fujitsu Primergy - raijin
- 3592 nodes, each with 2x Intel Sandy Bridge E5-2670 (8 cores, 2.6GHz)
- 57472 cores
- Total memory 158TB
- Lustre filesystems: /short, /home, /g/data
- $PBS_JOBFS local to each node
- Infiniband network
- See the system being installed.
Cloud
NCI's Cloud services focus on:
- Computation using the cloud
- Data services using the cloud
- Complementary services to NCI's HPC that are best provided through the cloud
NCI offers a NeCTAR node (National eResearch Collaboration Tools and Resources):
- Designed to optimise computation and floating point (Intel CPUs)
- Designed for high speed data transfer (56 Gigabit network between nodes)
- Designed for high speed IO (all-SSD disk storage in the cloud)
NCI can offer a high speed interconnect between the NCI Lustre based filesystems and NCI Cloud services.
Data Storage
- Global Lustre filesystem /g/data/ - stores persistent data, mounted on raijin and cloud nodes.
- Mass Data storage - HSM storage with dual copies across two NCI data centres. Effective storage for managing data that can be staged in/out as part of batch processing.
- RDSI national data collections - to be stored across the NCI data resources listed above.
How to Apply for a New Project (for CIs)
- Project leaders (Chief Investigators) fill out on-line forms with the required details and are given a project ID.
- Application process:
  - Partner (anytime)
  - Merit scheme (once a year, deadline Nov)
  - Start-up (anytime, max 5000 SU per year)
  - Commercial (anytime)
How to Apply for a New Account (for Users)
- Register as a New User: register first. The registration ID is a number such as 12345; it is not a user ID.
- Connect to Project: a connection form should be submitted.
- Accounts are set up when a CI approves a connection request.
- New users will receive an email with account details.
- NCI usernames are of the form abc123 - abc for your initials and 123 for affiliation.
- Passwords are sent by SMS to the mobile number provided when you registered.
- Passwords can be given over the phone if necessary, but not by email.
- Use the passwd command to change your password when you first log in.
- An automated on-line tool for users to set passwords is being developed, with expected availability mid 2015.
Project accounting
- All use on the compute systems is accounted against projects. Each project has a single grant of time per 3-month quarter.
- If your username is connected to more than one project, you will have to select which project to run jobs under.
- A project may span several stakeholders (e.g. BoM and CSIRO).
- To change or set the default project, edit the .rashrc file in your home directory and change the PROJECT variable as desired. A typical .rashrc file looks like:

setenv PROJECT c25
setenv SHELL /bin/bash

- Log in again after editing .rashrc to see the changes.
Default Project
- The following displays the usage of the project in the current quarter against each of the stakeholders funding the project:

nci_account

- By adding -v you can see who is using the compute time:

nci_account -v

- You can also use -P for another project and -p for a different quarter, e.g.:

nci_account -P c25 -p 2014.q2 -v

- Further information will be presented under nci_account - most notably storage usage.
- If you have a project that is externally funded and requires more resources than provided, please contact us. It is possible to set up special funding and track it under nci_account.
Establish Connection
- Connection under Unix/Mac:
  - ssh: ssh (terminal)
  - scp/sftp: scp/sftp (terminal)
  - X11: ssh -X; make sure to install XQuartz for OSX 10.8 or above (terminal)
- Connection under Windows:
  - ssh: putty, mobaxterm
  - scp/sftp: putty, Filezilla, winscp
  - X11: Cygwin, XMing, mobaxterm, Virtual Network Computing
Caution!
Be sure to log out of xterm sessions, and quit the window manager, before leaving the system.
Connecting to raijin
The hostname of the Fujitsu Primergy Cluster is
raijin.nci.org.au
and can be accessed using the secure shell (ssh) command, for example,
ssh -X abc123@raijin.nci.org.au
Your ssh connection will be to one of six possible login nodes, raijin1 to raijin6.
(If ssh to raijin fails, you should try specifying one of the nodes, e.g.
raijin3.nci.org.au.)
Secure use of ssh
Passphrase-less ssh keys allow ssh to log in without a password.
Caution!
Day-to-day use is strongly discouraged: it considerably weakens both NCI and home institution system security. (Instead, consider a key with a passphrase plus ssh-agent on your workstation.)
Passphrase-less keys can be useful to support copyq batch jobs:
- Generate a new key specifically for such transfers
- Use rrsync to restrict what it can do
More information: Using ssh keys
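As a sketch of the recommended alternative (a passphrase-protected key cached by ssh-agent), the commands below generate and load such a key. The key file name and passphrase are placeholders, not NCI conventions:

```shell
# Create ~/.ssh if needed, then generate an RSA key protected by a
# passphrase (file name and passphrase here are examples only).
mkdir -p "$HOME/.ssh"
ssh-keygen -q -t rsa -b 4096 -f "$HOME/.ssh/id_rsa_nci" -N 'choose-a-passphrase'

# Start an agent and cache the key; you type the passphrase once per
# session instead of at every login. (ssh-add prompts for the
# passphrase when run interactively.)
eval "$(ssh-agent -s)" > /dev/null
ssh-add "$HOME/.ssh/id_rsa_nci" 2>/dev/null || true

# Later logins then reuse the cached key:
#   ssh abc123@raijin.nci.org.au
```

With the key cached, scp and sftp to raijin also stop prompting for the session.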
UNIX environment
The working environment under UNIX is controlled by a shell (command-line interpreter). The shell interprets and executes user commands.
- The default is the bash shell (tcsh is also popular; you may use ksh)
- The shell can be changed by modifying .rashrc
- Shell commands can be grouped together into scripts
- Unix Quick Reference Guide
Note
Unix is case sensitive!
UNIX environment
The shell provides environment variables that can be accessed across all the processes initiated from the original shell, e.g. the login environment.

Shell       | Exec on login and compute nodes | Exec on login nodes only | Modules
csh/tcsh    | .cshrc                          | .login                   | .login
sh/bash/ksh | .bashrc                         | .profile                 | .profile

tcsh syntax:
setenv VARIABLE value
bash syntax:
export VARIABLE=value
For an explanation of environment variables see Canonical user environment variables.
Environment Modules
Modules provide a great way to easily customize your shell environment
for different software packages. The module command syntax is the
same no matter which command shell you are using.
Various modules are loaded into your environment at login to provide a
workable environment.
module list        # see the modules loaded
module avail       # see the list of software for which environments have been set up via modules
module show name   # see the list of commands that are carried out in the module
module load name   # load the environment settings required by a software package
module unload name # remove extras added to the environment for a previously loaded software package;
                   # extremely useful in situations where different package settings clash
Environment Modules
Note
To automate environment customisation at login, module load commands can be added to the .login (tcsh) or .profile (bash) files.
Users should be aware that different applications can have incompatible
environment requirements so loading multiple application modules in
your dot file may lead to problems. We recommend that modules are
loaded in scripts as needed at runtime and likewise discourage the use of
module commands in shell configuration (dot) files.
More advanced information on modules can be found in the Modules
User Guide.
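A minimal sketch of that recommendation: load modules inside the job script itself rather than in dot files. The module names/versions and the program name below are illustrative only:

```shell
# Write a job script that sets up its own environment at runtime
# (module versions and my_program.exe are placeholders).
cat > runjob.sh <<'EOF'
#!/bin/bash
#PBS -l walltime=01:00:00
#PBS -l mem=32GB
#PBS -l ncpus=16
#PBS -l wd

# Load exactly what this job needs; nothing leaks in from dot files,
# and jobs with clashing requirements cannot interfere.
module load intel-fc/12.1.9.293
module load openmpi/1.6.3

mpirun -np 16 ./my_program.exe
EOF

# Submit with: qsub runjob.sh
```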
Editors
Several editors are available:
- vi
- emacs
- nano
If you are not familiar with any of these, you will find that nano has a simple interface. Just type nano.
Caution!
Use dos2unix if your input/job script files were edited on a Windows machine.
Exercise 1: Getting started
Logging on to raijin - use the course account.
ssh -X aaa777@raijin.nci.org.au
Remember to read the Message of the Day (MOTD) as you login.
Commands to try:
hostname        # see the node you are logged into
nci_account     # see the current state of the project
module list     # check which modules are loaded on login
module avail    # see which software packages are installed and accessible in this way
module show pbs # see what environments are set by a module
Note
In .cshrc (tcsh) or .bashrc (bash) the intel-fc, intel-cc and openmpi modules are loaded by default.
Batch Queueing System
- Most jobs require greater resources than are available to interactive processes and must be scheduled by the batch job system (an interactive mode is available).
- The queueing system:
  - distributes work evenly over the system
  - ensures that jobs cannot impact each other (e.g. exhaust memory or other resources)
  - provides equitable access to the system
- Raijin uses a customised version of PBSPro.
- nf_limits displays the limits that are set for your projects.
- Default queue limit
Note
Job charging is based on wall clock time used, number of cpus requested, and queue choice.
Queue Limit
(Queue-limit table not reproduced here; run nf_limits for the current values.)
Batch queue structure
- normal
  - Default queue, designed for production use
  - Charging rate of 1 SU per processor-hour (walltime) on raijin
  - Requests for ncpus greater than a node (16 cores) need to be in multiples of 16
  - If your grant is exhausted -> lower priority (bonus)
- express
  - High priority for testing, debugging etc.
  - Charging rate of 3 SUs per processor-hour (walltime)
  - Smaller limits to discourage production use (ncpus limited to 128, memory per core is 32GB; check nf_limits for project-specific detail)
- copyq
  - Used for file manipulation - e.g. copying files to MDSS
Using the Queueing System
- Read the How to Use PBS guide.
- Use nf_limits to see your user/project queue limits.
- Request resources for your job (using qsub):
  - walltime
  - memory (32GB, 64GB, 128GB per node)
  - disk (jobfs)
  - number of cpus
- PBSPro will then:
  - schedule the job when the resources become available
  - prevent other jobs from infringing on the allocated resources
  - display progress of the jobs (qstat, nqstat or nqstat_anu)
  - terminate the job when it exceeds its requested resources
  - return stdout and stderr in batch output files
Job Script Example
Example
#!/bin/bash
#PBS -l walltime=20:00:00
#PBS -l mem=2GB
#PBS -l jobfs=1GB
#PBS -l ncpus=16
#PBS -l software=xxx (for licenced software)
#PBS -l wd (to start the batch job in the working directory from which it was submitted)

my_program.exe
Job Scheduling
- Job priority is based on resources requested, currently running jobs under the user/project, and grant allocation.
- Jobs start when sufficient resources are available. (Use qstat -s jobid to see a comment on why a job is not running.)
Tips
- Near the end or beginning of a quarter is a busy period.
- For higher priority, use:
  - a shorter walltime request
  - a smaller memory request
  - a larger number of cpus requested (to some extent)
Long-running jobs
- When jobs need to run longer than the queue limits allow, checkpoint/restart functionality is recommended. Long run times expose users to system and/or numerical instabilities.
- Example scripts for self-submitting jobs can be found in the FAQs.
Caution!
Checkpoint/restart is not a filesystem or PBSPro capability - it must be implemented by the user or software vendor.
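One common shape for a self-submitting job is a script that runs one time-limited segment and resubmits itself until a marker file appears. This is only a sketch under assumed names (chainjob.sh, finished.flag, my_program.exe); the real checkpointing of application state must come from the program itself:

```shell
cat > chainjob.sh <<'EOF'
#!/bin/bash
#PBS -l walltime=02:00:00
#PBS -l ncpus=16
#PBS -l wd

# Run one segment; the program is assumed to write its own checkpoint
# and to create finished.flag when all work is done.
./my_program.exe

# Chain the next segment until the marker file appears.
if [ ! -f finished.flag ]; then
    qsub chainjob.sh
fi
EOF

# Start the chain with: qsub chainjob.sh
```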
stdout and stderr files
PBSPro returns the standard output and standard error from each job in
.o***** and .e***** files, respectively.
Example script.o123456
============================================================
Resource Usage on 2013-07-20 12:48:04.355160:
JobId: 123456.r-man2
Project: c25
Exit Status: 0 (Linux Signal 0)
Service Units: 0.01
NCPUs Requested: 1
NCPUs Used: 1
CPU Time Used: 00:00:43
Memory Requested: 50mb
Memory Used: 13mb
Vmem Used: 52mb
Walltime requested: 00:10:00
Walltime Used: 00:00:49
jobfs request: 100mb
jobfs used: 1mb
============================================================
stdout and stderr files
- The .o***** file contains the output arising from the script (if not redirected in the script) and additional information from PBS.
- The .e***** file contains any error output arising from the script (if not redirected in the script) and additional information from PBS. For a successful job it should be empty.
Common errors to look for in the .e***** file:
- Command not found (check module list, path)
- =>> PBS: job terminated: walltime 172818sec exceeded limit 172800sec (increase runtime request)
- =>> PBS: job terminated: per node mem 2227620kb exceeded limit 2097152kb (increase memory per node request)
- Segmentation fault (check your program)
Monitoring the progress of jobs
Useful commands
qstat        # show the status of the PBS queues
nqstat       # enhanced display of the status of the PBS queues
nqstat_anu   # enhanced display of the status of the PBS queues
qstat -s     # display additional comment on the status of the job
qps jobid    # show the processes of a running job
qls jobid    # list the files in a job's jobfs directory
qcat jobid   # show a running job's stdout, stderr or script
qcp jobid    # copy a file from a running job's jobfs directory
qdel jobid   # kill a running job
Caution!
Please use nqstat_anu -a | grep $USER to see the cpu% of your jobs. An efficient parallel job should be close to 100%.
Exercise 2: Submitting jobs to the batch queue
cd /short/$PROJECT/$USER/
tar xvf /short/c25/intro_exercises.tar
cd INTRO_COURSE
cat runjob
qsub runjob
watch qstat -u $USER
... (wait until job finishes, use Ctrl+C to quit)...
runjob
- This job searches for the first n prime numbers. Feel free to change the number n, or the PBS resource requests, to see how the outcome changes.
- View the output in the file runjob.o**** and any error messages in runjob.e**** after the job completes.
Interactive jobs
Users may see the following message when running an interactive process on the login nodes:
RSS exceeded. user=abc123, pid=12345, cmd=exe, rss=4028904, rlim=2097152 Killed
Each interactive process you run on the login nodes has a time limit (30 mins) and a memory limit (2GB) imposed on it. If you want to run a longer or more memory-intensive interactive job, please submit an interactive batch job.
- The -I option for qsub will result in an interactive shell being started on the compute nodes once your job starts.
- A submission script cannot be used in this mode; you must provide all qsub options on the command line.
- To use X windows in an interactive batch job, include the -X option when submitting your job; this will automatically export the DISPLAY environment variable.
Exercise 3: Interactive Batch Jobs
Sometimes the resource requirements (mem, walltime etc) are larger than
allowed. You can run an interactive batch job as follows:
qsub -I -l walltime=00:10:00,mem=500Mb -P c25 -q express -X
qsub: waiting for job 215984.r-man2 to start
qsub: job 215984.r-man2 ready
[aaa777@r73 ]$ xeyes &
[aaa777@r73 ]$ module list
Currently Loaded Modulefiles:
  1) pbs                   2) dot                   3) intel-cc/12.1.9.293
  4) intel-fc/12.1.9.293   5) openmpi/1.6.3
[aaa777@r73 ]$ cd /short/$PROJECT/$USER/INTRO_COURSE
[aaa777@r73 ]$ ./matrix.exe (use Ctrl+C to quit)
[aaa777@r73 ]$ logout
qsub: job 215984.r-man2 completed
Filesystems
Things to consider:
- Transferring large data files to and from raijin: scp, rsync, filezilla
- Use the designated data mover node (r-dm.nci.org.au), not the interactive login nodes.
- How much data do you really need to keep?
- Do you need metadata or a self-describing file format?
- Decide on a structure for archived data before you start.
- Stage in archived data from tape (offline) to disk before starting jobs.
- Archive results automatically at the end of batch jobs.
RAIJIN Filesystems Overview
The Filesystems section of the userguide has this table in greater detail:
Filesystem | Purpose                     | Quota                  | Backup                              | Availability                        | Time limit
/home      | Irreproducible data,        | 2GB (user)             | Yes                                 | raijin                              | None
           | e.g. source code            |                        |                                     |                                     |
/short     | Input/output data files     | 72GB (project)         | No                                  | raijin                              | 365 days
/g/data/   | Processing large data       | project dependent      | No                                  | Global                              | No
$PBS_JOBFS | IO intensive data           | 100MB per node default | No                                  | Local to node                       | Duration of job
MDSS       | Archiving large data files  | 20GB                   | 2 copies in two different locations | External access using mdss commands | No
Note
These limits can be changed on request.
Monitoring disk usage
- lquota gives lustre filesystem usage (/home, /short, /g/data).
- nci_account gives other filesystem usage (/short, /g/data, mdss).
- short_files_report, gdata1_files_report and gdata2_files_report give a breakdown:
  - -G <project> lists files owned by group <project>.
  - -P <project> lists files in /short/<project>.
Caution!
/short and /g/data are not backed up, so it is the user's responsibility to make sure that important files are archived to the MDSS or off-site.
Input/Output Warning
- Lots of small IO to /short (or /home) can be very slow and can severely impact other jobs on the system.
- Avoid dribbly IO, e.g. writing 2 numbers from your inner loop. Writing to /short every second is far too often!
- Avoid frequent opening and closing of files (or other file operations).
- Use jobfs instead of /short for jobs that do lots of file manipulation.
- To achieve good IO performance, try to read or write binary files in large chunks (of around 1MB or greater).
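As a rough shell-level illustration of chunk size (file names here are throwaway examples), dd makes the block size explicit: a 1MB block size issues a handful of write calls where a 512-byte block size would issue thousands:

```shell
# Make a 10 MB test file, then copy it in 1 MB chunks (about 10
# write() calls) rather than 512-byte chunks (about 20000 calls).
dd if=/dev/zero of=testfile.bin bs=1M count=10 2>/dev/null
dd if=testfile.bin of=copy_fast.bin bs=1M 2>/dev/null

# For comparison, the slow pattern would be:
#   dd if=testfile.bin of=copy_slow.bin bs=512

ls -l testfile.bin copy_fast.bin
```

The same principle applies inside programs: buffer output and write it in large blocks instead of element by element.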
Exercise 4: Writing to /short
- Use the lquota and du commands to find out how much disk space you have available in your home, short and gdata directories.
- Use short_files_report or gdata1_files_report to see who uses most of the quota. Look at your project's /short area. Anyone from your project can create their own directories and files here. There will be a directory of your own under your project area.
- Note the different group ownership in the DATA directory:
ls -l /short/c25/DATA
Exercise 4: Writing to /short (cont)
Change the permissions on your files and directories to allow/disallow
others in your group to access them.
man chmod
chmod g+r filename # allow group read of filename
chmod g-r filename # disallow group read of filename
chmod g+w filename # allow group write to filename
chmod g+x filename # allow group execute of filename
Verify with your neighbour that your file permissions are as expected.
Note
- To be able to go into a directory requires execute permission (chmod -R +X folder).
- You may not want to share files by making your /home directory world readable. For members of the same project you can use /short/$PROJECT. Talk to us about alternatives if you need to share source code, data files etc.
ACL Access Control Lists
ACLs are an addition to the standard Unix file permissions (r, w, x, -) for User, Group, and Other, adding read, write, execute and deny permissions. ACLs give users and administrators flexibility and direct fine-grained control over who can read, write, and execute files.
Caution!
We strongly recommend that you consult with NCI before using ACLs.
Using the MDSS
The Mass Data Store was migrated to a new SGI Hierarchical Storage
Management System in January 2012.
- MDSS is used for long term storage of large datasets.
- If you have numerous small files to archive, bundle them into a tarfile FIRST.
- Watch our tape robot at work.
- Every project has a directory on the MDSS.
- All members of the project group have read and write access to the top project directory.
- mdss dmls -l gives information on what is online (in the disk cache) and what is on tape.
Using the MDSS
- The mdss command can be used to get and put data between the login and copyq nodes of raijin and the MDSS, and also to list files and directories on the MDSS.
- netcp and netmv can be used from within batch jobs to:
  - generate a batch script for copying/moving files to the MDSS
  - submit the generated batch script to the special copyq, which runs the copy/move job on an interactive node
- netcp and netmv can also be used interactively to save you the work of creating tarfiles and generating mdss commands:
  - -t creates a tarfile to transfer
  - -z/-Z gzips/compresses the file to be transferred
Caution!
Always use -l other=mdss when using mdss commands in copyq. This is so that jobs only run when the mdss system is available.
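Putting that together, a hand-written copyq job might look like the sketch below (the paths, tarfile name, and MDSS destination are illustrative; netcp/netmv generate scripts of roughly this shape for you). The -l other=mdss request keeps the job queued whenever the MDSS is unavailable:

```shell
cat > archive.sh <<'EOF'
#!/bin/bash
#PBS -q copyq
#PBS -l walltime=00:30:00
#PBS -l other=mdss
#PBS -l wd

# Bundle the many small files first, then store one tarfile on MDSS.
tar cf results.tar results/
mdss put results.tar my_archive/results.tar
EOF

# Submit with: qsub archive.sh
```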
Exercise 5: Using the MDSS
To see these commands in action do
cd /short/$PROJECT/$USER
mdss get Data/data.tar
ls -l
tar xvf data.tar
ls
rm data.tar
mdss mkdir $USER
netmv -t $USER.tar DATA $USER
watch qstat -u $USER
... (wait until job finishes, use Ctrl+C to quit)...
less DATA.o*
mdss ls $USER
mdss rm $USER/$USER.tar
Using /jobfs
- Only available through the queueing system:
  - request like -l jobfs=1GB
  - access via the $PBS_JOBFS environment variable
- All files are deleted at the end of the job. Copy what you need to /short or another global filesystem in the job script.
- Requests larger than 396GB will be automatically redirected to /short (but files will still be deleted at the end of the job).
- You cannot use mdss or netcp commands for files on jobfs.
Exercise 6: Managing Files between /short, /jobfs and MDSS
Submit a batch job with a /jobfs request, where the job:
- Copies an input file from /short to /jobfs
- Runs a code that uses the input file and generates some output
- Saves the output data back to the /short area
- Uses the netcp command to archive the data to the MDSS
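The real runjobfs script is in the course material; the four steps can be sketched like this (all file and program names below are assumptions, and the netcp usage mirrors Exercise 5):

```shell
cat > runjobfs_sketch.sh <<'EOF'
#!/bin/bash
#PBS -l walltime=00:20:00
#PBS -l ncpus=1
#PBS -l jobfs=1GB
#PBS -l wd

# 1. Stage input onto fast node-local disk.
cp input.dat $PBS_JOBFS/

# 2. Run where the IO-heavy work belongs.
cd $PBS_JOBFS
./my_code.exe input.dat > output.dat

# 3. Save results before the job ends ($PBS_JOBFS is wiped afterwards).
cp output.dat /short/$PROJECT/$USER/

# 4. Archive the result to the MDSS (netcp submits a copyq job).
cd /short/$PROJECT/$USER
netcp output.dat $USER
EOF
```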
Exercise 6: Managing Files between /short, /jobfs and MDSS
Read the runjobfs script then submit it to the queueing system,
monitor the job with qstat, and examine the job output files:
cd /short/$PROJECT/$USER/INTRO_COURSE
qsub runjobfs
watch qstat -u $USER
... (wait until job finishes, use Ctrl+C to quit)...
cat runjobfs.e*
cat runjobfs.o*
Check out the output file that this job created on /short and the copy on
the MDSS
cd /short/$PROJECT/$USER
ls -ltr
less save_data.o*
mdss ls $USER
mdss rm -r $USER
Troubleshooting
- .e and .o (stderr and stdout) files - check your input!
- PBS emails, MOTD, and Notices and News
- Read the FAQs:
  - Why are my jobs not running?
  - Why does my job run fine on my local machine, but not work on raijin?
  - My PBS job script generates the error message "module: command not found". What's wrong?
  - How do I access files on NCI systems using a graphical user interface?
  - How do I transfer files between massdata and my local machine?
- Read the /g/data FAQs
Issues with Running Jobs
- CPU over/under-subscription
  - Due to an inconsistent ncpus=X request vs mpirun -np Y, where Y != X.
  - OMP_NUM_THREADS != $PBS_NCPUS
  - Use mpirun --bind-to-socket -npernode 2 <exe> -T 8 <args>
  - Use mpirun --bind-to-none program.exe for ncpus < 16 jobs
  - Software-specific keywords:
    - %nproc in gaussian
    - NPAR in VASP: the recommended value should be somewhere between SQRT(ncpus) and ncpus/2, and be a factor of 16
- Unbalanced %cpu usage
- 0% cpu usage (sleeping, hung, or dead job). If it is a file manipulation job, use copyq instead.