How to Use the ITP HPC Clusters
Login to the HPC Cluster
Depending on your account, use the appropriate login node (also referred to as the master node) of one of the following systems:
Cluster | Login node | RSA key fingerprint |
REGULUS | regulus.uibk.ac.at | 97:a2:1e:0d:d8:7e:2a:b7:44:1c:6a:19:7e:39:f5:b7 |
TEAZER | teazer.uibk.ac.at | cd:61:96:a2:42:01:1f:19:42:cf:76:06:b8:1e:b9:1d |
Each of these servers is the login node to its High Performance Computing (HPC) cluster. The cluster can be contacted via slogin or ssh. Log in to the HPC cluster with:
slogin regulus.uibk.ac.at -l <user-name>
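The same connection can be established with ssh (a minimal example; substitute your own user name):
ssh <user-name>@regulus.uibk.ac.at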
For Windows users: See the ZID's Getting Started Tutorial to establish a connection within Windows.
Remote access
For security reasons, our Linux cluster systems are reachable only from IP addresses inside the University's network. If you want to access the systems from outside, you need to set up a VPN connection first. See the ZID's instructions for setting up a VPN client for various operating systems.
Change your password
Change your password with the command yppasswd. After typing your current password, you have to enter your new password twice.
$ yppasswd
Changing NIS account information for "user ID" on regulus.uibk.ac.at.
Please enter old password: <type old value here>
Changing NIS password for "user ID" on regulus.uibk.ac.at.
Please enter new password: <type new value>
Please retype new password: <re-type new value>
The NIS password has been changed on regulus.uibk.ac.at.
Set up your environment
The environment modules package provides a convenient way to customize your Linux environment (PATH, MANPATH, INCLUDE, LD_LIBRARY_PATH), especially on the fly. Using the modules environment allows you to cleanly set and unset your path and environment variables by loading or unloading an installed software package via its module file (module load module_file). The command module avail lists the installed software, i.e. the associated module files, available on the cluster, such as the Intel compiler.
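For example, a typical session may look like this (the module name intel is an assumption; module avail shows what is actually installed):
$ module avail          # list the installed software / module files
$ module load intel     # add the Intel compiler to your environment
$ module list           # show the currently loaded modules
$ module unload intel   # cleanly remove it from the environment again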
Here you can find more detailed information about the modules environment and its usage.
Using Sun Grid Engine
The cluster's job scheduling is operated by the open-source Sun Grid Engine (SGE) system (version 6.1u4). As a user you should be familiar with the following commands: qsub, qstat, qdel and qhost, which are briefly described in the following. For more information consult the respective man pages or see the vendor documentation, especially the SGE User's Guide.
Vendor documentation of SGE 6.1u4:
- Sun N1 Grid Engine 6.1 Collection (Sun Online Documentation)
- Sun N1 Grid Engine 6.1 Release Notes
- Sun N1 Grid Engine 6.1 User's Guide
- Sun N1 Grid Engine 6.1 Administration Guide
- Sun N1 Grid Engine 6.1 Installation Guide
Submitting batch jobs (qsub)
The command qsub allows you to submit jobs to the batch system. qsub uses the following syntax:
qsub [options] scriptfile [script arguments]
where scriptfile represents the path to either a binary or a script containing the commands to be run by the job using a shell.
There are two ways to submit a job.
Method 1: (not recommended)
You may add the options directly after the qsub command, like:
qsub -q all.q -o output.dat -i input.dat -l swap_free=200M scriptfile
Method 2: (recommended)
Options can be written to a file (a job description file), which is then passed to the command qsub:
qsub job
The content of the file "job" may look like the following (a sketch mirroring the options of Method 1; the last line invokes your actual script):
#$ -q all.q
#$ -o output.dat
#$ -l swap_free=200M
./scriptfile
Description of the most important options of qsub:
Input/Output
-i path | standard input file |
-o path | standard output file |
-e path | standard error file |
-j yes|no | join the standard error stream to standard output (yes or no) |
Notification
-M email-address | notifications will be sent to this email address |
-m b|e|a|s|n | send notifications on different events: b ... begin, e ... end, a ... abort, s ... suspend, n ... no mail (default). Do not forget to specify an email address (with -M) if you want to receive these notifications. |
Resources
-l h_rt=[hours:minutes:]seconds | requested real (wall-clock) time; the default depends on the queue |
-l mem_free=size | request "size" bytes of free memory |
-l swap_free=size | request "size" bytes of swap space |
-w v | check whether the syntax of the job is okay (does not submit the job) |
-hold_jid job-id | start the job only after the job with the id "job-id" has finished |
Other useful options
-N name | name of the job |
Parallel jobs / parallel environments
-pe parallel-environment process-number | Specify a parallel environment and the number of processes on which your MPI application should run. The parallel environments control how a job is distributed over the nodes; list the available environments with qconf -spl. |
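As an illustration, the following lines in a job description file request 30 minutes of real time, merge the error stream into the output file and send mail when the job begins and ends (job name, file name and address are placeholders):
#$ -N example_job
#$ -o example.out
#$ -j yes
#$ -l h_rt=00:30:00
#$ -M first.last@uibk.ac.at
#$ -m be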
There are differences to consider not only between submitting sequential and parallel jobs, but also between the different supported parallel programming models. The following examples illustrate the different procedures:
Sequential batch jobs
qsub job-file
where the content of the file "job-file" may look like the following (a sketch; the job name and program name are placeholders):
#!/bin/bash
#$ -q all.q
#$ -N seq_job
./my_program
Parallel (MPI) batch jobs
qsub job-file
where the content of the file "job-file" may look like the following (a sketch; the parallel environment name openmpi and the process count are assumptions, see qconf -spl for the valid names):
#!/bin/bash
#$ -q par.q
#$ -pe openmpi 8
./script.sh
MPICH implementation: (outdated)
When using MPICH the script file "script.sh" may look like this (a sketch; under SGE's MPICH integration the machine file is provided in $TMPDIR/machines and the granted slot count in $NSLOTS, and the program name is a placeholder):
#!/bin/bash
mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./my_mpi_program
Make sure that "script.sh" has execution rights (chmod +x script.sh).
OpenMPI implementation:
For OpenMPI the -machinefile option of mpirun must not be specified; OpenMPI obtains the host list directly from the SGE parallel environment. Use a script file "script.sh" like the following (the program name is a placeholder):
#!/bin/bash
mpirun -np $NSLOTS ./my_mpi_program
Make sure that "script.sh" has execution rights (chmod +x script.sh).
Parallel (OpenMP) batch jobs
qsub job-file
where the content of the file "job-file" may look like the following (a sketch; the parallel environment name openmp is an assumption, see qconf -spl for the valid names):
#!/bin/bash
#$ -q par.q
#$ -pe openmp 8
./script.sh
The script file must have execution rights (chmod +x script.sh).
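A matching "script.sh" may look like the following (a sketch; the program name is a placeholder, and OMP_NUM_THREADS is set from the SGE slot count as recommended for interactive OpenMP jobs below):
#!/bin/bash
# run with as many OpenMP threads as slots granted by SGE
export OMP_NUM_THREADS=$NSLOTS
./my_openmp_program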
Note that for OpenMP it makes no sense to ask for more threads than there are cores on the largest machine in the cluster (currently a maximum of 48 cores).
Observing a job (qstat)
qstat [options]
Options of qstat:
-u user | Prints all jobs of a given user. |
-j job-id | Prints full information on the job with the given job-id; here you can also see the reason why your job is pending. |
-f | Prints all queues and jobs. |
-help | Prints all possible qstat options. |
In case of pending jobs, you may also get some hints on why your job with the job identifier "job-id" is still waiting in the queue by executing
qalter -w p job-id
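For example (the job id 4711 is a placeholder):
$ qstat -u $USER     # list all of your own jobs
$ qstat -j 4711      # full information on job 4711
$ qalter -w p 4711   # hints on why job 4711 is still pending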
Deleting a job (qdel)
qdel job-id
Delete a job with the job identifier "job-id".
Status information and resource limitations
Obtaining current host status (qhost)
To obtain current status information about the cluster's execution hosts and their configuration parameters, execute qhost or, for a more detailed representation:
qhost -F
Slot limitations (qquota)
Due to the limited resources the number of available slots per user has been restricted to 256 slots for power users and 160 slots for standard users. Execute
qquota
to see your current resource consumption.
Note: Please contact the ITP cluster administration if you need more resources for the progress of an urgent project.
Submitting interactive jobs (qsh)
The submission of interactive jobs is useful in situations where a job requires some sort of direct intervention. This is usually the case for X-Windows applications or in situations in which further processing depends on your interpretation of immediate results. A typical example for both of these cases is a graphical debugging session.
The only supported method for interactive sessions on the Opteron cluster is currently to start an interactive X-Windows session via the SGE's qsh command. This will bring up an xterm from the executing node with the display directed either to the X-server indicated by your current DISPLAY environment variable or as specified with the -display option. Try qsh -help for a list of allowable options to qsh. You can also force qsh to use the options specified in an optionfile with:
qsh -@ optionfile
A valid "optionfile" might contain the following lines (a sketch; the job name and queue are placeholders):
# Name your job
-N my_session
# Choose a queue
-q all.q
Note: Interactive jobs are not spooled if the necessary resources are not available, so either your job starts immediately or you are notified to try again later.
Interactive sequential jobs
Start an interactive session for a sequential program by executing:
qsh -q all.q
Prepare your session as needed, e.g. by loading all necessary modules within the provided xterm, and then simply start your sequential program on the executing node.
Interactive parallel jobs
For a parallel program execute
qsh -q par.q -pe parallel-environment number-of-processes
with the SGE's parallel environment of your choice (see the list of available parallel environments with qconf -spl) and the number of processes you plan to debug on, just as if submitting a parallel job with qsub to the SGE's parallel queue.
Start your parallel MPI program as shown in the "script.sh" files for parallel MPI batch jobs above. For OpenMP jobs, export the OMP_NUM_THREADS variable with export OMP_NUM_THREADS=$NSLOTS and start your job.
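A session for a parallel MPI program may thus look like this (the parallel environment name openmpi and the process count are assumptions):
$ qsh -q par.q -pe openmpi 4
# inside the xterm that appears from the executing node:
$ module load <required modules>
$ mpirun -np $NSLOTS ./my_mpi_program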
High priority interactive sessions
If your job is urgent and the necessary resources are temporarily unavailable, submit your interactive session to the developers' queue with
qsh -q dev.q
Note that the developers' queue has a much stricter time limit than the other available queues, so please make extra sure to end high priority sessions when they are no longer needed.
Adapted from http://www.uibk.ac.at/zid/systeme/hpc-systeme by courtesy of the ZID HPC Team.