Compute Services
HPC systems available to University of Innsbruck (UIBK) users (faculty and advanced students) include the LEO3e, LEO4, and LEO5 clusters operated by the ZID in Innsbruck, and the VSC4 and VSC5 systems operated by the VSC consortium in Vienna.
These systems enable users to run programs that are larger and consume more resources than typical workgroup or departmental servers allow:
- Parameter studies (even using moderately parallelized or sequential programs) with many thousands of independent runs
- Parallel programs using hundreds of CPUs
- Parallel programs using dozens or hundreds of GB of main memory
- Programs with intensive use of disk storage, possibly using terabytes of temporary disk space
Users may bring their own (or open-source) programs in source form (e.g. C, C++, Fortran) or use preinstalled software. Programs that use large amounts of computing resources (compute time and memory) should be parallelized, e.g. with MPI or OpenMP.
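As an illustration only (not site-specific code), the following minimal C sketch combines both approaches: MPI distributes work across ranks that may run on different nodes, while OpenMP threads share the memory of a single node.

    /* Minimal hybrid MPI + OpenMP sketch (illustration only, not site-specific code).
       Each MPI rank may run on a different node; OpenMP threads share one node's memory. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks */

        /* Each rank spawns a team of threads that share its memory. */
        #pragma omp parallel
        {
            printf("rank %d of %d, thread %d of %d\n",
                   rank, size, omp_get_thread_num(), omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }

Such a program is compiled with an MPI compiler wrapper and the compiler's OpenMP flag (for example mpicc -fopenmp); the exact wrapper and flags depend on the modules loaded on the respective cluster.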
The hardware offered by the ZID and VSC includes distributed memory clusters with dozens to thousands of individual compute servers (nodes). Each node typically has 20 to 64 CPU cores and 64 to 2048 GB of main memory. As of April 2023, the UIBK Leo clusters offer a total of 160 compute nodes, 6424 CPU cores, 52 GPUs, and 38 TB of main memory.
On the LEO clusters, the $HOME file systems offer persistent storage with self-service access to backups of recent file versions via a snapshot mechanism. The $SCRATCH file systems provide shared access to large, fast disk subsystems for intensive I/O. The HPC servers are directly connected to the campus backbone and to the high-performance inter-university network (ACOnet), which speeds up data transfers. Users should note that, unless stated otherwise, there is usually no backup of data stored on the HPC systems.
All machines feature variants of the Linux operating system (as of 2023, clones of RedHat versions 7 or 8) and can be accessed via SSH. The primary user interface is the standard UNIX/Linux shell. To prevent resource contention, users start their programs via a resource management system (SGE, Slurm). For resource-intensive work, non-interactive use is preferred.
Our resource management systems use a fair-share scheduling method that tries to maintain a long-term balance of the resources allocated to each user. Depending on the load situation and the resource requirements, a given job may start right away, or it may be delayed, typically by anywhere from a few minutes to several days. While our systems offer high total throughput, there is no guaranteed time-to-result; consequently, they cannot be used for time-critical applications.
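The basic idea can be illustrated with a small, purely schematic calculation (this is not the exact formula used by SGE or Slurm): the more of the system a user has recently consumed relative to their allotted share, the lower the priority factor assigned to their next job.

    /* Schematic fair-share illustration (not the actual SGE/Slurm implementation).
       recent_usage and share are fractions of the total system capacity. */
    #include <math.h>
    #include <stdio.h>

    double fairshare_factor(double recent_usage, double share) {
        /* 1.0 for an unused share, approaching 0.0 with heavy recent usage */
        return pow(2.0, -recent_usage / share);
    }

    int main(void) {
        printf("light user: %.2f\n", fairshare_factor(0.01, 0.10));  /* ~0.93 */
        printf("heavy user: %.2f\n", fairshare_factor(0.40, 0.10));  /* ~0.06 */
        return 0;
    }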
High quality hardware ensures stable operation with very low failure rates. Thanks to automatic monitoring, automated rollout of new / repaired machines, and maintenance contracts, malfunctions can be discovered and corrected quickly.
Data Security
Our HPC clusters offer a level of data security that is typical for standard UNIX-like multiuser HPC systems. The systems are operated by professional staff; consequently, security breaches are unlikely but cannot be ruled out entirely.
The clusters are behind a firewall that restricts access to SSH clients within the uibk.ac.at network. Access from outside requires a VPN connection.
As with most ZID services, user accounts are created after a formal application process. There are two types of accounts: personal accounts for individual users, and functional accounts, which may be accessed by several users and are coordinated by the account owner, who is responsible for managing access by the individual users.
Access to files is protected by standard UNIX permissions, with optional ACLs on some file systems. For new accounts, the access permissions of the $HOME and $SCRATCH directories are set to user-only access (drwx------). Default UNIX group membership encompasses the entire organisational unit (e.g. an institute). If needed, the ZID can create bespoke UNIX groups on demand, allowing coordinators of work groups to manage user access at a more fine-grained level.
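For illustration, the drwx------ setting corresponds to octal mode 0700 (read, write, and execute for the owner only); in everyday use the same restriction is applied with the shell command chmod 700. The hypothetical POSIX C sketch below (path and program name are invented for the example) shows the equivalent operation:

    /* Hypothetical sketch: restrict a directory to user-only access (0700, i.e. drwx------). */
    #include <stdio.h>
    #include <sys/stat.h>

    int main(void) {
        const char *dir = "/scratch/exampleuser/project";   /* hypothetical path */

        /* S_IRWXU = read, write, execute for the owner; no group or other permissions */
        if (chmod(dir, S_IRWXU) != 0) {
            perror("chmod");
            return 1;
        }
        return 0;
    }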
On a technical level, the degree of security protection of our HPC systems is best described as roughly corresponding to Level C1 in the now-obsolete TCSEC Orange Book. In particular, we offer no guarantees that our systems are suitable for processing sensitive data. Within this framework, users are solely responsible for taking adequate measures to ensure the safety and confidentiality of their data if necessary, e.g. by anonymization, encryption, or similar.
Software
At all UIBK HPC installations, a range of software is installed and available to end users, including:
- Linux operating system with typical pre-installed utilities
- Compilers (GNU, Intel, PGI) for C, C++, Fortran (77, 95, 20xx), including OpenMP parallelization
- Communication and parallelization libraries (e.g. OpenMPI)
- High performance computational and data libraries (Intel MKL implementations of LAPACK and FFT, HDF, and many others; see the usage sketch after this list)
- Development tools (parallel debugger, profiling)
- Integrated numerical productivity tools (e.g. Matlab, Mathematica, NumPy, SciPy, Pandas, R)
- Application Software (e.g. finite elements, fluid dynamics, computational chemistry, atmospheric sciences, statistics, visualisation...)
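As a sketch of how such libraries are typically used (assuming the MKL CBLAS interface; the header name and link options depend on the installed MKL version and the loaded modules), the following C fragment multiplies two small matrices with the BLAS routine cblas_dgemm:

    /* Sketch: calling the BLAS routine dgemm through MKL's CBLAS interface.
       Assumes an MKL module is loaded; link options depend on the local installation. */
    #include <stdio.h>
    #include <mkl.h>

    int main(void) {
        /* 2x2 matrices in row-major storage */
        double A[4] = {1.0, 2.0, 3.0, 4.0};
        double B[4] = {5.0, 6.0, 7.0, 8.0};
        double C[4] = {0.0, 0.0, 0.0, 0.0};

        /* C = 1.0 * A * B + 0.0 * C */
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    2, 2, 2, 1.0, A, 2, B, 2, 0.0, C, 2);

        printf("%g %g\n%g %g\n", C[0], C[1], C[2], C[3]);   /* 19 22 / 43 50 */
        return 0;
    }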
As human resources permit, software of general interest can be installed centrally; users may also compile software within their own accounts.
Personal Support
A small but dedicated team ensures stable daily operation of the systems as well as controlled acquisition of new servers that meet the real demands of our user community.
In addition, we offer direct help to end users for problems arising in daily use of the systems:
- Individual introductory briefings for new users, who can discuss their requirements with HPC experts and get hints on how to use the resources optimally for their specific tasks.
- Problem support. If problems arise that a user cannot solve alone, we offer analysis and advice.
- Support for porting and optimizing programs. Code developed for other machines may not run well on a new machine, often for simple reasons. Experienced HPC experts can frequently help to resolve such problems quickly.
Keeping it Simple
Our service spectrum is typical for a classical small HPC team. We serve hundreds of scientists across a wide range of scientific fields, so we cannot become specialists in every application area. Consequently, our typical users have some degree of technical experience and know, or are willing to learn, how to put their programs to productive use.
In particular, we do not offer:
- Standardized services at a commercial service level.
- Integrated web application front ends or standard workflows for non-technical users.
- Application-science-level support (beyond general advice, e.g. on selecting numerical methods).
Although integrated workflows lower the entry threshold, they add to the technical complexity and maintenance effort of the systems. Knowledge about specific HPC applications is often shared in user communities, both within institutes and at large. The Research Area Scientific Computing also offers a platform for knowledge exchange. We encourage users to benefit from this vast collective experience.