Concise User's Guide to the Statistics Cluster

By: Igor Senderovich (work in progress)

Cluster Overview

A cluster of 32 computers has been assembled to facilitate parallel computation in the field of Statistics. Each computer ("node") is a Dell PowerEdge SC1435 featuring:
The nodes are networked with a Dell PowerConnect 6248 gigabit switch, which is uplinked via a 1 Gbps connection to the existing Physics Department computing infrastructure (providing storage for software, shared temporary files, etc.).

It is best to think of every core as a separate virtual machine capable of running one process. The cores remain isolated because the operating system cannot split a simple, stand-alone process across several cores. Doing so requires the code itself to be written for parallel execution (for example with MPI). We will therefore treat cores, or virtual machines (8 x 32 = 256 in total), rather than computers or processors, as the independent computing units. Each has about 1 GB of RAM at its disposal when all cores are uniformly loaded.
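To see how many cores a node actually offers, a program can ask the operating system directly. The C sketch below is only an illustration. online_cores() and total_slots() are hypothetical helper names. The code also relies on sysconf(_SC_NPROCESSORS_ONLN), which glibc on Linux (and macOS) provide even though POSIX does not strictly require it.

```c
#include <unistd.h>

/* Number of cores currently online on the node this program runs on.
   _SC_NPROCESSORS_ONLN is a widely supported extension (glibc, macOS),
   not a strict POSIX requirement. */
long online_cores(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1; /* fall back to 1 if the query fails */
}

/* Independent job slots in a homogeneous cluster: one per core. */
long total_slots(long nodes, long cores_per_node)
{
    return nodes * cores_per_node;
}
```

On a stat node, online_cores() should return 8 if the operating system sees all of that node's cores, and total_slots(32, 8) reproduces the 256 virtual machines counted above.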

Available software: GCC compiler package (4.1.1), PGI Fortran Compiler (7.2), IMSL (C: 7.0, Fortran: 6.0), MPICH (1.2.6), LAM-MPI (7.1.14), Matlab (7.3 - R2006b), ROOT (5.16), Condor (6.8.3)

Cluster Access

The 32 computers that make up the cluster are labeled stat0 through stat31. stat31 serves as the login and compilation node (the only licensed location of the PGI compiler) and is reachable from outside, by its host name, with an ssh client. Such a client is installed on most Linux and Mac OS X machines and is invoked from the terminal as follows:
ssh username@servername
Free ssh clients for Windows, such as PuTTY, are also available. An account holder on the cluster will usually have a user name composed of the first letter(s) of his or her given name(s) and full last name (John Graunt: jgraunt, Rainer Maria Rilke: rmrilke). Note that the password assigned to you is case-sensitive. The password cannot be changed on the stat nodes themselves. Instead, log into the host designated for this purpose and run the command kpasswd. You will be prompted for your current password and then asked to create a new one.
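The naming pattern can be written down as a small C function, purely for illustration. make_username() is a hypothetical helper, not a tool installed on the cluster. Accounts are assigned by the administrators, and the rule only describes the usual pattern.

```c
#include <ctype.h>
#include <string.h>

/* Nonzero if position q in s starts a word (non-space at start or after a space). */
static int starts_word(const char *s, const char *q)
{
    return !isspace((unsigned char)*q) &&
           (q == s || isspace((unsigned char)q[-1]));
}

/* Hypothetical helper illustrating the usual account-name pattern:
   initial of every given name + full surname, all lower case.
   "John Graunt" -> "jgraunt", "Rainer Maria Rilke" -> "rmrilke".
   Returns 0 on success, -1 for an empty name or a too-small buffer. */
int make_username(const char *full_name, char *out, size_t out_size)
{
    const char *last = NULL;
    size_t len = 0;

    /* Locate the start of the final word (the surname). */
    for (const char *q = full_name; *q != '\0'; q++)
        if (starts_word(full_name, q))
            last = q;
    if (last == NULL)
        return -1;

    /* One initial for each word before the surname. */
    for (const char *q = full_name; q < last; q++) {
        if (starts_word(full_name, q)) {
            if (len + 1 >= out_size)
                return -1;
            out[len++] = (char)tolower((unsigned char)*q);
        }
    }

    /* The surname in full. */
    for (const char *q = last; *q != '\0' && !isspace((unsigned char)*q); q++) {
        if (len + 1 >= out_size)
            return -1;
        out[len++] = (char)tolower((unsigned char)*q);
    }
    out[len] = '\0';
    return 0;
}
```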

The secure file-transfer companion to ssh is scp. Its syntax resembles that of the Linux cp command, but it also supports remote transfer, compression, a choice of encryption ciphers, etc. For example, copying a local file metropolis_v1.c into the src directory of your home space under the new name metropolis.c is invoked as follows:
scp metropolis_v1.c
Several graphical scp clients (e.g. WinSCP) with explorer-like directory browsing are available for Windows.
Graphics, if necessary, are best tunnelled through the ssh link via the VNC service.

Disk Space Organization

Upon logging in, the user lands in his or her home directory: /home/username. This modest amount of space (about 50 MB at most) is intended as private space for developing and testing code. It is not intended for large data files or as the launching point for parallel jobs.

For other purposes, a temporary directory, /scratch, is available from all nodes. This larger working space is suitable for collaborative work and for launching jobs. Big files that need to be transferred to the cluster can be copied directly to the scratch space by specifying /scratch as the destination in your scp client. For example, a compression-enabled transfer of a large file directly to a working directory under the cluster's scratch space with the console-based scp client looks as follows:
scp -C WarAndPeace.txt
Content stored here should be readable by all users. This is necessary so that the job scheduler can access it and distribute it to other nodes. Being a collaborative space, however, /scratch should be kept organized for the sake of all other cluster users (including members of the adjoining Physics and Geophysics clusters). Administrators may clean up this space if any files appear to be abandoned.

The shared spaces discussed so far reside on a file server and are accessible to the stat nodes over the network via NFS. While this is convenient for shared testing space and job distribution, network latency, limited bandwidth and congestion may create a bottleneck for data-intensive calculations. To avoid this, each node also has local space on an on-board hard disk, mounted under /local. Note that a job using this space must itself copy the files it needs there and clean up at the end.
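The copy-in/clean-up pattern for /local can be sketched in C as follows. This is a minimal sketch under stated assumptions. The helper names, the job.XXXXXX directory naming and the single staged file are illustrative choices, not a cluster convention. A real job would also copy its results back to /scratch (copy_file() works for that too) before cleaning up.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create a private, uniquely named directory under base (e.g. "/local").
   On success dir holds its path and 0 is returned. */
int make_job_dir(const char *base, char *dir, size_t dir_size)
{
    int n = snprintf(dir, dir_size, "%s/job.XXXXXX", base);
    if (n < 0 || (size_t)n >= dir_size)
        return -1;
    return mkdtemp(dir) != NULL ? 0 : -1;
}

/* Byte-for-byte copy of src to dst; returns 0 on success. */
int copy_file(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    char buf[65536];
    size_t got;
    int rc = 0;
    while ((got = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, got, out) != got) {
            rc = -1;
            break;
        }
    }
    if (ferror(in))
        rc = -1;
    fclose(in);
    if (fclose(out) != 0)
        rc = -1;
    return rc;
}

/* Copy src (e.g. a file under /scratch) into dir under the given name. */
int stage_in(const char *dir, const char *src, const char *name)
{
    char dst[4096];
    int n = snprintf(dst, sizeof dst, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof dst)
        return -1;
    return copy_file(src, dst);
}

/* Delete the staged file (if still present), then the job directory.
   rmdir() fails if other files were left behind, flagging a missed clean-up. */
int cleanup_job_dir(const char *dir, const char *name)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    if (unlink(path) != 0 && errno != ENOENT)
        return -1;
    return rmdir(dir);
}
```

A job would call make_job_dir("/local", dir, sizeof dir), stage its input with stage_in(), work inside dir, copy results back to /scratch and finally call cleanup_job_dir(). Using mkdtemp() gives every job its own directory, so several jobs sharing one node's disk do not overwrite each other's files.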

Job Submission

The Statistics Cluster is equipped with a powerful job queuing system called Condor. This framework makes efficient use of resources by matching user needs to the available machines, taking into account both the priorities set for the hardware and the preferences of the job. Matching resource requests to resource offers is accomplished through the ClassAds mechanism. Each virtual machine publishes its parameters as a kind of classified advertisement to attract jobs, and a job submitted to Condor for scheduling may list its own requirements and preferences. This feature should rarely be needed, however, since the cluster is uniformly equipped in terms of hardware and software. Jobs are submitted with the condor_submit command, which takes a job description file as its argument. A simple description file looks as follows:
Executable = myprog
Requirements = ParallelSchedulingGroup == "stats group"
Universe  = vanilla
output    = myprog$(Process).out
error     = myprog$(Process).err
Log       = myprog$(Process).log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
Queue 50
Most of the variables are self-explanatory. Executable is the path to the program binary. Queue 50 places 50 instances of the job in the queue, and Condor numbers them 0 through 49 via the $(Process) variable. The output, error and log settings therefore create separate records for each instance. With should_transfer_files = YES, Condor copies the executable to the machine that runs the job. With when_to_transfer_output = ON_EXIT, it copies the job's output files back when the job finishes. The Requirements line is important because it constrains job assignment to Statistics Cluster nodes only. All available nodes are tagged with a ParallelSchedulingGroup attribute in their ClassAds, so this is an effective way to direct execution to particular cluster segments. Physics and Geophysics nodes are also available, but they are much older than the statistics nodes and may not contain all the necessary libraries. A detailed example of a job is available here.
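All 50 queued copies run the same executable, so something must make each of them do different work. One common approach is to add the line Arguments = $(Process) to the description file. Each job then receives its own number on the command line and can use it as a random seed. This approach, sketched here in C, is an assumption rather than a cluster requirement. The example estimates π by Monte Carlo sampling. The generator (splitmix64 seeding a xorshift64* stream) and the function names are illustrative choices, not part of Condor or IMSL.

```c
#include <stdint.h>
#include <stdlib.h>

/* splitmix64: spreads a small integer such as a Condor process number
   over 64 bits, so neighbouring jobs start from unrelated states. */
static uint64_t splitmix64(uint64_t x)
{
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

/* xorshift64*: small, fast generator; its state must never be zero. */
static uint64_t next_u64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    *state = x;
    return x * 0x2545F4914F6CDD1DULL;
}

/* Uniform double in [0, 1) built from the top 53 bits. */
static double next_unit(uint64_t *state)
{
    return (double)(next_u64(state) >> 11) * (1.0 / 9007199254740992.0);
}

/* Monte Carlo estimate of pi: the fraction of random points in the unit
   square that fall inside the quarter circle approaches pi/4. */
double estimate_pi(long process_id, long samples)
{
    uint64_t state = splitmix64((uint64_t)process_id);
    if (state == 0)
        state = 1;
    long inside = 0;
    for (long i = 0; i < samples; i++) {
        double x = next_unit(&state);
        double y = next_unit(&state);
        if (x * x + y * y < 1.0)
            inside++;
    }
    return 4.0 * (double)inside / (double)samples;
}

/* Parse the process number passed as argv[1]; -1 if it is not a
   non-negative decimal integer. */
long parse_process_id(const char *arg)
{
    char *end;
    long id = strtol(arg, &end, 10);
    return (end != arg && *end == '\0' && id >= 0) ? id : -1;
}
```

In myprog's main(), parse_process_id(argv[1]) would supply the seed, and printing the estimate with printf sends it to myprog$(Process).out. Averaging the 50 output files then gives the combined estimate. Seeding through splitmix64 rather than with the raw process number keeps jobs 0, 1, 2, ... from starting in similar generator states.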

Compiling and Using IMSL

The cluster is equipped with IMSL (International Mathematics and Statistics Library) for C/C++ and Fortran. Note that due to license restrictions, only the Statistics Cluster nodes (labeled stat0 through stat31, as opposed to the physics and geophysics computers) have access to these libraries. The libraries are filed under /usr/local/vni/imsl according to language, version (where applicable) and architecture. C/C++ libraries are labeled with the prefix CNL, Fortran libraries with FNL. Under the normal setup, all library paths are recorded in your environment variables of the form LINK_CNL... or LINK_FNL... Likewise, convenient compiler names and compilation arguments are stored in environment variables on stat31, your login and compilation node. Thus, compiling a Fortran program may be as simple as:
$F90 srcname.f $F90FLAGS $LINK_FNL -o binname
Fortran compiler flags for "fixed-format" source files are also available under $FFLAGS. The corresponding compiler and flags for C/C++ are $CC and $CFLAGS. You can review the whole environment with the set command (setenv for C-shell relatives). This long list can be filtered as follows:
set | grep CNL
(Under a C shell, replace set with setenv. Replace CNL with FNL or any other filter string as needed.) As always, custom environment variables may be defined and made a permanent part of the login environment by adding them to the "rc" file of the login shell in use (e.g. .bashrc or .cshrc). For an example of using the IMSL libraries, consult the Fortran/IMSL-based example, Monte Carlo Calculation of π.



Recommended Software

See Also