Concise User's Guide to the Statistics Cluster
By: Igor Senderovich
(work in progress)
A cluster of 32 computers has been assembled to facilitate parallel computation in the field of Statistics. Each computer ("node"), a Dell PowerEdge SC1435, features:
- 2 x 4-core AMD Opteron 2350 processors (2 GHz)
- 8 GB of Memory (667 MHz)
- 250 GB hard drive (SATA, 7.2k RPM, 3 Gbps)
- dual port 1 Gbps network adapter
The nodes are networked with a Dell PowerConnect 6248 gigabit switch.
The unit is uplinked to an existing Physics Department computer infrastructure
(with storage for software, shared temporary files etc.) via a 1 Gbps connection.
It is best to think of every core as a separate virtual machine
capable of running one process.
The cores remain isolated because the computer cannot distribute a simple, stand-alone
process between its cores without special instructions for doing so within the code itself.
We will therefore refer to cores or virtual machines (of which there are 8 x 32)
instead of computers or processors as independent computing units, each
having about 1 GB of RAM at its disposal when all cores are uniformly loaded.
The cluster's software includes the GCC compiler package (4.1.1), the PGI Fortran Compiler (7.2), the IMSL libraries (C: 7.0, Fortran: 6.0) and MATLAB (7.3 - R2006b).
The 32 computers that make up the cluster are labeled stat0-stat31.
stat31 serves as the login and compilation node (the sole licensed location of the PGI compiler)
and is accessible from the outside via an ssh client. Such a client is installed on most Linux
and Mac OS X machines and is invoked from the terminal.
Free ssh clients for Windows, such as PuTTY, are also available.
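A minimal login invocation might look like the following. The account name and host address below are placeholders (the actual external address of stat31 is assigned by the department); substitute the ones you were given.

```shell
# Placeholders: substitute your own account name and the cluster's address.
CLUSTER_USER=rmrilke
CLUSTER_HOST=stat31.example.edu

# Build the login command; drop the 'echo' to actually open the session.
cmd="ssh ${CLUSTER_USER}@${CLUSTER_HOST}"
echo "$cmd"
```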
An account holder on the cluster will usually have a user name composed of
the first letter(s) of his or her first name and full last name
(e.g., Rainer Maria Rilke: rmrilke). Note that the password
assigned to you is case-sensitive. Changing the password cannot be done on the stat nodes themselves;
log in to the host designated for this purpose and use the passwd command.
You will be prompted for your current password and then asked to create a new one.
A secure file transfer partner to ssh is scp. This simple tool has a syntax similar to the
cp command on Linux, but allows remote transfer, compression,
encryption ciphers, etc. For example, transferring a local file metropolis_v1.c
to the src directory under the home space with a new name, metropolis.c, is invoked as follows:
scp metropolis_v1.c email@example.com:src/metropolis.c
Several graphical scp clients with explorer-like directory browsing (e.g. WinSCP)
are available for Windows.
Graphics, if necessary, are best tunnelled
through the ssh link via the VNC service.
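A sketch of such a tunnel is shown below. The host name, account name and display number are placeholders; by convention, VNC display :1 listens on port 5901.

```shell
# Forward local port 5901 to VNC display :1 on the login node.
# Host, user and display number are placeholders; adjust to your setup.
tunnel="ssh -L 5901:localhost:5901 rmrilke@stat31.example.edu"
echo "$tunnel"   # run without the echo, then point a VNC viewer at localhost:5901
```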
Disk Space Organization
Upon logging in, the user lands in his or her home directory.
This modest amount of space (not to exceed about 50 MB)
is intended as private space for development, testing of code, etc.
It is not intended for large data files or as the launching point for parallel jobs.
For other purposes, a temporary scratch directory, /scratch,
is available from all nodes.
This is a larger working space suitable for collaborative work and launching of jobs. Big files that
need to be transferred to the cluster can be copied directly to the scratch space by specifying
a path under /scratch as the destination in your scp client. For example, compression-enabled transfer
of a large file directly to a working directory under the cluster's scratch space with the console-based scp client looks as follows:
scp -C WarAndPeace.txt firstname.lastname@example.org:/scratch/LeosSpace/novel_parse
Content stored here should be readable to all; this is
necessary to give the job scheduler access for distribution to other
nodes. Being a collaborative space, however, means that it should be kept organized for the sake of
all other cluster users (including members of the adjoining Physics and Geophysics clusters). This space
may be cleaned up by administrators if any files appear to be abandoned.
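One way to make a working directory and its contents readable to all is sketched below. A temporary directory stands in for a (hypothetical) subdirectory of /scratch, so the sketch runs anywhere.

```shell
# Stand-in for a working directory under /scratch ("demo_space" is made up).
WORKDIR="$(mktemp -d)/demo_space"
mkdir -p "$WORKDIR"
echo "results" > "$WORKDIR/out.txt"

# Make files world-readable and directories world-searchable; the capital X
# sets the execute (search) bit only on directories, not on plain files.
chmod -R a+rX "$WORKDIR"
stat -c '%A %n' "$WORKDIR/out.txt"
```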
The shared spaces discussed so far reside on a file server and are accessible to the stat nodes over the
network via NFS. While this is convenient for a shared space used for testing and
job distribution, network latency, bandwidth limitations and congestion may create a bottleneck for
data-intensive calculations. To resolve this problem, local space is available on each node in the form
of an on-board hard disk, mounted locally on each node. Note that use of this
space requires a job to copy the necessary files over at the start and clean up at the end.
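The copy-in / compute / copy-out pattern a job should follow is sketched below. Both directories are temporary stand-ins, since the actual shared and node-local mount points are site-specific.

```shell
# Stand-ins for the shared (NFS) space and the node-local disk.
SHARED="$(mktemp -d)"       # e.g. a working directory under /scratch
LOCAL="$(mktemp -d)"        # e.g. the node-local disk's mount point

echo "input data" > "$SHARED/input.dat"

cp "$SHARED/input.dat" "$LOCAL/"                            # 1. stage inputs locally
tr 'a-z' 'A-Z' < "$LOCAL/input.dat" > "$LOCAL/output.dat"   # 2. compute on local disk
cp "$LOCAL/output.dat" "$SHARED/"                           # 3. copy results back
rm -rf "$LOCAL"                                             # 4. clean up local space

cat "$SHARED/output.dat"
```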
The Statistics Cluster is equipped with a powerful job queuing system called Condor.
This framework provides efficient use of resources by matching user needs to the available resources, taking into account both the priorities for the hardware and the preferences of the job. Matching resource requests to resource offers is accomplished through the ClassAds mechanism: each virtual machine publishes its parameters in a kind of classified advertisement ("ClassAd") to attract jobs, and a job submitted to Condor for scheduling may likewise list its requirements and preferences. However, this feature may prove rarely necessary given how uniformly the cluster is equipped in terms of hardware and software.
Jobs are submitted with the condor_submit
command, with a job description file passed as an argument. A simple description file looks as follows:
Executable = myprog
Requirements = ParallelSchedulingGroup == "stats group"
Universe = vanilla
output = myprog$(Process).out
error = myprog$(Process).err
Log = myprog$(Process).log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
Queue 50
Most of the variables are self-explanatory. The executable is a path to the program binary.
The output, error and log settings create the respective records for each job, numbered by
Condor with the $(Process) macro.
The Requirements variable is important for
constraining job assignment to Statistics Cluster nodes only. All available nodes are tagged with the
ParallelSchedulingGroup variable in their ClassAds, so this is an effective way to direct
execution to particular cluster segments. Physics and Geophysics nodes are also available,
but they are much older than the Statistics nodes and may not contain all the necessary libraries.
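If a job does need to express further preferences, the Requirements expression can combine the cluster tag with other machine-ClassAd attributes. A sketch (Memory is a standard ClassAd attribute, in MB; the threshold here is arbitrary):

```
Requirements = ParallelSchedulingGroup == "stats group" && Memory >= 512
```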
A detailed example of a job is available here.
Compilation and Using IMSL
The cluster is equipped with IMSL (International Mathematics and Statistics Library)
for C/C++ and Fortran. Note that due to license restrictions, only the Statistics Cluster nodes
(labeled stat0 through stat31 as opposed to physics and geophysics computers)
have access to these libraries. The libraries are filed away
according to language, version (where applicable) and architecture. C/C++ libraries are labeled
with the prefix CNL; those for Fortran are labeled FNL. Under normal setup, all the relevant library paths are
recorded in your environment variables.
Likewise convenient compiler and compilation arguments are stored in environment variables on
stat31 – your login and compilation node.
Thus, compilation of a Fortran program may look as simple as:
$F90 srcname.f $F90FLAGS $LINK_FNL -o binname
Fortran compiler flags for "fixed-format" source files are also available in a corresponding environment variable,
and corresponding flags for C/C++ compilation are defined as well.
You can review all the available environment variables with the set command
(setenv for c-shell relatives). This long list can be filtered as follows:
set |grep CNL
(Replace set with setenv for c-shell, and CNL with FNL or any other
filter string as necessary.)
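The same filtering pattern can be tried anywhere; here a made-up LINK_CNL_EXAMPLE variable (and its value) stands in for the real CNL settings defined on the cluster:

```shell
# LINK_CNL_EXAMPLE and its value are hypothetical, for demonstration only;
# on stat31 the IMSL setup defines the real CNL_*/FNL_* variables.
export LINK_CNL_EXAMPLE="-L/opt/vni/imsl/lib"
set | grep CNL
```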
As always, custom environment variables may be defined and made a permanent part of the
login environment by adjusting the "rc" file appropriate to the login shell used.
For an example of using IMSL libraries, consult the Fortran/IMSL-based example - Monte Carlo Calculation of π
Useful Software for Windows
- Terminal Application (SSH Client): PuTTY
- Secure file transfer client: WinSCP
- VNC - graphical remote-control: UltraVNC, RealVNC
- Linux-like environment for Windows: Cygwin (includes GNU compilers, editors, LaTeX, X Window system etc.)