Instructions for setting up the Bug cluster (CMSC 714)

This document describes how to get set up on the Bug cluster. If you need more information, consult the UMIACS cluster guide and the Bug cluster guide first.

  1. Logging in: Direct access to the cluster is limited to two submit nodes: brood00.umiacs.umd.edu and brood01.umiacs.umd.edu. Access to these is restricted to CS department/UMIACS machines only. If you are working from home, first use ssh to log in to a department server (e.g., Linux or Solaris) with your department account; refer to the department guide for details. Once you're logged in, connect to the cluster with your class account (cs714xx).
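
    For example (substitute your own department login and server; cs714xx stands for your assigned class account):
    $ ssh yourlogin@<DEPARTMENT SERVER>
    $ ssh cs714xx@brood00.umiacs.umd.edu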

  2. Generate SSH keys (first login only): All the cluster nodes need to be able to talk to each other from your account without using passwords, so you need to set up SSH keys the first time you log in. You can do it like so:
    $ ssh -l username brood00.umiacs.umd.edu
    $ cd $HOME
    $ ssh-keygen -t rsa1 -N "" -f $HOME/.ssh/identity
    $ ssh-keygen -t rsa  -N "" -f $HOME/.ssh/id_rsa
    $ ssh-keygen -t dsa  -N "" -f $HOME/.ssh/id_dsa
    $ cd .ssh
    $ touch authorized_keys authorized_keys2
    $ cat identity.pub >> authorized_keys
    $ cat id_rsa.pub id_dsa.pub >> authorized_keys2
    $ chmod 640 authorized_keys authorized_keys2
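
    To verify that the keys work, try connecting from one submit node to the other (an optional check); you may have to answer "yes" to a one-time host-key prompt, but you should not be asked for a password:
    $ ssh brood01.umiacs.umd.edu hostname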
    

  3. Setting up your environment (first login only): This class will be using a non-default version of LAM (v7.1.4). To pick it up, you need to add its directories to the beginning of your search path. Add the following lines to your .tcshrc (create this file in your home directory if it doesn't exist):
    setenv PATH "/usr/local/stow/lam-7.1.4-gm/bin:${PATH}"
    setenv LD_LIBRARY_PATH "/usr/local/stow/lam-7.1.4-gm/lib:${HOME}"
    
    We are adding your home directory to your library path because we need to create a symbolic link there that exposes the Torque library under the name LAM expects. From your home directory, run:
    $ ln -s /opt/UMtorque/lib/libtorque.so.2.0.0 libtorque.so.0
    
    You should see a file named libtorque.so.0 in your home directory. Log out and log back in. Make sure your shell is now finding the correct binaries; if the output of the which command doesn't match the line below, something in this step went wrong.
    $ which mpicc
    /usr/local/stow/lam-7.1.4-gm/bin/mpicc
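
    You can also confirm that the library link and search path are in place (an optional sanity check; the exact listing details will vary):
    $ ls -l ${HOME}/libtorque.so.0
    $ echo ${LD_LIBRARY_PATH}
    The first command should show a symbolic link pointing at /opt/UMtorque/lib/libtorque.so.2.0.0, and the second should print /usr/local/stow/lam-7.1.4-gm/lib followed by your home directory.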
    

  4. Compiling programs: For MPI programs, you compile with mpicc, which works much like cc or gcc:
    $ mpicc -o foo foo.c
    
    compiles foo.c into an executable called foo.
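
    mpicc accepts the usual compiler options and multiple source files, so a larger program can be built the same way (the file names and flags here are only an illustration):
    $ mpicc -O2 -o foo foo.c util.c -lm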

  5. Running programs: To run programs, you'll generally want to use a submit script (although you can also run in interactive mode; see the Bug cluster guide). A submit script will look something like the following:
    #!/bin/tcsh
    #PBS -l walltime=00:02:00
    #PBS -l nodes=4
    cd ~/<YOUR EXECUTABLE DIRECTORY>
    mpiexec -machinefile ${PBS_NODEFILE} -ssi rpi gm C <YOUR EXECUTABLE AND PARAMETERS>
    
    The second and third lines are directives to the batch system requesting 2 minutes of wall-clock time and 4 nodes for your job. To run this job on the cluster, type:
    $ qsub foo.sh
    NNNNN.queen.umiacs.umd.edu
    
    The NNNNN that gets returned is your job number. You can check the status of all your running jobs with the qstat command, and you can delete a job with qdel NNNNN. The standard output and standard error streams of your job are redirected to files named foo.sh.oNNNNN and foo.sh.eNNNNN, respectively. There will be a couple of extra lines in the former file which read:
    Warning: no access to tty (Bad file descriptor).
    Thus no job control in this shell.
    
    These lines will always show up at the beginning of the output file; they're just part of the way the batch system starts your shell. On a normal run the error file will be empty.
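
    For example, to check on your jobs and then cancel one (NNNNN stands for the job number printed by qsub):
    $ qstat -u cs714xx
    $ qdel NNNNN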