Instructions for setting up the Hive cluster (CMSC 714)

This document describes how to set up your account on the Hive cluster (brood00.umiacs.umd.edu) and how to run MPI jobs on it.

1. Generate SSH keys (everybody must do this once)

Direct access to the Hive cluster is restricted to CS department and UMIACS machines. If you are working from home, first use ssh to log in to a department server (e.g., a Linux or Solaris machine) with your department account; refer to the department guide for details. From there, connect to the cluster using your class account (cs714xx) as the user name, then generate your keys and authorize them:

% ssh -l username brood00.umiacs.umd.edu
% cd $HOME
% ssh-keygen -t rsa1 -N "" -f $HOME/.ssh/identity
% ssh-keygen -t rsa -N "" -f $HOME/.ssh/id_rsa
% ssh-keygen -t dsa -N "" -f $HOME/.ssh/id_dsa
% cd .ssh
% touch authorized_keys authorized_keys2
% cat identity.pub >> authorized_keys
% cat id_rsa.pub id_dsa.pub >> authorized_keys2
% chmod 640 authorized_keys authorized_keys2
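If key-based login between cluster nodes still asks for a password, one common cause is a key file that other users can write to: with its default StrictModes setting, sshd ignores such files. A minimal sketch for spotting them; the function name ssh_writable_files is hypothetical, and it only checks the directory you pass (default ~/.ssh), not your home directory itself:

```shell
# Hypothetical helper: list files under a directory (default ~/.ssh) that are
# group-writable (-020) or world-writable (-002). sshd's StrictModes check
# refuses to use key files with such permissions.
ssh_writable_files() {
  local dir=${1:-$HOME/.ssh}
  find "$dir" \( -perm -020 -o -perm -002 \) -print
}
```

Running ssh_writable_files with no argument should print nothing after the chmod step above.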

2. Set up your environment

You must also set up your environment manually; refer to the Hive cluster manual for the path settings. In particular, if you use LAM (which we recommend), make sure your PATH contains /usr/local/stow/lam-7.0.6-gm-nagware/bin, /opt/UMtorque/bin, and /usr/local/bin, in that order, all ahead of /usr/bin.
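For example, with bash you might add lines like the following to ~/.bashrc. This is a sketch: the file choice and the variable names LAM_BIN and TORQUE_BIN are assumptions, so check the Hive cluster manual for the settings it actually recommends.

```shell
# Sketch for ~/.bashrc: put LAM, Torque, and /usr/local/bin at the front of
# PATH, in that order, so all three are searched before /usr/bin.
LAM_BIN=/usr/local/stow/lam-7.0.6-gm-nagware/bin
TORQUE_BIN=/opt/UMtorque/bin
PATH="$LAM_BIN:$TORQUE_BIN:/usr/local/bin:$PATH"
export PATH
```

Prepending is enough: if any of these directories also appears later in PATH, the earlier entry takes precedence.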

3. Basic usage

The following is only an overview of cluster usage; refer to the Hive cluster manual for details.

To compile a program

% mpicc -o foo foo.c
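Because the PATH order matters, you may want to confirm that mpicc resolves to LAM's wrapper before compiling. A minimal sketch; check_mpicc is a hypothetical name, and it assumes the LAM directory named in the environment settings:

```shell
# Hypothetical helper: succeed only if the first mpicc found on PATH lives in
# the given directory (default: the LAM bin directory from the path settings).
LAM_BIN=/usr/local/stow/lam-7.0.6-gm-nagware/bin
check_mpicc() {
  [ "$(command -v mpicc)" = "${1:-$LAM_BIN}/mpicc" ]
}
```

For example, check_mpicc || echo "mpicc is not LAM's; check your PATH".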

To run a program in batch mode

First write a job script (foo.sh) such as the following:

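#!/bin/bash
#PBS -l nodes=2
# Start the LAM daemons on the nodes PBS assigned to this job
lamboot $PBS_NODEFILE
# Change to the directory that contains foo (here, your home directory)
cd ~/
# Launch 2 MPI processes, matching nodes=2 above
mpirun -np 2 ./foo
# Shut down the LAM daemons before the job ends
lamhalt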
Then submit the script to a PBS queue:
% qsub -l nodes=<nodes> foo.sh
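Because the node count appears both in the #PBS line and in the mpirun -np argument, it can help to generate the script rather than edit both by hand. A minimal sketch; make_job_script is a hypothetical helper, and the script body it writes mirrors foo.sh:

```shell
# Hypothetical helper: write a batch script for program $1 on $2 nodes into
# file $3, keeping "#PBS -l nodes=N" and "mpirun -np N" in sync.
make_job_script() {
  local prog=$1 nodes=$2 script=$3
  # Unquoted EOF expands $nodes and $prog; \$PBS_NODEFILE stays literal so it
  # is expanded later, when PBS runs the job.
  cat > "$script" <<EOF
#!/bin/bash
#PBS -l nodes=$nodes
lamboot \$PBS_NODEFILE
cd ~/
mpirun -np $nodes ./$prog
lamhalt
EOF
  chmod +x "$script"
}
```

For example, make_job_script foo 4 foo.sh followed by qsub foo.sh would request four nodes and start four processes.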

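Set the -l option carefully, since it tells PBS what resources your job needs: nodes is the number of nodes to allocate, and walltime is the maximum run time (HH:MM:SS), after which the job is terminated. A typical setting is "nodes=2,walltime=00:01:00". Options given on the qsub command line override the corresponding #PBS lines in the script.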
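If you prefer to think of run time in seconds, a small conversion to the HH:MM:SS walltime format can help. A minimal sketch; walltime is a hypothetical function name:

```shell
# Hypothetical helper: convert a number of seconds to HH:MM:SS for walltime=.
walltime() {
  local secs=$1
  printf '%02d:%02d:%02d\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}
```

For example, qsub -l nodes=2,walltime=$(walltime 90) foo.sh requests a limit of 00:01:30.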

The job's standard output and standard error are written to files named foo.sh.oNNNNN and foo.sh.eNNNNN, respectively, in the directory from which you ran qsub, where NNNNN is the job number.
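qsub prints the identifier of the job it creates, and the number at its start is the NNNNN in these file names. A minimal sketch for deriving the file names from that identifier, assuming the default job name (the script's file name); the helper name and the sample identifier format "12345.brood00.umiacs.umd.edu" are assumptions:

```shell
# Hypothetical helper: given the script name and the job identifier printed
# by qsub, print the names of the stdout and stderr files PBS will create.
pbs_output_files() {
  local script=$1 jobid=$2
  local num=${jobid%%.*}   # keep only the job number before the first dot
  echo "$script.o$num $script.e$num"
}
```

For example, save the identifier with jobid=$(qsub foo.sh) and, once the job has finished, run pbs_output_files foo.sh "$jobid" to see which files to read.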

To run a program in interactive mode

The following command submits an interactive job.

% qsub -l nodes=<nodes> -I

Once the interactive shell starts, run "lamboot $PBS_NODEFILE" before any mpirun command, and run "lamhalt" before you exit.
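Because it is easy to forget lamhalt, particularly when mpirun fails, you could wrap the three steps in a shell function inside the interactive shell. A minimal sketch; lam_run is a hypothetical name:

```shell
# Hypothetical helper for an interactive PBS session: boot LAM on the
# allocated nodes, run one MPI program, then always halt LAM, returning
# mpirun's exit status.
lam_run() {
  local np=$1 status
  shift
  lamboot "$PBS_NODEFILE" || return 1
  mpirun -np "$np" "$@"
  status=$?
  lamhalt
  return "$status"
}
```

For example, lam_run 2 ./foo runs foo on two processes and leaves no LAM daemons behind.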

Other commands

  • "qdel <job_id>": delete a PBS batch job
  • "pbsnodes -a": show all nodes and their attributes
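Before submitting a large job, you may want to know how many nodes are idle. A rough sketch that counts them from the output of pbsnodes -a, assuming Torque's usual layout in which each idle node has an indented "state = free" line; count_free_nodes is a hypothetical name:

```shell
# Hypothetical helper: read pbsnodes -a output on stdin and print how many
# nodes report "state = free".
count_free_nodes() {
  # grep -c prints 0 but exits 1 when nothing matches; treat that as success.
  grep -c '^[[:space:]]*state = free$' || true
}
```

For example, pbsnodes -a | count_free_nodes.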