CMSC 714- High Performance Computing
Fall 2005 - Programming Assignments 1 and 2
Due October 11 and October 20, 2005 @ 6:00PM
The purpose of this programming assignment is to gain experience in parallel programming on an SMP and a cluster, using OpenMP and MPI, respectively. For this assignment you are to write a sequential and two parallel implementations of a program to simulate a game called Sharks and Fishes.
The game is a variation of a simple cellular automata. The game is played on a square grid containing cells. At the start, some of the cells are occupied, the rest are empty. The game consists of constructing successive generations of the grid. To understand how the game works, imagine the ocean is divided into a square 2D-grid, where:
Rules for Fish
Rules for Sharks
More on Rules for Fish and Sharks
The specification above is not specific about when ages for starving or breeding are incremented, and whether starving or breeding takes precedence. The right way to think about it is that an animal's age is incremented between generations (and a generation consists of a red and a black sub-generation, as described in the next section). So if a shark reaches its starvation age in a generation, it dies, since the starvation age did not get reset by the end of the previous generation. As for breeding, an animal is eligible to breed in the generation after its current breeding age reaches the animal's minimum breeding age (supplied on the command line).
Traversal Order Independence
Since we want the order that the individual cells are processed to not matter in how the game evolves, you should implement the game using a so-called red-black scheme (as in a checkerboard) for updating the board for each generation. That means that a generation consists of 2 sub-generations. In the first sub-generation, only the red cells are processed, and in the second sub-generation the black cells are processed. In an even numbered row red cells are the ones with an even column number, and in an odd numbered row red cells are the ones with an odd column number. The red-black scheme allows you to think of each sub-generation as a separate parallel (forall) loop over the red (or black) cells, with no dependences between the iterations of the loop. Note that in the red-black scheme a fish or shark may end up moving twice in a generation. The rules that follow about selecting cells and resolving conflicts apply for each sub-generation.
Rules for Selecting a cell when multiple choices are possible
Conflicts
For this project the game grid has finite size. The x-axis and y-axis both start at 0 (in the upper left corner of the grid) and end at limit-1 (supplied on the command line).
Write a serial implementation of the above program in C.
Name the source file sharks-fishes-serial.c.
Input
x y type
1 3 fish
3 5 shark
Your program should take six command line arguments: the size of the grid (limit), the name of the data file, the shark and fish breeding ages, the shark starvation age, and the number of generations to iterate.
To be more specific, the command line of your program (e.g., for a sequential version) should be:
sharks-fishes <limit> <input file name> <shark breeding age> <fish breeding age> <shark starvation age> <# of generations>
A sample set of parameters for the command line and an input data file will be made available soon.
Output
At the end of program output a list of positions of sharks and fishes in the same file format as for the input data file.
A sample output file for the sample input data will also be made available soon.
Data structures
A 2-D grid of cells
struct ocean{
int type /* shark or fish or empty */
struct swimmer* occupier;
} ocean[MAX][MAX];
A linked list of swimmers
struct swimmer{
int type;
int x,y;
int age;
int last_ate;
int iteration;
swimmer* prev;
swimmer* next;
} *List;
At a high level, the logic of the serial (non-parallel) program is:
Write an OpenMP implementation of the Sharks and Fishes program as in the Part 1, with the same rules. Name this source code sharks-fishes-omp.c . There are now some complications to consider:
Conflicts
Load Balancing
Write an MPI implementation of the Sharks and Fishes program as for OpenMP,
and deal with the same problems.
Name this source code sharks-fishes-mpi.c
.
The goal is not to write the most efficient implementations, but rather to learn parallel programming with OpenMP and MPI.
Figure out how you will decompose the problem for parallel execution. Remember that OpenMP synchronization between threads must be done carefully to avoid performance bottlenecks, and that MPI (at least the mpich implementation) does not have great communication performance so you will want to make message passing infrequent. Also, you will need to be concerned about load balancing.
You must eventually submit the sequential and both parallel versions of your program (please use file names that make it obvious which files correspond to which version, as described above) and the times to run the parallel versions on input data to be give later, for 1, 2, 4 and 8 processes).
You also must submit a short report about the results (1-2 pages) that explains:
You will turn in the serial version and either the OpenMP or MPI parallel version at the first due date, with the short report, and then the serial version again (hopefully the same) and the other parallel version at the second due date, with an updated report. I will send email to each of you, telling you which parallel version you should turn in first. Please do the parallel versions in the order that you are assigned them, to aid the software engineering study being done on you during the programming assignments.
RUNNING OpenMP on tau/ceti
To run with OpenMP, you need to add the Sun compiler to your path (typically done in your .cshrc file):
set path=(/opt/SUNWhpc/bin $path)
The Sun C compiler that understands OpenMP is then invoked with mpcc (if you
really want to use C++, use mpCC instead).
To compile your OpenMP program, use the compiler flags "-xO3 -xopenmp=parallel"
(e.g., mpcc -xO3 -xopenmp=parallel your_file.c).
Note that OpenMP compiler optimization is level 3 whether you explicitly set it
or not, and you will get a compiler warning you if you don't use -xO3.
To choose the number of threads that OpenMP will use at runtime, set the
OMP_NUM_THREADS environment variable before running your program. This
means that you don't have to recompile to change the number of threads.
For example, to run the OpenMP hello world example from
http://www.llnl.gov/computing/tutorials/openMP/exercise.html):
$ mpcc -xO3 -xopenmp=parallel omp_hello.c -o omp_hello
$ setenv OMP_NUM_THREADS 8
$ ./omp_hello
Hello World from thread = 0
Number of threads = 8
Hello World from thread = 4
Hello World from thread = 7
Hello World from thread = 1
Hello World from thread = 3
Hello World from thread = 2
Hello World from thread = 6
Hello World from thread = 5
Note that because the default compiler optimization level is 3 for OpenMP
codes, if you compile your sequential code without optimization level 3, you may
see a performance boost that's due to compiler optimization, not parallelism.
So please also compile your serial code with -xO3 to show your performance
results
RUNNING MPI with MPICH
To run MPI, you need to set a few environment variables:
setenv MPI_ROOT /usr/local/stow/mpich-1.2.7
setenv MPI_LIB
$MPI_ROOT/lib
setenv MPI_INC
$MPI_ROOT/include
setenv MPI_BIN $MPI_ROOT/bin
# add MPICH commands to your path (includes mpirun
and the C compiler that links in the MPI library, mpicc)
set path=($MPI_BIN $path)
# add MPICH man pages to your manpath
if ( $?MANPATH ) then
setenv
MANPATH $MPI_ROOT/man:$MANPATH
else
setenv
MANPATH $MPI_ROOT/man
endif
The number of processes/processors the program will run with is specified as part of the mpirun command with the –np switch (you will instead probably use the pbsmpich command to run the MPI job with the Linux cluster scheduler - see the cluster documentation for more details).
Software Engineering Study
The instructions for setting up your accounts for the high performance computing software engineering study discussed in class are here .
Templates, sample code, makefiles, etc. are available here .
ADDITIONAL RESOURCES
For additional OpenMP information, see http://www.openmp.org (OpenMP API specification, and a lot more). For more on the Sun OpenMP implementation, see http://docs.sun.com/app/docs/doc/819-0501 .
For additional MPI information, see http://www.mpi-forum.org (MPI API) and http://www-unix.mcs.anl.gov/mpi (for MPICH).
For more information about using the Maryland cluster PBS scheduler, MPI, etc., see http://www.umiacs.umd.edu/research/parallel/classguide.htm . The scheduler has recently been updated with a new frontend with the same set of commands as PBS, so you need to add the path to the new scheduler commands at the front of your path:
set path = (/opt/UMtorque/bin $path)
Last updated Saturday, 15 October 2005 10:53 PM