CMSC 714 - High Performance Computing

Fall 2018 - MPI Programming Assignment

Due Friday, September 21, 2018 @ 6:00PM

The purpose of this programming assignment is to gain experience in parallel programming on a cluster using MPI. For this assignment you are to write a parallel implementation of a program that simulates the game of Life.

The game of Life simulates a simple cellular automaton. The game is played on a rectangular board of cells. At the start, some of the cells are occupied and the rest are empty. The game consists of constructing successive generations of the board. The rules below count a cell's occupied neighbors among the eight cells adjacent to it horizontally, vertically, or diagonally. The rules for constructing the next generation from the previous one are:

    1. death: an occupied cell with 0, 1, 4, 5, 6, 7, or 8 neighbors dies (0 or 1 from loneliness, 4 to 8 from overpopulation).
    2. survival: an occupied cell with 2 or 3 neighbors survives to the next generation.
    3. birth: an unoccupied cell with exactly 3 neighbors becomes occupied in the next generation.
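
The three rules reduce to a small function of a cell's current state and its neighbor count. The following is a minimal sketch only; the name next_state and the 0/1 encoding of empty/occupied are assumptions made for illustration, not part of the assignment.

```c
/* Next-generation state of one cell under the three rules above.
 * alive:     1 if the cell is currently occupied, 0 if it is empty
 *            (this 0/1 encoding is an assumption of the sketch)
 * neighbors: number of occupied neighboring cells, 0..8
 * Returns 1 if the cell is occupied in the next generation, else 0. */
int next_state(int alive, int neighbors)
{
    if (alive)
        return neighbors == 2 || neighbors == 3;  /* survival; anything else is death */
    return neighbors == 3;                        /* birth */
}
```

For example, next_state(1, 4) yields 0 (death from overpopulation) and next_state(0, 3) yields 1 (birth).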

For this project the game board has finite size. The x-axis starts at 0 and ends at X_limit-1, and the y-axis starts at 0 and ends at Y_limit-1; both X_limit and Y_limit are supplied on the command line.
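
Because the board is finite, cells on the edges have fewer than eight neighbors. One way to handle this is sketched below. It assumes that positions outside the board count as empty (no wrap-around) and that the board is stored row-major in a flat array; both are assumptions of this sketch, and the name count_neighbors is hypothetical.

```c
/* Count the occupied neighbors of cell (x, y) on a finite board.
 * ASSUMPTIONS (not stated in the assignment text):
 *   - positions outside 0..x_limit-1 / 0..y_limit-1 count as empty
 *   - the board is stored row-major: board[y * x_limit + x] != 0 means occupied */
int count_neighbors(const unsigned char *board, int x_limit, int y_limit,
                    int x, int y)
{
    int count = 0;
    for (int dy = -1; dy <= 1; dy++) {
        for (int dx = -1; dx <= 1; dx++) {
            int nx = x + dx;
            int ny = y + dy;
            if (dx == 0 && dy == 0)
                continue;  /* skip the cell itself */
            if (nx < 0 || nx >= x_limit || ny < 0 || ny >= y_limit)
                continue;  /* off the board: treated as empty */
            count += board[ny * x_limit + nx] != 0;
        }
    }
    return count;
}
```

In a parallel version the same loop can run over a process's local block, provided the block is padded with a border of cells received from neighboring processes.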

INPUT

Your program should read in a file containing the coordinates of the initial cells. Sample files are located here and here. You can also find many other sample patterns on the web (use your favorite search engine on "game of life" and/or "Conway").

Your program should take four command line arguments: the name of the data file, the number of generations to iterate, X_limit, and Y_limit.

To be more specific, the command line of your program should be:

life <input file name> <# of generations> <X_limit> <Y_limit>
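
A minimal sketch of validating these four arguments follows. The struct life_args and both function names are hypothetical, and rejecting negative values and zero-sized boards is a choice of this sketch rather than a stated requirement.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical container for the four command-line arguments. */
struct life_args {
    const char *input_file;
    long generations;
    long x_limit;
    long y_limit;
};

/* Parse a non-negative decimal integer; returns 0 on success, -1 on error. */
static int parse_count(const char *text, long *out)
{
    char *end;
    errno = 0;
    long value = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0' || value < 0)
        return -1;  /* overflow, no digits, trailing junk, or negative */
    *out = value;
    return 0;
}

/* Fill *args from argv; returns 0 on success, -1 if the arguments are invalid. */
int parse_life_args(int argc, char **argv, struct life_args *args)
{
    if (argc != 5)
        return -1;  /* program name plus exactly four arguments */
    args->input_file = argv[1];
    if (parse_count(argv[2], &args->generations) != 0 ||
        parse_count(argv[3], &args->x_limit) != 0 ||
        parse_count(argv[4], &args->y_limit) != 0)
        return -1;
    if (args->x_limit == 0 || args->y_limit == 0)
        return -1;  /* a board needs at least one cell in each dimension */
    return 0;
}
```

In an MPI program it is common to call MPI_Init(&argc, &argv) first and parse the arguments afterwards, since MPI_Init is given the chance to process the argument list.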

The number of processes the program will run on is specified as part of the mpirun command with the -np switch.

OUTPUT

Your program should print out one line (containing the x coordinate, a space, and then the y coordinate) for each occupied cell at the end of the last iteration. The output should go to standard output, and no additional output should be sent to standard output.
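
The output format can be sketched as below. The function name print_occupied and the row-major storage are assumptions of the sketch, and so is the line order (by y, then x): the text above does not fix an order, so when comparing against a sample output file it may be safest to sort both files first.

```c
#include <stdio.h>

/* Write one "x y" line per occupied cell to `out` and return the number of
 * lines written.
 * ASSUMPTIONS: row-major storage, board[y * x_limit + x] != 0 means occupied;
 * the output order (by y, then x) is a choice made for this sketch. */
long print_occupied(FILE *out, const unsigned char *board,
                    int x_limit, int y_limit)
{
    long printed = 0;
    for (int y = 0; y < y_limit; y++) {
        for (int x = 0; x < x_limit; x++) {
            if (board[y * x_limit + x]) {
                fprintf(out, "%d %d\n", x, y);
                printed++;
            }
        }
    }
    return printed;
}
```

Calling print_occupied(stdout, board, x_limit, y_limit) writes the final board to standard output. Since nothing else may appear on standard output, any debugging or timing messages would have to go to standard error instead.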

Sample output files are available:

life.data.1.100.250.250.out is the output of the file life.data.1 run for 100 generations on a 250x250 board

life.data.2.100.250.250.out is the output of the file life.data.2 run for 100 generations on a 250x250 board

The files are also available on deepthought2 in /homes/asussman/public/714/MPI/data .

 

HINTS

The goal is not to write the most efficient implementation of Life, but rather to learn parallel programming with MPI.

Figure out how you will decompose the problem for parallel execution. Remember that MPI (at least the OpenMPI implementation) does not always have great communication performance, so you will want to keep message passing infrequent. You will also need to be concerned about load balancing. To learn about decomposing the problem in different ways, you must write two parallel versions of the program: one that uses a 1D decomposition (rows or columns) and one that uses a 2D decomposition (both rows and columns).
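
As one possible starting point for the bookkeeping behind both decompositions, the sketch below splits n rows (or columns) as evenly as possible among processes and picks a near-square process grid for the 2D case. The names block_range and choose_grid are hypothetical, and this is only one of several reasonable ways to divide the board.

```c
/* Half-open range [*start, *end) of the n items owned by process `rank`
 * out of `nprocs`, with block sizes differing by at most one. */
void block_range(int n, int nprocs, int rank, int *start, int *end)
{
    int base = n / nprocs;   /* every process gets at least this many */
    int extra = n % nprocs;  /* the first `extra` processes get one more */
    *start = rank * base + (rank < extra ? rank : extra);
    *end = *start + base + (rank < extra ? 1 : 0);
}

/* Pick a process grid with rows * cols == nprocs that is as close to
 * square as possible (rows <= cols). */
void choose_grid(int nprocs, int *rows, int *cols)
{
    int r = 1;
    for (int i = 1; i * i <= nprocs; i++) {
        if (nprocs % i == 0)
            r = i;           /* largest divisor not exceeding sqrt(nprocs) */
    }
    *rows = r;
    *cols = nprocs / r;
}
```

For a 1D decomposition, block_range(Y_limit, nprocs, rank, ...) gives the rows a process owns. For a 2D decomposition, the same function can be applied once per dimension using the grid coordinates rank / cols and rank % cols. MPI also provides MPI_Dims_create and MPI_Cart_create for building process grids, which may be used instead of choose_grid.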

Once you have decided how to decompose the problem, write the sequential version first.

WHAT TO TURN IN

You must submit the sequential version and both parallel versions of your program (please use file names that make it obvious which files correspond to which version). You must also submit the run times of the parallel versions on the input file final.data for 1, 2, 4, 8, 16, and 32 processes, running on a 500x500 board for 500 iterations. Since the cluster you run on has 20 cores/processors per node, you must also time runs on different numbers of nodes to see the performance effects (for a fixed number of processes, more nodes means fewer processors used per node). In total, for 32 processes, you must run at least 4 configurations with different numbers of nodes and processors per node, using at most 4 nodes; one configuration could be, for example, 2 nodes with 16 processors per node.
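
When tabulating the timings, the usual derived quantities are speedup and parallel efficiency. The helper functions below are a minimal sketch with hypothetical names; they take measured wall-clock times in seconds and do not depend on how the times were collected (MPI_Wtime is one common way to take them inside the program).

```c
/* Speedup of a p-process run relative to the 1-process run. */
double speedup(double t1_seconds, double tp_seconds)
{
    return t1_seconds / tp_seconds;
}

/* Parallel efficiency: speedup divided by the number of processes. */
double efficiency(double t1_seconds, double tp_seconds, int nprocs)
{
    return speedup(t1_seconds, tp_seconds) / nprocs;
}

/* Fewest nodes that can hold nprocs processes with at most
 * cores_per_node processes on each node. */
int min_nodes(int nprocs, int cores_per_node)
{
    return (nprocs + cores_per_node - 1) / cores_per_node;  /* ceiling division */
}
```

With 20 cores per node, min_nodes(32, 20) is 2, so the 32-process runs need at least 2 nodes if each process is to have its own core.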

You must also submit a short report (1-2 pages) that explains the results.

If you want to try a bigger board, to see if you can get better speedups with more processes, try running on the input file life.data.1000x1000.

GRADING

The project will be graded as follows:

Item                                Pct
Correctly runs with 1 process       15%
Correctly runs with 32 processes    40% (20% each decomposition)
Performance with 1 process          10%
Speedup of parallel versions        20% (10% each decomposition)
Writeup                             15%

RUNNING MPI with OpenMPI on the deepthought2 cluster

Information on how to submit jobs and run MPI programs with OpenMPI on the cluster is available here and here. A sample script to submit an MPI job using sbatch is here.

Additional information on using the cluster is available from the Usage Docs tab on the top of those pages.

The number of processes/processors your program will run with is specified as part of the mpirun command with the -np switch.

To get the OpenMPI compile and run commands, mpicc and mpirun, put the line module load openmpi/gnu in the .cshrc.mine file in your home directory on deepthought2.

ADDITIONAL RESOURCES

For additional MPI information, see http://www.mpi-forum.org (MPI API) and http://www.open-mpi.org (for OpenMPI).

For basic information about the deepthought2 cluster, the SLURM cluster scheduler, MPI, etc., start from the Deepthought2 cluster home page.