High Performance Computing Systems (CMSC714)

Assignment 3: Charm++

Due: Tuesday April 6, 2021 @ 11:59 PM Anywhere on Earth (AoE)

The purpose of this programming assignment is to gain experience in parallel programming on a cluster and Charm++. For this assignment, you have to write a parallel implementation of a program to simulate the Game of Life.

The game of life simulates simple cellular automata. The game is played on a rectangular board containing cells. At the start, some of the cells are occupied, the rest are empty. The game consists of constructing successive generations of the board. The rules for constructing the next generation from the previous one are:

  1. death: cells with 0,1,4,5,6,7, or 8 neighbors die (0,1 of loneliness and 4-8 of over population)
  2. survival: cells with 2 or 3 neighbors survive to the next generation.
  3. birth: an unoccupied cell with 3 neighbors becomes occupied in the next generation.
For this project the game board has finite size. The x-axis starts at 0 and ends at X_limit-1 (supplied on the command line). Likewise, the y-axis start at 0 and ends at Y_limit-1 (supplied on the command line).

Input

Your program should read in a file containing the coordinates of the initial cells. Sample files are located here: life.1.250x250.data and life.2.250x250.data (250x250 board). You can also find many other sample patterns on the web (use your favorite search engine on "game of life" and/or "Conway").

Your program should take six command line arguments: the name of the data file, the number of generations to iterate, X_limit, Y_limit, and the X and Y sizes of the 2D chare array, num_chares_X, and num_chares_Y. To be more specific, the command line of your program should be:


        ./life <input file name> <# of generations> <X_limit> <Y_limit> <num_chares_X> <num_chares_Y>
        

The number of processes the program will run on is specified as part of the mpirun command with the -np argument.


        mpirun -np <number of processes> ./life <input file name> <# of generations> <X_limit> <Y_limit> <num_chares_X> <num_chares_Y>
        

Output

Your program should write a single file called <input-file-name>.<no-of-generations>.csv (from one chare, possibly the main chare) that contains comma separated values representing the board. There should be one line (containing the x coordinate, a comma, and then the y coordinate) for each occupied cell at the end of the last iteration. Sample output files are available:

Similar to the output files above, your output in the file should be sorted by the X and Y coordinates. The only print from your program to standard output should be from the main chare that looks like this:

      TIME: Min: 25.389 s Avg: 27.452 s Max: 41.672 s
      
where Min, Avg and Max time (in seconds) are calculated using reduction operations (contribute) over the individual time measurements of the "main" loop (over generations) on different chares for the sample final.500x500.data input file.

Suggestions

You should use a 2D decomposition (both rows and columns) of the problem into chares.

What to Submit

You must submit the following files and no other files:

  • life2d.[ci,h,C]: parallel version with 2D decomposition
  • Makefile that will compile your code successfully on deepthought2 when using charmc. You can see a sample Makefile here. Make sure that the executable name is life2d and do not include the executable in the tarball.
  • You must also submit a short report (pdf) with performance results (a line plot). The line plot should present the execution times to run the parallel versions on the input file final.500x500.data (for 1, 2, 4, 8, and 16 processes), running on a 500x500 board for 500 iterations. In the report, you should mention:
    • how you did the decomposition
    • how was the initial data distribution done
    • what are the performance results, and are they what you expected
You should put the code, Makefile and report in a single directory (named LastName-assign1), compress it to .tar.gz (LastName-assign1.tar.gz) and upload that to ELMS.

If you want to try a bigger board, to see if you can get better speedups with more processes, try running on the input file life.1000x1000.data.

Grading

The project will be graded as follows:

Component Percentage
Runs correctly with 1 process, 2x2 chares 40
Runs correctly with 16 processes, 8x8 chares 40
Writeup 20