The purpose of this programming assignment is to gain experience in writing GPU kernels.
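The Game of Life simulates a simple cellular automaton. The game is played on a rectangular board of cells. At the start, some cells are occupied (live) and the rest are empty (dead). The game consists of constructing successive generations of the board. A cell's fate depends on how many of its eight neighbors (the horizontally, vertically, and diagonally adjacent cells) are live. Conway's rules for constructing the next generation from the previous one are:
- A dead cell with exactly three live neighbors becomes live (it is "born").
- A live cell with fewer than two live neighbors dies of loneliness.
- A live cell with more than three live neighbors dies of overcrowding.
- A live cell with two or three live neighbors survives to the next generation.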
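Conway's rules reduce to a small predicate on a cell's current state and its live-neighbor count. The following is a minimal sketch, not the starter code's implementation. It assumes cells are stored as int with 1 = live and 0 = dead, so check the starter code's compute function for the actual encoding. The name next_state is a hypothetical helper:

```cuda
// Hypothetical helper (not part of the starter code): applies Conway's rules
// to one cell. ASSUMPTION: 1 = live, 0 = dead, and `neighbors` is the number
// of live cells among the eight surrounding cells.
__host__ __device__ inline int next_state(int alive, int neighbors) {
    if (alive) {
        // Survival: a live cell stays live with 2 or 3 live neighbors and
        // otherwise dies of loneliness (< 2) or overcrowding (> 3).
        return (neighbors == 2 || neighbors == 3) ? 1 : 0;
    }
    // Birth: a dead cell becomes live with exactly 3 live neighbors.
    return (neighbors == 3) ? 1 : 0;
}
```

Marking the function __host__ __device__ lets the same code run inside the kernel and in a CPU-side unit test.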
We provide CUDA starter code that handles reading and writing data and copying it between the CPU and GPU. Your job is to write the GPU kernel compute_on_gpu, which computes a single iteration (generation) of the Game of Life. The starter code also contains the function compute, a serial CPU implementation of the same computation.
The GPU kernel is structured as follows:
```cuda
__global__ void compute_on_gpu(int *life, int *previous_life, int X_limit, int Y_limit) {
    /* your code here */
}
```
life is the board you write into, previous_life holds the board from the previous iteration, and X_limit and Y_limit are the board dimensions.
compute_on_gpu will be called with block size 16x16 and grid size ⌈X_limit/16⌉x⌈Y_limit/16⌉.
You can change the block size with the blockDimSize variable.
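The grid size ⌈X_limit/16⌉x⌈Y_limit/16⌉ is a ceiling division, which ensures the grid covers the whole board even when X_limit or Y_limit is not a multiple of the block size. The starter code already performs this launch. The sketch below only illustrates the configuration and makes several assumptions:
- blockDimSize is an int.
- The x grid dimension runs over X_limit.
- d_life and d_prev are device pointers.

The helpers ceil_div and launch_one_generation are hypothetical names:

```cuda
#include <cuda_runtime.h>

// Placeholder with the assignment's kernel signature (body left to you).
__global__ void compute_on_gpu(int *life, int *previous_life, int X_limit, int Y_limit) {
    /* your code here */
}

// Hypothetical helper: integer ceiling of a / b for positive a and b.
__host__ __device__ inline int ceil_div(int a, int b) {
    return (a + b - 1) / b;
}

// Hypothetical wrapper: launches one generation with
// blockDimSize x blockDimSize threads per block.
void launch_one_generation(int *d_life, int *d_prev, int X_limit, int Y_limit,
                           int blockDimSize) {
    dim3 block(blockDimSize, blockDimSize);
    dim3 grid(ceil_div(X_limit, blockDimSize),   // ceil(X_limit / blockDimSize)
              ceil_div(Y_limit, blockDimSize));  // ceil(Y_limit / blockDimSize)
    compute_on_gpu<<<grid, block>>>(d_life, d_prev, X_limit, Y_limit);
}
```

For the 22x22 test board this gives a 2x2 grid of 16x16 blocks, which is 1024 threads for 484 cells. The kernel must therefore skip threads whose coordinates fall outside the board. Because a block holds at most 1024 threads, a square block can be at most 32x32. This limits the block sizes you can try for the report.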
Your kernel must handle the case where the board has more cells than there are threads available on the GPU (i.e. you need to implement striding, so that a thread may process more than one cell).
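A common way to implement striding is a grid-stride loop. Each thread starts at its global (x, y) index and advances by the total number of launched threads in that dimension (blockDim * gridDim) until it leaves the board, so any grid size covers any board size. The following is a minimal sketch, not a reference solution. It assumes a board stored row-major as y * X_limit + x, with x varying fastest so that threads with consecutive threadIdx.x read consecutive addresses. It also assumes cells outside the board are dead. The starter code's layout (for example, whether it pads the board with a border of ghost cells) may differ, so match cell_index and the boundary test to compute. All helper names are hypothetical:

```cuda
// ASSUMPTION: row-major layout with x as the fastest-varying index.
// Change this one helper to match the starter code's layout.
__host__ __device__ inline int cell_index(int x, int y, int X_limit) {
    return y * X_limit + x;
}

// Counts live cells among the eight neighbors of (x, y).
// ASSUMPTION: cells outside the board are treated as dead.
__host__ __device__ inline int live_neighbors(const int *prev, int x, int y,
                                              int X_limit, int Y_limit) {
    int count = 0;
    for (int dx = -1; dx <= 1; ++dx) {
        for (int dy = -1; dy <= 1; ++dy) {
            if (dx == 0 && dy == 0) continue;   // skip the cell itself
            int nx = x + dx, ny = y + dy;
            if (nx >= 0 && nx < X_limit && ny >= 0 && ny < Y_limit)
                count += prev[cell_index(nx, ny, X_limit)];
        }
    }
    return count;
}

// Next state of (x, y) under Conway's rules (1 = live, 0 = dead).
__host__ __device__ inline int updated_cell(const int *prev, int x, int y,
                                            int X_limit, int Y_limit) {
    int n = live_neighbors(prev, x, y, X_limit, Y_limit);
    int alive = prev[cell_index(x, y, X_limit)];
    return (n == 3 || (alive && n == 2)) ? 1 : 0;
}

__global__ void compute_on_gpu(int *life, int *previous_life, int X_limit, int Y_limit) {
    // Total number of threads in each dimension = stride of the loops below.
    int stride_x = blockDim.x * gridDim.x;
    int stride_y = blockDim.y * gridDim.y;
    // Grid-stride loops: a thread handles (x, y), (x + stride_x, y), ...,
    // so every cell is covered even if the board has more cells than threads.
    for (int y = blockIdx.y * blockDim.y + threadIdx.y; y < Y_limit; y += stride_y) {
        for (int x = blockIdx.x * blockDim.x + threadIdx.x; x < X_limit; x += stride_x) {
            life[cell_index(x, y, X_limit)] =
                updated_cell(previous_life, x, y, X_limit, Y_limit);
        }
    }
}
```

The kernel only reads previous_life and only writes life, so threads need no synchronization within a generation. The host swaps the two pointers between launches.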
Your program should read a file containing the coordinates of the initial live cells. Sample files for a 256x256 board are life.1.256x256.data and life.2.256x256.data. Each line of the file gives the coordinates of one live cell. For instance, the following entry:
```
1,3
```
means that the cell at position [1, 3] is live in the initial state. You can also find many other sample patterns on the web (search for "game of life" and/or "Conway").
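The starter code already reads this file. For reference, parsing one x,y pair per line can be sketched in host code as follows. This sketch is an assumption, not the starter code's reader. The helper read_live_cells is hypothetical, and it does not check the coordinates against X_limit and Y_limit because the starter code's coordinate base (0- or 1-based) is not stated here:

```cuda
#include <cstdio>
#include <utility>
#include <vector>

// Hypothetical helper: reads "x,y" lines from an already-open file and
// returns the coordinates of the live cells. Lines that do not match the
// "x,y" pattern (e.g. blank lines) are skipped.
std::vector<std::pair<int, int>> read_live_cells(FILE *in) {
    std::vector<std::pair<int, int>> cells;
    char line[256];
    int x, y;
    while (fgets(line, sizeof line, in)) {
        if (sscanf(line, "%d,%d", &x, &y) == 2) {
            cells.emplace_back(x, y);
        }
    }
    return cells;
}
```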
Your program should take five command-line arguments: the name of the input data file, the number of generations to iterate, X_limit, Y_limit, and the name of the output file. Specifically, the command line of your program should be:
```
./life <input file name> <# of generations> <X_limit> <Y_limit> <output file name>
```
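If you need to validate these arguments yourself, a host-side sketch might look like the following. LifeArgs, parse_int, and parse_args are hypothetical names. Allowing zero generations and requiring positive board dimensions are assumptions. Errors go to standard error because standard output is reserved for the timing line:

```cuda
#include <climits>
#include <cstdio>
#include <cstdlib>

// Hypothetical container for the five command-line arguments.
struct LifeArgs {
    const char *input_file;
    int generations;
    int X_limit;
    int Y_limit;
    const char *output_file;
};

// Parses a base-10 integer that must be >= min_value; rejects trailing junk.
static bool parse_int(const char *s, long min_value, int *out) {
    char *end = nullptr;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || v < min_value || v > INT_MAX) return false;
    *out = static_cast<int>(v);
    return true;
}

// Expects: <input file name> <# of generations> <X_limit> <Y_limit> <output file name>
bool parse_args(int argc, char **argv, LifeArgs *args) {
    if (argc != 6 ||
        !parse_int(argv[2], 0, &args->generations) ||  // ASSUMPTION: 0 allowed
        !parse_int(argv[3], 1, &args->X_limit) ||
        !parse_int(argv[4], 1, &args->Y_limit)) {
        fprintf(stderr, "usage: ./life <input file name> <# of generations> "
                        "<X_limit> <Y_limit> <output file name>\n");
        return false;
    }
    args->input_file = argv[1];
    args->output_file = argv[5];
    return true;
}
```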
To get a GPU for your sbatch job, add the following settings to your job script:
```bash
#SBATCH -p gpu
#SBATCH --gres=gpu:a100_1g.5gb
```
If you use interactive jobs, add the arguments -p gpu --gres=gpu:a100_1g.5gb when you run salloc.
Your program should write the file <output file name>, containing comma-separated values that represent the board. There should be one line for each cell that is occupied at the end of the last iteration, containing the x coordinate, a comma, and then the y coordinate.
Sample output files are also available.
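Writing this file from a host copy of the final board (after it has been copied back from the GPU) can be sketched as follows. The helper write_live_cells is hypothetical. The board layout (index y * X_limit + x, 0-based coordinates, 1 = live) and the line order (by x, then by y) are assumptions, so compare your output against the sample output files:

```cuda
#include <cstdio>

// Hypothetical helper: writes one "x,y" line per live cell and returns the
// number of lines written. ASSUMPTION: board[y * X_limit + x], 1 = live.
int write_live_cells(FILE *out, const int *board, int X_limit, int Y_limit) {
    int written = 0;
    for (int x = 0; x < X_limit; ++x) {
        for (int y = 0; y < Y_limit; ++y) {
            if (board[y * X_limit + x]) {
                fprintf(out, "%d,%d\n", x, y);
                ++written;
            }
        }
    }
    return written;
}
```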
The only thing your program prints to standard output should be the GPU timing (with up to five decimal places), in this form:
```
TIME: 4.529 s
```
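GPU work is usually timed with CUDA events, because kernel launches return before the kernel finishes. The sketch below assumes you time only the region passed to time_gpu_seconds. Whether the starter code's timer also includes host-device copies is not stated here. Both helper names are hypothetical. The "%.5f" format always prints five decimals (e.g. TIME: 4.52900 s), which is assumed to satisfy "up to five decimal places":

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: formats the timing line for standard output.
int format_time_line(char *buf, size_t size, double seconds) {
    return snprintf(buf, size, "TIME: %.5f s\n", seconds);
}

// Hypothetical helper: times a GPU region with CUDA events and returns seconds.
// Usage: double s = time_gpu_seconds([&] { /* loop of kernel launches */ });
template <typename F>
double time_gpu_seconds(F &&run) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    run();                        // enqueue the work being timed
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);   // wait until all work before `stop` finishes
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // reported in milliseconds
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms / 1000.0;
}
```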
The following three input files will be used to test correctness:
- life.22x22.data
- life.32x32.data
- final.512x512.data
You must submit the following files and no other files:
- game-of-life.cu: the starter code with compute_on_gpu implemented.
- Makefile: a Makefile that compiles your code successfully on Zaratan using nvcc. A sample Makefile is available. Make sure that the executable name is game-of-life, and do not include any executables in the tarball.
- LastName-FirstName-report.pdf: a report with performance results (one line plot).
The line plot should show the execution time of the GPU version on the input file final.512x512.data for different block sizes.
You can change the block size by changing the blockDimSize variable.
In the report, you should include:
Put all of the files in a directory named LastName-FirstName-assign4, compress it into a .tar.gz file (LastName-FirstName-assign4.tar.gz), and upload that file to Gradescope.
If you want to try a bigger board, run your program on the input file life.1024x1024.data.
The project will be graded as follows:
| Component | Percentage |
|---|---|
| Runs correctly with 22x22 board | 30 |
| Runs correctly with 32x32 board | 30 |
| Runs correctly with 512x512 board | 30 |
| Writeup | 10 |