CMSC416/CMSC818X - Introduction to Parallel Computing (Fall 2021)

Introduction to Parallel Computing (CMSC416/CMSC818X)

Assignment 5: Charm++

Due: Tuesday November 23, 2021 @ 11:59 PM Eastern Time

The purpose of this programming assignment is to gain experience with parallel programming on a cluster and with Charm++. For this assignment you will write a parallel implementation of the prefix sum algorithm.

Input/Initialization

Your program should read in a file containing the values that will be used to initialize the 1D array of integers. A sample file is available here. You can generate other sample input files using this Python code.
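As a starting point for the input step, the sketch below reads integers from any input stream. It assumes the file holds whitespace-separated integers (one per line would also work); check the sample file's actual layout before relying on it. The helper name `readValues` and the use of `long long` are choices made for this illustration, not requirements of the assignment.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical helper: read every integer from `in` until end of input.
// Assumes whitespace-separated integers; this format is an assumption.
std::vector<long long> readValues(std::istream& in) {
    std::vector<long long> values;
    long long v = 0;
    while (in >> v) {
        values.push_back(v);
    }
    // Sketch-level check: the loop should stop at end of input,
    // not at a malformed (non-integer) token.
    assert(in.eof());
    return values;
}
```

To read from the file named on the command line, you could pass a `std::ifstream` opened on that name to the same function.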

Your program should take three command line arguments: the name of the input data file, the number of chares, and the name of the output data file. To be more specific, the command line of your program should be:


      ./prefix <input filename> <# of chares> <output filename>

The number of processes the program will run on is specified as part of the mpirun command with the -np argument.


      mpirun -np <# of processes> ./prefix <input filename> <# of chares> <output filename>
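The chare count given on the command line need not divide the number of input values evenly (the grading table includes a run with 70 chares). One common way to handle this is a block distribution in which the first few chares take one extra element. The sketch below shows that arithmetic in plain C++; the helper name `blockRange` is hypothetical, and other distributions are equally valid.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper: half-open range [begin, end) of the global array
// owned by chare `index` when `n` values are split across `numChares` chares.
// The first (n % numChares) chares each own one extra element.
std::pair<std::size_t, std::size_t> blockRange(std::size_t n,
                                               std::size_t numChares,
                                               std::size_t index) {
    assert(numChares > 0 && index < numChares);
    const std::size_t base = n / numChares;   // minimum block size
    const std::size_t extra = n % numChares;  // number of chares with base + 1
    const std::size_t begin = index * base + (index < extra ? index : extra);
    const std::size_t size = base + (index < extra ? 1 : 0);
    return {begin, begin + size};
}
```

For example, 10 values over 4 chares gives blocks of sizes 3, 3, 2, and 2, and consecutive ranges always tile the whole array with no gaps.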

Output

Your program should write a single file (named by the command-line parameter) containing the values of the 1D sequence after the prefix sum computation, one number per line. This is the correct output file for the sample input file above. The only thing your program prints to standard output should come from the main chare and look like this:


      TIME: 41.672 s

where time is measured for the prefix sum calculation and excludes the time for file reading/writing. Make sure that you use "-O2" as a compiler flag for fast timings.
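The sketch below illustrates the timing rule in plain C++: measure only the computation and format the result like the sample line. Both helper names are hypothetical, and three decimal places are assumed from the sample output. In a Charm++ program the start and stop points would typically straddle asynchronous entry methods (for example, stopping the clock in `Main::done`), and Charm++'s `CkWallTimer()` would usually replace `std::chrono`.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <string>

// Hypothetical helper: format elapsed seconds as "TIME: <seconds> s".
// Three decimal places are an assumption based on the sample line.
std::string formatTime(double seconds) {
    assert(seconds >= 0.0);
    char buf[64];
    std::snprintf(buf, sizeof(buf), "TIME: %.3f s", seconds);
    return std::string(buf);
}

// Hypothetical helper: wall-clock seconds spent inside one call to `work`.
// File reading and writing should happen outside `work`.
template <typename F>
double timeSeconds(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

A typical use would be to read the file, call `timeSeconds` around the prefix sum only, write the output file, and then print `formatTime(elapsed)` once.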

Parallel Version

You can use the parallel prefix sum algorithm discussed in class, known as prefix sum with recursive doubling. You can assume that the number of input values is much larger than the number of chares. The CI file will look something like this:


      mainmodule prefix {
        readonly CProxy_Main mainProxy;
        readonly int numChares;
        readonly CProxy_Prefix prefixArray;

        mainchare Main {
          entry Main(CkArgMsg*);
          entry void done();
        };

        array [1D] Prefix {
          entry Prefix();
          entry void phase(int);
          entry void passValue(int phase, int value);
        };
      };
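To make the data flow concrete, here is a sequential C++ sketch that simulates recursive doubling, with each simulated chare owning one block of the array. It is one plausible reading of the interface above, not a Charm++ implementation: the addition in phase `k` plays the role of a `passValue` message sent from chare `i - 2^k` to chare `i`. The function names are hypothetical, and `long long` is used here only to make overflow less likely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Recursive doubling over one value per simulated chare. In the phase with
// distance d, chare i >= d adds the value chare i - d held at the start of
// that phase. After about log2(p) phases, value[i] is the sum of inputs 0..i.
std::vector<long long> recursiveDoubling(std::vector<long long> value) {
    const std::size_t p = value.size();
    for (std::size_t d = 1; d < p; d *= 2) {
        const std::vector<long long> sent = value;  // snapshot: messages in flight
        for (std::size_t i = d; i < p; ++i) {
            value[i] += sent[i - d];
        }
    }
    return value;
}

// Blocked prefix sum: local scan per chare, recursive doubling on the block
// totals, then each chare adds the sum of all blocks to its left.
std::vector<long long> blockedPrefixSum(const std::vector<long long>& data,
                                        std::size_t numChares) {
    assert(numChares > 0);
    const std::size_t n = data.size();
    const std::size_t base = n / numChares, extra = n % numChares;
    std::vector<std::size_t> begin(numChares + 1);
    for (std::size_t c = 0; c <= numChares; ++c) {
        begin[c] = c * base + (c < extra ? c : extra);  // block boundaries
    }
    std::vector<long long> out(data);
    std::vector<long long> total(numChares, 0);
    for (std::size_t c = 0; c < numChares; ++c) {  // step 1: local scans
        long long running = 0;
        for (std::size_t i = begin[c]; i < begin[c + 1]; ++i) {
            running += out[i];
            out[i] = running;
        }
        total[c] = running;
    }
    const std::vector<long long> scanned = recursiveDoubling(total);  // step 2
    for (std::size_t c = 1; c < numChares; ++c) {  // step 3: apply offsets
        const long long offset = scanned[c] - total[c];
        for (std::size_t i = begin[c]; i < begin[c + 1]; ++i) {
            out[i] += offset;
        }
    }
    return out;
}
```

Because the input is much larger than the number of chares, most of the work is in the two local passes; only the block totals take part in the recursive doubling phases.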

What to Submit

You must submit the following files and no other files:

prefix.ci, prefix.C, (optional header file): your parallel implementation
Makefile that will compile your code successfully on deepthought2 when using charmc. You can see a sample Makefile here. Make sure that the executable name is prefix and do not include the executable in the tarball. NOTE: assignments without a Makefile will not be graded. You can load the deepthought2 charm module using:
```
      module load charmpp
```
You must also submit a short report (LastName-FirstName-report.pdf) with performance results (a line plot). The line plot should present the execution times to run the parallel version on the sample input file (for 1, 2, 4, 8, and 16 cores), with 4 chares per PE (core).

You should put the code, Makefile and report in a single directory (named LastName-FirstName-assign5), compress it to .tar.gz (LastName-FirstName-assign5.tar.gz) and upload that to ELMS.

Deepthought2 primer
Use -g while debugging but -O2 when collecting performance numbers.

Grading

The project will be graded as follows:

Component	Percentage
Runs correctly on 1 process, 4 chares	20
Runs correctly on 16 processes, 64 chares	20
Runs correctly on 20 processes, 70 chares	30
Speedup on 16 processes, 64 chares	20
Writeup	10

NOTE: If your program does not run correctly, you do NOT get any points for performance/speedup.
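When interpreting your timings, speedup is conventionally the single-process time divided by the time on p processes, and efficiency is speedup divided by p. The sketch below encodes those two conventional definitions; the helper names are hypothetical, and the exact metric used for grading may differ.

```cpp
#include <cassert>

// Hypothetical helper: t1 is the measured time on 1 process,
// tp is the measured time on p processes (both in seconds).
double speedup(double t1, double tp) {
    assert(t1 > 0.0 && tp > 0.0);
    return t1 / tp;
}

// Hypothetical helper: fraction of ideal (linear) speedup achieved on p processes.
double efficiency(double t1, double tp, int p) {
    assert(p > 0);
    return speedup(t1, tp) / p;
}
```

Feed these only with times you measured yourself, using the same input file and compiler flags for every process count.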