Introduction to Parallel Computing (CMSC416)

 

Assignment 3: OpenMP

Due: Wednesday April 12, 2023 @ 11:59 PM Eastern Time

The purpose of this programming assignment is to gain experience in parallel programming on a cluster with OpenMP. You will start with a working serial program (quake.c) that models an earthquake, analyze its performance and then add OpenMP directives to create a parallel program.

The goal is to be systematic in figuring out how to parallelize this program. To find out which parts of the program take the most time, you can use manually inserted timers, gprof, or HPCToolkit and/or Hatchet (which you used in Assignment 2). From there, examine the loops in the most important subroutines and figure out how to add OpenMP directives. The program will be run on a single compute node of zaratan.

Using OpenMP

To compile OpenMP programs, we will use gcc version 9.4.0 (the default version on zaratan, which you can get by running module load gcc on the zaratan login node), which has OpenMP support built in. In general, you can compile this assignment with:


        gcc -fopenmp -O2 -o quake quake.c -lm
        

The -fopenmp flag tells the compiler to, you guessed it, recognize OpenMP directives. The -lm flag is required because our program uses the math library.

The environment variable OMP_NUM_THREADS sets the number of threads (and presumably cores) that will run the program. Set the value of this environment variable in the script you submit the job from. If it is not set, the program defaults to using all available cores, which on a zaratan node means 128 threads (and you might not want to do that).

In addition to setting the number of threads your program will run with through the environment variable, we also suggest using the following commands in your batch script so that your program will perform with less variability from run to run.


        #SBATCH --cpus-per-task=N
        #SBATCH --mem-bind=local
        export OMP_PROC_BIND=true
        export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
        
The first line asks SLURM for N cores to run your OpenMP program (replace N with the number of threads you want to run with). This replaces the --ntasks switch you used for your MPI programs, since your OpenMP program is a single task with multiple threads.
The second line asks SLURM to use cores that share access to the same memory banks on the node, if possible.
The third line asks the OpenMP runtime to bind threads to specific cores, so that if a thread is swapped out to run something else (e.g., another of your threads or another process), it returns to the same core if possible.
The fourth line sets the number of threads to the number of cores you requested in the first line. The two should always match, and this way you only have to change one line to change both.

Running the program

Quake reads its input file from standard input and writes its output to standard output. Quake prints an output message periodically (every 30 simulation time steps), so you should be able to tell whether it is making progress.

When the program runs correctly, all versions (serial or parallel) running the quake.in input, irrespective of the number of threads used, should output this at the 3840th timestep:
        Time step 3840
        5903: -3.98e+00 -4.62e+00 -6.76e+00
        16745: 2.45e-03 2.66e-02 -1.01e-01
        30169 nodes 151173 elems 3855 timesteps

This is the output for quake.in.short:
        Time step 30
        978: 8.01e-03 7.19e-03 8.41e-03
        3394: -3.69e-21 1.57e-20 -5.20e-20
        7294 nodes 35025 elems 34 timesteps

Since quake runs for a while on the quake.in input dataset with a small number of threads, quake.in.short is another input file that runs for much less time (you can use this for testing).

What to Submit

You must submit the following files and no other files:

You should put the code, Makefile and report in a single directory (named LastName-FirstName-assign3), compress it to a .tar.gz file (LastName-FirstName-assign3.tar.gz) and upload that file to Gradescope.

        double start, end;

        start = omp_get_wtime();
        /* ... work to be timed ... */
        end = omp_get_wtime();

        printf("TIME %.5f s\n", end - start);

Tips

Grading

The project will be graded as follows:

        Component                        Percentage
        Runs correctly on 4 threads      20
        Runs correctly on 16 threads     40
        Performance with 4 threads       20
        Performance with 16 threads      10
        Writeup                          10
NOTE: If your program does not run correctly, you do NOT get any points for performance/speedup.

 
