next up previous
Next: Conclusion Up: Experimental Results Previous: Master-Slave Matrix Multiply

Parallel Range Sort (PRS)

This application fits is more SPMD than the previous applications. In PRS, a total of N random keys (32 bit integers) are generated, keys per processor where P is the number of processors.

Each PE sends each of its (locally generated) keys k to , where r is the range. Upon completion of the distribution phase, each P sorts its (new) fragment.

As stated earlier, the main difference is there is no strict master-slave relationship; this is a pure SPMD application. Thus, the communication pattern is much more complex. It exhibits all-to-all communication very frequently.

  
Figure 6: Plot of PRS total execution times for N=4096

  
Figure 7: Plot of PRS sort times for N=4096

  
Figure 8: Plot of PRS communication times for N=4096

The results of running this application on Proteus and the SP-2 are shown in the graphs in Figures 6, 7 and 8. The graph in Figure 6 shows the overall execution time of PRS. There is some difference between the two curves that becomes more pronounced as the number of processors increase. This difference is due to the poor performance of the network module. The pure sorting and communication times for the application are graphed in Figures 7 and 8, respectively. It should be noted that the scaled time on the simulator fits the actual sorting time times well. But, the communication of the simulator with our network module performs very poorly, thus leading to the differences in overall performance. It does not seem to be capturing the activity in the network at all.



Generated by latex2html-95.1
Thu Jun 1 21:05:27 EDT 1995