
Results

Our results have not been as accurate as we would have liked. The ping test produced deterministic measurements of the overhead of a send operation, and the pure computation times match fairly well for some test cases. Communication turned out to be the most difficult part to model. The wildly varying SP-2 times across input data sizes for Matrix Multiply and Array can possibly be explained in several ways. First, the local SP-2 in question may not be performing reliably; given its frequent down times, this may have been a factor. We chose to use the local SP-2 here at Maryland instead of one of the remote supercomputing facilities, such as Argonne National Laboratory. Second, and more likely, the interconnection network model we implemented in Proteus is not tuned correctly. Additional testing and tuning are necessary before the simulator can accurately predict running times on the SP-2. Floating-point behavior may also perturb our results: the SP-2 handles FPU operations with multiple fast FPUs, unlike the MIPS-based workstation on which the simulator ran. Many details of the low-level interconnection network are not widely available and are most likely considered proprietary by IBM. This leaves trial and error as the only way to discover the network's true behavior, and only if orthogonal test cases can be designed that isolate each factor contributing to the overall execution time.
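Both the ping test and the call for orthogonal test cases amount to separating a fixed per-message cost from a size-dependent cost. A minimal sketch of that separation, assuming a simple linear model t(n) = alpha + beta * n for one-way message time (an assumption; the report does not give the form of the network model in Proteus), fits alpha and beta by least squares to timings taken at several message sizes. The function names `fit_latency_bandwidth` and `relative_errors` are hypothetical.

```python
"""Estimate per-message overhead and per-byte cost from ping timings.

Assumes the linear model t(n) = alpha + beta * n, where n is the message
size in bytes, alpha is the fixed send overhead and beta the inverse
bandwidth. This model is an illustrative assumption, not the network
model implemented in Proteus.
"""


def fit_latency_bandwidth(sizes, times):
    """Ordinary least-squares fit of times = alpha + beta * sizes."""
    if len(sizes) != len(times) or len(sizes) < 2:
        raise ValueError("need at least two (size, time) pairs")
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("message sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    beta = sxy / sxx                 # seconds per byte
    alpha = mean_y - beta * mean_x   # fixed overhead in seconds
    return alpha, beta


def relative_errors(sizes, measured, alpha, beta):
    """Relative error of the model prediction against each measured time."""
    return [abs((alpha + beta * n) - t) / t for n, t in zip(sizes, measured)]


if __name__ == "__main__":
    # Synthetic one-way times (seconds) built from alpha = 40 us and
    # beta = 0.03 us/byte; these are placeholders, not SP-2 measurements.
    sizes = [0, 1024, 4096, 16384, 65536]
    times = [40e-6 + 0.03e-6 * n for n in sizes]
    alpha, beta = fit_latency_bandwidth(sizes, times)
    print(f"overhead ~ {alpha * 1e6:.1f} us, per-byte ~ {beta * 1e9:.1f} ns")
    print("max relative error:", max(relative_errors(sizes, times, alpha, beta)))
```

Fitting real SP-2 timings and simulator-predicted timings separately, then comparing the two (alpha, beta) pairs, would suggest whether a discrepancy lies in the fixed overhead or in the bandwidth term.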

Another observation concerns the limitations of running the simulator on a limited-memory workstation. The largest Matrix Multiply input the simulator could handle was a pair of 64x64 matrices, whereas the SP-2 easily handled the 8192x8192 case. The scalability of the SP-2 may become apparent only with sufficiently large test cases, and the simulator cannot approach comparable sizes.
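To make the gap in problem size concrete, the following sketch computes the memory footprint and arithmetic work for the two Matrix Multiply sizes. It assumes dense n x n matrices of 8-byte double-precision elements, three matrices (A, B and the product C) held in memory at once, and the classical algorithm with roughly 2n^3 floating-point operations; none of these details is stated in the report, and the helper names are hypothetical.

```python
"""Compare the largest Matrix Multiply sizes run on the simulator and the SP-2.

Assumptions (not stated in the report): dense n x n matrices of 8-byte
doubles, three matrices resident at once, classical O(n^3) multiply.
"""

BYTES_PER_ELEMENT = 8  # assumption: double precision
MATRICES = 3           # assumption: A, B and C = A * B all in memory


def footprint_bytes(n):
    """Total memory for three n x n matrices."""
    return MATRICES * n * n * BYTES_PER_ELEMENT


def multiply_flops(n):
    """Approximate operation count for classical multiply (n^3 mults + n^3 adds)."""
    return 2 * n ** 3


if __name__ == "__main__":
    small, large = 64, 8192  # largest simulator case vs. a case the SP-2 handled
    print(f"memory: {footprint_bytes(small) / 2**10:.0f} KiB vs "
          f"{footprint_bytes(large) / 2**30:.1f} GiB")
    print(f"work ratio: {multiply_flops(large) // multiply_flops(small):,}x")
```

Under these assumptions the 64x64 case needs 96 KiB, while the 8192x8192 case needs 1.5 GiB (16,384 times more memory) and about 2.1 million times more arithmetic.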

In any case, our network module is not yet accurate enough to be used to fulfill the project goals.

Running applications on an accurate, robust simulator can provide many advantages over running on the real target hardware. Many internals are easily accessible, such as detailed communication information about the interconnection network; on a real parallel machine, such information may not be available. Other advantages include a simpler "idea to implementation" path, allowing much more productive experimentation, as discussed in the introduction.






Generated by latex2html-95.1
Thu Jun 1 21:05:27 EDT 1995