CMSC 714 (Fall 2015)

Tentative Reading List

Introduction

9/1 No Class

9/3 Parallel Computing and Parallel Computers

Lecture Notes

9/8 Applications of Parallel Computing

Lecture Notes

Programming Models

9/10 Expressing Parallelism (Explicit Control)

"The PVM Concurrent Computing System: Evolution, Experiences, and Trends", (PDF)

J. J. Dongarra, S. W. Otto, M. Snir, and D. Walker, "A message passing standard for MPP and workstations," CACM, 39(7), 1996, pp. 84-90. (PDF)

9/15 Expressing Parallelism (Implicit Control)

B. L. Chamberlain, "A Brief Overview of Chapel", pre-print Jan. 2013 (PDF)

L. Dagum and R. Menon, "OpenMP: An Industry-Standard API for Shared-Memory Programming," IEEE Computational Science & Engineering, 5(1), 1998, pp. 46-55. (PDF)

9/17 Expressing Parallelism (Hybrids)

Steve W. Bova et al., "Parallel Programming with Message Passing and Directives", Computing in Science & Engineering, 3(5), 2001, pp. 22-37. (PDF)

Brent Leback, Michael Wolfe, and Douglas Miles "The PGI Fortran and C99 OpenACC Compilers", Proceedings of Cray User Group (CUG) meeting, 2012. (PDF)

9/22 Expressing Parallelism (Frameworks)

S. Balay, W. D. Gropp, L. C. McInnes, and B. F. Smith. Efficient management of parallelism in object oriented numerical software libraries. In E. Arge, A. M. Bruaset, and H. P. Langtangen, editors, Modern Software Tools in Scientific Computing, pages 163-202. Birkhäuser Press, 1997. (PDF)

T. Goodale, G. Allen, G. Lanfermann, J. Massó, T. Radke, E. Seidel, and J. Shalf. The Cactus Framework and Toolkit: Design and Applications. In Vector and Parallel Processing - VECPAR 2002, 5th International Conference. Springer, 2003. (PDF)

Architectures

9/24 Shared Memory

Laudon, J., Lenoski, D., “The SGI Origin: a ccNUMA highly scalable server”, ISCA '97, pp. 241-51, May 1997 (PDF)

SGI, “Technical Advances in the SGI® Altix® UV”, SGI White paper, 2009 (PDF)

9/29 Message Passing and Communication

Robert M. Metcalfe, David R. Boggs, “Ethernet: distributed packet switching for local computer networks”, Communications of the ACM, v.19 n.7, p.395-404, July 1976 (PDF)

Mellanox Technologies, “Introduction to InfiniBand”, (PDF)

10/1 Custom High-End Machines

B. Austin, M. J. Cordery, H. J. Wasserman and N. J. Wright, “Performance Measurements of the NERSC Cray Cascade System”, Cray User Group 2013 (PDF)

Gara et al., “Overview of the Blue Gene/L system architecture”, IBM Journal of Research and Development, 49(2/3), Fall 2005. (PDF)

10/6 Stream Processing and GPUs

M. Garland and D. B. Kirk. Understanding throughput-oriented architectures. CACM 53, 11 (November 2010), pp. 58-66. (PDF)

Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU, ISCA 2010, May 2010 (PDF).

10/8 Computational Grids & Clouds

I. Foster and C. Kesselman, "Computational Grids", Chapter 2 of The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, 1999. (PDF)

Michael Stonebraker, Daniel Abadi, David J. DeWitt, Sam Madden, Erik Paulson, Andrew Pavlo, Alexander Rasin, "MapReduce and Parallel DBMSs: Friends or Foes?", Communications of the ACM, 53(1), Jan. 2010, pp. 64-71. (PDF)

Tools

10/13 Event Ordering

L. Lamport, "Time, Clocks, and the Ordering of Events in a Distributed System," CACM, 21(7), 1978, pp. 558-564 (PDF).

S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson, "Eraser: A Dynamic Data Race Detector for Multi-Threaded Programs," Proceedings of the 16th Symposium on Operating Systems Principles (PDF).

10/15 Performance Metrics

A. J. Goldberg and J. L. Hennessy, "Performance Debugging Shared Memory Multiprocessor Programs with MTOOL", Supercomputing'91. Nov. 18-22, 1991, Albuquerque, NM, pp. 481-490 (PDF).

J. K. Hollingsworth, "Critical Path Profiling of Message Passing and Shared-memory Programs," IEEE Transactions on Parallel and Distributed Systems, 9(10), 1998, pp. 1029-1040. (PDF)

10/20 Data Collection and Instrumentation

Nicholas Nethercote and Julian Seward. Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation, PLDI 2007, June 2007 (PDF)

B. R. Buck and J. K. Hollingsworth, An API for Runtime Code Patching, International Journal of High Performance Computing Applications, 14(4) (Winter 2000), pp. 317-329. (PDF)

10/22 Scheduling - Short Term

John K. Ousterhout, "Scheduling Techniques for Concurrent Systems", International Conference on Distributed Computing Systems, 1982, pp. 22-30. (PDF).

A. C. Dusseau, R. H. Arpaci, D. E. Culler, "Effective Distributed Scheduling of Parallel Workloads", ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, May 1996, Philadelphia, PA. (PDF).

10/27 Performance Tools

S. Shende and A. D. Malony, "The TAU Parallel Performance System," International Journal of High Performance Computing Applications, SAGE Publications, 20(2):287-331, Summer 2006 (PDF).

L. Adhianto, S. Banerjee, M. Fagan, M. Krentel, G. Marin, J. Mellor-Crummey, and N. R. Tallent. HPCToolkit: Tools for performance analysis of optimized parallel programs. Concurrency and Computation: Practice and Experience, 22(6):685–701, 2010. (PDF)

10/29 Auto-tuning

W. Gu, G. Eisenhauer, E. Kraemer, K. Schwan, J. Stasko, J. Vetter, and N. Mallavurupu, "Falcon: On-line Monitoring and Steering of Large-Scale Parallel Programs," Frontiers '95. Feb 6-9, 1995, McLean, VA, IEEE Press, pp. 422-429. (PDF)

A. Tiwari, J. K. Hollingsworth, “End-to-end Auto-tuning with Active Harmony”, in Performance Tuning in Scientific Computing, D. Bailey & S. Williams, eds.

11/3 Cache Tools

Margaret Martonosi, Anoop Gupta, Thomas Anderson, “MemSpy: analyzing memory system bottlenecks in programs”, SIGMETRICS 92, (PDF)

M. Burtscher, B-D Kim, J. Diamond, J. McCalpin, L. Koesterke, and J. Browne, “PerfExpert: An Easy-to-Use Performance Diagnosis Tool for HPC Applications”, ACM/IEEE SC’10. (PDF)

11/5 Runtime Parallelism

S.J. Fink, S.R. Kohn, and S.B. Baden, “Efficient Run-time Support for Irregular Block-Structured Applications”, Journal of Parallel and Distributed Computing, 50(1), 1998. (PDF)

G. Agrawal, A. Sussman, and J. Saltz, “An Integrated Runtime and Compile-time Approach for Parallelizing Structured and Block Structured Applications”, IEEE Transactions on Parallel and Distributed Systems, 6(7), 1995. (PDF)

Systems Issues

11/10 Scheduling – Batch Queues

D. G. Feitelson and A. Mu'alem Weil, "Utilization and Predictability in Scheduling the IBM SP2 with Backfilling," 12th Intl. Parallel Processing Symposium. April 1998, Orlando, Florida, pp. 542-546. (Use this extended form: PDF)

J. Weinberg, A. Snavely, “Symbiotic Space-Sharing on SDSC's DataStar System”, 12th Workshop on Job Scheduling Strategies for Parallel Processing, in conjunction with SIGMETRICS 2006, Saint-Malo, France (PDF)

11/12 Finding Idle Resources

M. Litzkow, M. Livny, and M. Mutka, "Condor - A Hunter of Idle Workstations," International Conference on Distributed Computing Systems. June 1988, pp. 104-111. (PDF).

David P. Anderson, Carl Christensen and Bruce Allen, "Designing a Runtime System for Volunteer Computing", In Proceedings of SC'06, November 2006. (PDF).

11/17 Midterm

11/19 Parallel I/O

Terry Jones, Alice Koniges and R. Kim Yates, “Performance of the IBM General Parallel File System,” 14th International Parallel and Distributed Processing Symposium (IPDPS'00), (PDF)

A. Acharya, M. Uysal, and J. Saltz, "Active Disks: Programming Model, Algorithms and Evaluation," Eighth International Conference on Architectural Support for Programming Languages and Operating Systems. Oct. 1998, San Jose, CA. (PDF)

11/24 Work in Progress session

11/26 Thanksgiving

12/1 Performance Prediction

M. E. Crovella, Thomas J. LeBlanc, "Parallel Performance Prediction Using Lost Cycles", Proceedings of Supercomputing '94, 1994. (PDF)

L. Carrington, M. Laurenzano, A. Snavely, R. Campbell, L. Davis, “How Well Can Simple Metrics Represent the Performance of HPC Applications?”, Proceedings of SC’05, Nov. 2005, (PDF)

Applications

12/3 Gordon Bell Finalists

J. Rudi, A. C. I. Malossi, T. Isaac, G. Stadler, M. Gurnis, P. W. J. Staar, Y. Ineichen, C. Bekas, A. Curioni, and O. Ghattas, "An extreme-scale implicit solver for complex PDEs: highly heterogeneous flow in earth's mantle", SC '15 (PDF).

D. Rossinelli, Y. Tang, K. Lykov, D. Alexeev, M. Bernaschi, P. Hadjidoukas, M. Bisson, W. Joubert, C. Conti, G. Karniadakis, M. Fatica, I. Pivkin, and P. Koumoutsakos, "The in-silico lab-on-a-chip: petascale and high-throughput simulations of microfluidics at cell resolution", SC '15 (PDF).

12/8 Project Presentations

12/10 Project Presentations