CMSC 714 - Readings

Readings
CMSC 714: High Performance Computing


Note: for each class (after the intro material), 4 students will be responsible for emailing me (als@cs.umd.edu) with ~4 discussion questions on the reading(s) for that day by 6PM the day before the class, and should be prepared to ask those questions and help explain the paper to the rest of the class.

Introduction - What and Why?

9/1 Parallel Computing and Parallel Computers

from Lecture Notes

9/6 Applications of Parallel Computing

from Lecture Notes

Programming Models

9/8-13 Expressing Parallelism (Explicit Control)

V.S. Sunderam, G.A. Geist, J. Dongarra, and R. Manchek, "The PVM Concurrent Computing System: Evolution, Experiences, and Trends", Parallel Computing, 20(4), 1994. [PDF]
J. J. Dongarra, S. W. Otto, M. Snir, and D. Walker, "A message passing standard for MPP and workstations," Communications of the ACM, 39(7), 1996, pp. 84-90. [PDF]

9/15-20 Introduction to Debugging Parallel Programs

from Taiga Nakamura's lecture notes

9/20 Expressing Parallelism (Implicit Control) - E. Ahmed, S. Bach, I. Bhati, S. Bondugula

William W. Carlson et al., "Introduction to UPC and Language Specification," CCS-TR-99-157. [PDF]
L. Dagum and R. Menon, "OpenMP: An Industry-Standard API for Shared-Memory Programming," IEEE Computational Science & Engineering, 5(1), 1998. [PDF]

9/22 Expressing Parallelism (Hybrids) - T. Creech, C.L. Teo, A. Ecins, J. Edwards

Kathy Yelick et al., "Titanium: A High Performance Java Dialect", Concurrency: Practice & Experience, 10(11-13), 1998. [PDF]
Steve W. Bova et al., "Parallel Programming with Message Passing and Directives", Computing in Science & Engineering, 3(5), 2001. [PDF]

9/27 Expressing Parallelism (Frameworks) - E. Elsaka, K. Elwazeer, H. He, C. Kang

S. Balay, W. D. Gropp, L. C. McInnes, and B. F. Smith, "Efficient Management of Parallelism in Object Oriented Numerical Software Libraries", In E. Arge, A. M. Bruaset, and H. P. Langtangen, editors, Modern Software Tools in Scientific Computing, pages 163-202, Birkhäuser Press, 1997. [PDF]
T. Goodale, G. Allen, G. Lanfermann, J. Massó, T. Radke, E. Seidel, and J. Shalf, "The Cactus Framework and Toolkit: Design and Applications", In Proceedings of Vector and Parallel Processing - VECPAR 2002, Springer, 2003. [PDF]

Architectures

9/29 - No class, holiday

10/4 Shared Memory - X. Chen, G. Kothari, B. London, J. Mondal

J. Laudon and D. Lenoski, "The SGI Origin: a ccNUMA highly scalable server," In Proceedings of 1997 International Symposium on Computer Architecture (ISCA '97), May 1997. [PDF]
SGI, "Technical Advances in the SGI® Altix® UV," SGI White paper, 2009. [PDF]

10/6 Message Passing and Communication - V. Nagaraja, K. Nandy, O. Oza, A. Quamar

Robert M. Metcalfe and David R. Boggs, "Ethernet: distributed packet switching for local computer networks," Communications of the ACM, 19(7), 1976. [PDF]
Mellanox Technologies white paper, "Introduction to InfiniBand". [PDF]

10/11 Custom Machines - T. Rekatsinas, K. Taylor, C. Dunbar

S.R. Alam, J.A. Kuehn, R.F. Barrett, J.M. Larkin, M.R. Fahey, R. Sankaran, P.H. Worley, "Cray XT4: An Early Evaluation for Petascale Scientific Simulation", In Proceedings of SC'07, Nov. 2007. [PDF]
A. Gara et al., "Overview of the Blue Gene/L system architecture", IBM Journal of Research and Development, 49(2/3), Fall 2005. [PDF]

10/13 Stream Processing and GPUs - A. White, K. Yoo, Y. Zhou, E. Ahmed

A. E. Eichenberger et al., "Using advanced compiler technology to exploit the performance of the Cell Broadband Engine architecture", IBM Systems Journal, 45(1), Jan. 2006. [PDF]
"Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU", In Proceedings of 2010 International Symposium on Computer Architecture (ISCA), May 2010. [PDF]

10/18 Computational Grids - S. Bach, I. Bhati, S. Bondugula, X. Chen

I. Foster and C. Kesselman, "Computational Grids", Chapter 2 of The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, 1999. [PDF]
A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, S. Tuecke, "The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets", Journal of Network and Computer Applications, 23:187-200, 2001. [PDF]

10/20 Clouds - T. Creech, C. Dunbar, A. Ecins, J. Edwards

Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters", In Proceedings of OSDI'04, pp. 137-150. [PDF]
Michael Stonebraker, Daniel Abadi, David J. DeWitt, Sam Madden, Erik Paulson, Andrew Pavlo, Alexander Rasin, "MapReduce and Parallel DBMSs: Friends or Foes?", Communications of the ACM, 53(1), Jan. 2010, pp. 64-71. [PDF]

Tools

10/25 Event Ordering and Race Detection - E. Elsaka, K. Elwazeer, H. He, C. Kang

L. Lamport, "Time, Clocks, and the Ordering of Events in a Distributed System", Communications of the ACM, 21(7), 1978, pp. 558-564. [PDF]
S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T. Anderson, "Eraser: A Dynamic Data Race Detector for Multi-Threaded Programs", In Proceedings of the 16th Symposium on Operating Systems Principles, ACM Press, Oct. 1997. [PDF]

10/27 Performance Metrics - G. Kothari, B. London, E. Ahmed, V. Nagaraja

A. J. Goldberg and J. L. Hennessy, "Mtool: An Integrated System for Performance Debugging Shared Memory Multiprocessor Applications", IEEE Transactions on Parallel and Distributed Systems, 4(1), 1993. [PDF]
J. K. Hollingsworth, "Critical Path Profiling of Message Passing and Shared-memory Programs", IEEE Transactions on Parallel and Distributed Systems, 9(10), 1998, pp. 1029-1040. [PDF]

11/1 Data Collection and Instrumentation - K. Nandy, O. Oza, T. Rekatsinas, I. Bhati

Nicholas Nethercote and Julian Seward, "Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation", In Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2007. [PDF]
B. R. Buck and J.K. Hollingsworth, "An API for Runtime Code Patching," International Journal of High Performance Computing Applications, 14(4), Winter 2000, pp. 317-329. [PDF]

11/3 Scheduling - Short Term - K. Taylor, C.L. Teo, A. White, K. Yoo

Y. Zhang, A. Sivasubramaniam, J. Moreira, and H. Franke, "Impact of Workload and System Parameters on Next Generation Cluster Scheduling Mechanisms", IEEE Transactions on Parallel and Distributed Systems, 12(9), Sept. 2001, pp. 967-985. [PDF]
A.C. Dusseau, R.H. Arpaci, D.E. Culler, "Effective Distributed Scheduling of Parallel Workloads", In Proceedings of ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, ACM Press, May 1996. [PDF]

11/8 Cache Tools - Y. Zhou, J. Mondal, S. Bach, A. Quamar

J. Mellor-Crummey, D. Whalley, and K. Kennedy, "Improving Memory Hierarchy Performance for Irregular Applications Using Data and Computation Reorderings", International Journal of Parallel Programming, 29(3), June 2001. [PDF]
Margaret Martonosi, Anoop Gupta, Thomas Anderson, "MemSpy: analyzing memory system bottlenecks in programs", ACM SIGMETRICS Performance Evaluation Review, 20(1), 1992. [PDF]

11/10 Runtime Parallelization - S. Bondugula, X. Chen, T. Creech, C. Dunbar

S.J. Fink, S.R. Kohn, and S.B. Baden, "Efficient Run-time Support for Irregular Block-Structured Applications", Journal of Parallel and Distributed Computing, 50(1), 1998. [PDF]
G. Agrawal, A. Sussman, and J. Saltz, "An Integrated Runtime and Compile-time Approach for Parallelizing Structured and Block Structured Applications", IEEE Transactions on Parallel and Distributed Systems, 6(7), 1995. [PDF]

Systems Issues

11/15 Finding Idle Cycles - A. Ecins, J. Edwards, E. Elsaka, K. Elwazeer

M. Litzkow, M. Livny, and M. Mutka, "Condor - A Hunter of Idle Workstations", In Proceedings of International Conference on Distributed Computing Systems, June 1988, pp. 104-111. [PDF]
D. Thain, T. Tannenbaum, and M. Livny, "Distributed Computing in Practice: The Condor Experience", Concurrency and Computation: Practice and Experience, Vol. 17, Nos. 2-4, 2005. [PDF]
David P. Anderson, Carl Christensen and Bruce Allen, "Designing a Runtime System for Volunteer Computing", In Proceedings of SC'06, November 2006. [PDF]

11/17 Midterm Exam

11/22 Scheduling - Batch Queues - H. He, C. Kang, G. Kothari, B. London

D. G. Feitelson and A. Mu'alem Weil, "Utilization and Predictability in Scheduling the IBM SP2 with Backfilling", 12th International Parallel Processing Symposium, April 1998. Use this extended version [PDF]
J. Weinberg and A. Snavely, "Symbiotic Space-Sharing on SDSC's DataStar System", 12th Workshop on Job Scheduling Strategies for Parallel Processing, 2006. [PDF]

11/29 Parallel I/O - J. Mondal, V. Nagaraja, K. Nandy, O. Oza

Terry Jones, Alice Koniges, and R. Kim Yates, "Performance of the IBM General Parallel File System", In Proceedings of 14th International Parallel and Distributed Processing Symposium (IPDPS'00), April 2000. [PDF]
A. Acharya, M. Uysal, and J. Saltz, "Active Disks: Programming Model, Algorithms and Evaluation", In Proceedings of Eighth International Conference on Architectural Support for Programming Languages and Operating Systems, October 1998. [PDF]

Applications

12/1 Applications - A. Quamar, T. Rekatsinas, K. Taylor, C.L. Teo

U. Catalyurek, M. Beynon, C. Chang, T. Kurc, A. Sussman, and J. Saltz, "The Virtual Microscope", IEEE Transactions on Information Technology in Biomedicine, Vol. 7, No. 4, 2003. [PDF]
David E. Shaw et al., "Millisecond-scale molecular dynamics simulations on Anton", Proceedings of SC'09, November 2009. [PDF]

12/6 Project Demos

12/8 Project Demos

12/13 SC11 Gordon Bell award winners - A. White, K. Yoo, Y. Zhou

Sustained Performance Prize - Y. Hasegawa et al., "First-Principles Calculations of Electron States of a Silicon Nanowire with 100,000 Atoms on the K Computer", Proceedings of SC'11, November 2011. [PDF]
Scalability/Time to Solution Prize - T. Shimokawabe et al., "Peta-scale Phase-field Simulation for Dendritic Solidification on the TSUBAME 2.0 Supercomputer", Proceedings of SC'11, November 2011. [PDF]