CMSC818P (Spring 2013) – Exascale Computing

Reading List (list to be updated frequently)

 

Applications/needs for Exascale

·         (1/29) “What are the Modeling Requirements for Seamless Prediction of Weather and Climate from Days to Decades? The Known, the Unknown and the Unknowable.” Report of a Sloan Foundation Workshop (December 1-2, 2007; The Royal Society, U.K.).  (PDF)

 

Parallel Computing 101

            Emailed to people who need it

 

Looking Back – Petascale

·         (1/31 – to pg. 79) “Enabling Technologies for Peta(FL)OPS Computing” (PDF)

·         (2/5 – pg. 80-184) “Enabling Technologies for Peta(FL)OPS Computing” (PDF)

·         (2/7 – pg. 185-) “Enabling Technologies for Peta(FL)OPS Computing” (PDF)

·         (2/12) “The 1997 Petaflops Algorithms Workshop Summary Report” (PDF)

 

Petascale – The Reality (Hardware)

 

·         (2/14) Arthur S. Bland, Ricky A. Kendall, Douglas B. Kothe, James H. Rogers, Galen M. Shipman, “Jaguar: The World’s Most Powerful Computer” (PDF)

 

Petascale – The Reality (Software)

 

·         (2/19) Abtin Rahimian, Ilya Lashuk, Shravan Veerapaneni, Aparna Chandramowlishwaran, Dhairya Malhotra, Logan Moon, Rahul Sampath, Aashay Shringarpure, Jeffrey Vetter, Richard Vuduc, Denis Zorin, and George Biros. 2010. Petascale Direct Numerical Simulation of Blood Flow on 200K Cores and Heterogeneous Architectures. In Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC '10). DOI=10.1109/SC.2010.42 (PDF)

 

Looking Forward – Government Road Maps

·         (2/21 – part I) “ExaScale Software Study: Software Challenges in Extreme Scale Systems” (PDF)

·         (2/26 – part II) “ExaScale Software Study: Software Challenges in Extreme Scale Systems” (PDF)

 

Co-Design

·         (2/28) J. Shalf, D. Quinlan, C. Janssen, "Rethinking Hardware-Software Codesign for Exascale Systems," Computer, vol. 44, no. 11, pp. 22-30, Nov. 2011. doi: 10.1109/MC.2011.300 (PDF)

           

Workload of Systems

·         (3/5) Wayne Joubert and Shi-Quan Su. 2012. An analysis of computational workloads for the ORNL Jaguar system. In Proceedings of the 26th ACM international conference on Supercomputing (ICS '12). ACM, New York, NY, USA, 247-256. (PDF)

 

Current State of the Art Applications

·         (3/7) Tomoaki Ishiyama, Keigo Nitadori, and Junichiro Makino. 2012. 4.45 Pflops astrophysical N-body simulation on K computer: the gravitational trillion-body problem. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC '12). IEEE Computer Society Press, Article 5, 10 pages. (PDF)

·         (3/12) Tan Bui-Thanh, Carsten Burstedde, Omar Ghattas, James Martin, Georg Stadler, and Lucas C. Wilcox. 2012. Extreme-scale UQ for Bayesian inverse problems governed by PDEs. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC '12). IEEE Computer Society Press, Article 3, 11 pages. (PDF)

 

Reduced Reliability

·         (3/14) Bianca Schroeder, Garth A. Gibson, “Understanding Failures in Petascale Computers” (PDF)

·         (3/19 & 3/21) Spring Break

·         (3/26) Franck Cappello, Al Geist, Bill Gropp, Laxmikant Kale, Bill Kramer, Marc Snir, “Toward Exascale Resilience” (PDF)

·         (3/28) Dong Li, Jeffrey Vetter, Weikuan Yu, “Classifying Soft Error Vulnerabilities in Extreme-Scale Scientific Applications Using a Binary Instrumentation Tool”, SC12 (PDF)

·         (4/2) Jinsuk Chung, Ikhwan Lee, Michael Sullivan, Jee Ho Ryoo, Dong Wan Kim, Doe Hyun Yoon, Larry Kaplan, and Mattan Erez. 2012. Containment domains: a scalable, efficient, and flexible resilience scheme for exascale systems. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC '12). IEEE Computer Society Press, Los Alamitos, CA, USA (PDF)

 

 

Programming Models

·         (4/4) Sandra Wienke, Paul Springer, Christian Terboven, and Dieter an Mey, “OpenACC — First Experiences with Real-World Applications”, Euro-Par 2012, LNCS 7484, pp. 859–870, 2012 (PDF)

·         (4/9) Bradford L. Chamberlain, “A Brief Overview Of Chapel” (PDF)

·         (4/11) Pavan Balaji, Darius Buntinas, David Goodell, William Gropp, Torsten Hoefler, Sameer Kumar, Ewing Lusk, Rajeev Thakur, and Jesper Larsson Träff, "MPI on Millions of Cores," Parallel Processing Letters, 21(1):45-60, March 2011.  (PDF)

 

Variable Precision

 

·         (4/16) John Jenkins, Eric R. Schendel, Sriram Lakshminarasimhan, David A. Boyuka, II, Terry Rogers, Stephane Ethier, Robert Ross, Scott Klasky, and Nagiza F. Samatova. 2012. Byte-precision level of detail processing for variable precision analytics. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC '12). IEEE Computer Society Press, Los Alamitos, CA, USA, Article 48, 11 pages. (PDF)

·         (4/18) Michael O. Lam, Jeffrey K. Hollingsworth, Bronis R. de Supinski, Matthew P. LeGendre, “Automatically Adapting Programs for Mixed-Precision Floating-Point Computation”, In Proceedings of the International Conference on Supercomputing (ICS) – To Appear June 2013 (PDF).

 

Power Management

·         (4/23) K. Shen, A. Shriraman, S. Dwarkadas, X. Zhang, and Z. Chen. Power containers: an OS facility for fine-grained power and energy management on multicore servers. In Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems (ASPLOS '13). (PDF)

·         (4/25) M. Gamell, I. Rodero, M. Parashar, and R. Muralidhar. Exploring cross-layer power management for PGAS applications on the SCC platform. In Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing (HPDC '12).  (PDF)

 

Checkpointing

·         (4/30) C. Wang, F. Mueller, C. Engelmann, S. L. Scott, "Hybrid Checkpointing for MPI Jobs in HPC Environments," 2010 IEEE 16th International Conference on Parallel and Distributed Systems (ICPADS), pp. 524-533, 8-10 Dec. 2010 (PDF)

·         (5/2) X. Dong, Y. Xie, N. Muralimanohar, and N. P. Jouppi. 2011. Hybrid checkpointing using emerging nonvolatile memories for future exascale systems. ACM Trans. Archit. Code Optim. 8, 2, Article 6 (June 2011), 29 pages. (PDF).

 

 

Project Reports

            (5/7 & 5/9)