PhD Proposal: Dynamic strategies for managing scientific workflows on Supercomputers

Talk
Swati Singhal
Time: 05.08.2020 10:00 to 12:00

Large-scale experiments that involve coupled applications often have unpredictable runtime behaviors on HPC clusters. These behaviors can be attributed to changes in the data-flow rate between the workflow applications and to how the applications interfere with each other's performance at runtime. Such behaviors become even more complicated when complex applications are involved and the coupling is based on in-memory exchanges or node-to-node transfers. In some special cases, users require dynamism to manipulate the workflow at runtime to support a desired functionality.

For such workflows, pre-determining resource usage is challenging, and static resource allocation generally results in either wasted resources due to over-provisioning or performance issues and failures due to under-provisioning. An automated service that adjusts the resource assignment and, if desired, the experiment configuration as the runtime requirements of the applications change could go a long way toward improving these experiments, but supercomputers currently offer no support for such orchestration.

I propose to explore dynamic strategies for managing workflows, which will entail: (a) developing monitoring methods to identify dynamic opportunities where changing the runtime state can benefit the experiment, (b) devising decision policies that dictate the appropriate control when such opportunities are observed, and (c) exploring possible runtime actions and designing algorithms that govern how the chosen actions modify the state of the workflows. Further, by building a prototype management system that incorporates my strategies and applying them to real scientific workflows, I will demonstrate the effectiveness of my methods in practice.
I believe this research would not only serve as a baseline for dynamic management on supercomputers but would also complement research efforts to incorporate runtime capabilities into existing HPC tools.

Examining Committee:

Chair: Dr. Alan Sussman
Dept rep: Dr. Howard Elman
Members: Dr. Matthew D. Wolf, Dr. Abhinav Bhatele