PhD Proposal: Integrated Numeric Planning and Scheduling of Distributed Pipelined Workflows

Talk
Taylor Paul
Time: 12.16.2025 10:00 to 12:00
Location: IRB-5105

Computational pipelines that extract insights from raw data continue to grow in complexity and scale. The challenge of establishing and maintaining these pipelines is greatest when input data sources and output-consuming applications sit on globally distributed networks. This work pursues automating these tasks for distributed data pipelines, or pipelined workflows. First, we develop a workflow and resource graph representation that includes both data processing and sharing components along with corresponding network interfaces for scheduling. Leveraging these graphs, we develop WORKSWORLD, a new domain for domain-independent artificial intelligence (AI) planners that assumes permanently scheduled workflows, like those found in sensor networks or ingest pipelines. We design a framework for using our domain with a state-of-the-art numeric planner. The framework lets users define data sources, available workflow components, and desired data destinations and formats in a human-readable data serialization language (YAML), without explicitly declaring the entire workflow as a goal. The planner solves the joint planning and scheduling problem, producing a sequence of actions that both builds the workflow graph and maps its components to the available resources so that source data is processed into the desired formats at the specified locations; we visualize the state resulting from the plan and present it back to the user. We model a centralized planner across distributed cloud, fog, and edge sites and present empirical results on the size of resource and workflow graphs that a state-of-the-art numeric planner can process in reasonable time and space on commodity hardware. This serves as a baseline for discussing potential future work on learning-for-planning approaches and decentralized multi-agent planners.
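
To make the joint planning and scheduling idea concrete, the following is a minimal sketch, not the WORKSWORLD domain or the numeric planner used in this work. It assumes a toy problem in which each component converts one data format into another and consumes CPU units at the site where it is deployed, and it substitutes a plain breadth-first search for a real planner. The problem layout, the names `PROBLEM` and `plan`, and the `deploy` and `send` actions are all hypothetical and chosen only for illustration; the dictionary mimics what a YAML specification might deserialize to.

```python
from collections import deque

# Hypothetical problem spec, shaped like what a YAML file might deserialize to.
# The user states sources, components, and goals, but not the workflow itself.
PROBLEM = {
    "source": {"site": "edge", "format": "raw"},
    "goal": {"site": "cloud", "format": "report"},
    # component name -> (input format, output format, CPU units needed)
    "components": {
        "filter": ("raw", "clean", 1),
        "aggregate": ("clean", "summary", 2),
        "render": ("summary", "report", 2),
    },
    # site -> CPU units available (a numeric resource the search must track)
    "capacity": {"edge": 1, "fog": 2, "cloud": 4},
    # directed network links between sites
    "links": [("edge", "fog"), ("fog", "cloud")],
}


def plan(problem):
    """Breadth-first search over (site, format, remaining capacity) states.

    Returns a shortest list of actions, or None if no plan exists.
    """
    sites = sorted(problem["capacity"])
    start = (
        problem["source"]["site"],
        problem["source"]["format"],
        tuple(problem["capacity"][s] for s in sites),
    )
    goal = (problem["goal"]["site"], problem["goal"]["format"])
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (site, fmt, caps), actions = queue.popleft()
        if (site, fmt) == goal:
            return actions
        successors = []
        idx = sites.index(site)
        # Action 1: deploy a component here if its input matches and CPU fits.
        for name, (fin, fout, cpu) in sorted(problem["components"].items()):
            if fin == fmt and caps[idx] >= cpu:
                new_caps = caps[:idx] + (caps[idx] - cpu,) + caps[idx + 1:]
                successors.append(((site, fout, new_caps), ("deploy", name, site)))
        # Action 2: send the current data over an outgoing link.
        for src, dst in problem["links"]:
            if src == site:
                successors.append(((dst, fmt, caps), ("send", src, dst)))
        for state, action in successors:
            if state not in seen:
                seen.add(state)
                queue.append((state, actions + [action]))
    return None  # no plan exists under these constraints


if __name__ == "__main__":
    for step in plan(PROBLEM):
        print(step)
```

In this sketch the returned action list both selects the components (building the workflow) and fixes where each one runs (mapping it to resources), which is the sense in which planning and scheduling are solved together. Exhaustive search of this kind is only practical for very small graphs.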