PhD Proposal: Multi-Agent Reinforcement Learning: Systems for Evaluation and Relations to Complex Systems

Talk

Jordan Terry

Time:

02.01.2021 15:30 to 17:30

Location:

Remote

URL:

https://talks.cs.umd.edu/talks/2738

My thesis work all revolves in some way around one question: "How can we achieve effective means of controlling swarms of intelligent agents?" Reinforcement learning, the study of learning to optimally control systems, is in my view the most natural and productive framework in which to view this problem.

Achieving a feat as large as learning effective swarm intelligence is beyond the scope of any one thesis. Instead, I've tried to solve what I feel are the most foundational problems in achieving it. A plurality of my work concerns developing tools that allow the research community to study this problem effectively, centering around PettingZoo. PettingZoo is an open source library that automates the largest piece of the work researchers must do to study multi-agent reinforcement learning, and makes it easier to build on the work of other researchers. An estimated 7000 total man-hours have been put into this project by around 30 researchers all over the world, with 11 authors credited on the paper, and it is already used by many researchers worldwide. The AEC Games, SuperSuit, and multiplayer ALE projects were all chiefly done as necessary work to develop PettingZoo. PettingZoo was motivated by the goal of bringing the productivity that Gym brought to single-agent reinforcement learning to multi-agent reinforcement learning. My one ongoing project in this space is Colosseum, which similarly seeks to bring the benefits Kaggle brought to general machine learning to all of reinforcement learning.

My other work in this space has been far more eclectic. The paper on parameter sharing shows that a particularly interesting vein of multi-agent reinforcement learning methods optimally mitigates a fundamental hardness result in multi-agent reinforcement learning, and proves a set of methods that allow approaches in this vein to be used for all cases of cooperative multi-agent reinforcement learning.
To conduct this work, I had to develop the MAILP model, a general model of information transfer during multi-agent learning. This is a useful and novel tool in its own right, and it turns out to allow for a remarkable result: classifying regimes of multi-agent learning based on convergence bounds. Finalizing this work is ongoing.

My ongoing project is an attempt to learn bird flocking patterns in a very physically realistic simulation through multi-agent reinforcement learning. This leverages every past project in this thesis, and would offer fundamental insights into emergent behavior, complex systems, and biological communities, as well as representing the first time learning has been able to reproduce natural levels of swarm intelligence.

Examining Committee:

Chair: Dr. John Dickerson
Dept rep: Dr. David Jacobs
Members: Dr. Naomi Feldman, Dr. Dan Lathrop, Dr. Dinesh Manocha