The RLSS is administered by Justin Terry (jkterry [-at-] umd [dot] edu). All talks are held at this Zoom link: https://umd.zoom.us/j/2920984437. Talk recordings and slides will be posted here within a few days of each talk.
Chris Nota | University of Massachusetts, Amherst
2:00 p.m. — 3:15 p.m. Wednesday, September 23rd
Is the Policy Gradient a Gradient?
Most popular policy gradient methods do not truly follow the gradient of the discounted objective. Is there another objective they optimize instead? In this talk, we examine the vector field followed by almost all popular policy gradient methods and prove that it is not the gradient of any objective. We discuss other properties of this vector field and examine why it may be effective in practice. We also discuss some other recent work on policy gradients.
Jayesh Gupta | Stanford and Microsoft Research
1:30 p.m. — 2:15 p.m. Wednesday, October 21st
Deep Implicit Coordination Graphs for Multi-agent Reinforcement Learning
Multi-agent reinforcement learning (MARL) requires coordination to efficiently solve certain tasks. Fully centralized control is often infeasible in such domains due to the size of joint action spaces. Coordination-graph-based formalizations allow reasoning about the joint action based on the structure of interactions. However, they often require domain expertise to design. In this talk, we will discuss the recently introduced deep implicit coordination graph (DICG) architecture for such scenarios. DICG consists of a module for inferring the dynamic coordination graph structure, which is then used by a graph-neural-network-based module to learn to implicitly reason about the joint actions or values. DICG allows learning the tradeoff between full centralization and decentralization via standard actor-critic methods, significantly improving coordination in domains with large numbers of agents.
Logan Engstrom | Massachusetts Institute of Technology
1:00 p.m. — 2:30 p.m. Wednesday, November 18th
A Closer Look at Deep Policy Gradient Algorithms
Deep reinforcement learning methods are behind some of the most publicized recent results in machine learning. In spite of these successes, however, deep RL methods face a number of systemic issues: brittleness to small changes in hyperparameters, high reward variance across runs, and sensitivity to seemingly small algorithmic changes. In this talk we take a closer look at the potential root of these issues. Specifically, we study how the policy gradient primitives underlying popular deep RL algorithms reflect the principles informing their development.