Reinforcement Learning Seminar Series

The RLSS is administered by Jordan Terry (jkterry [-at-] umd [dot] edu). All talks are held at this Zoom link:

Talk recordings and slides will be posted here within a few days of each talk.


Past Talks:

Jakob Bauer | DeepMind

11:00 a.m. — 12:00 p.m. Friday, April 1st

Generally Capable Reinforcement Learning Agents

Artificial agents have achieved great success in individually challenging simulated environments, mastering the particular tasks they were trained for, with their behaviour even generalising to maps and opponents that were never encountered in training. In this talk we explore our recent work "Open-Ended Learning Leads to Generally Capable Agents", in which we create agents that perform well beyond a single, individual task and that exhibit much wider generalisation of behaviour across a massive, rich space of challenges. We discuss the design of our environment, which spans a vast set of tasks, and how open-ended learning processes lead to agents that are generally capable across this space and beyond.


Vlad Firoiu | DeepMind

5:00 p.m. — 6:15 p.m. Wednesday, March 31st

Reinforcement Learning for Beating Super Smash Bros. Melee and Proving Mathematical Theorems

In the first half hour, Vlad will discuss his work on deep RL for Super Smash Bros. Melee: the road to building an AI that beats professional players, the challenges of making the match fair between humans and machines, lessons learned along the way, and future directions for Smash Bros. AI. In the second half of the talk, Vlad will discuss his recent work on applying deep RL techniques to one of the most exciting application domains for AI: mathematics.


Marc Bellemare | Google Brain, MILA, McGill University

Autonomous Navigation of Stratospheric Balloons Using Reinforcement Learning and The History of Atari Games in Reinforcement Learning

4:00 p.m. — 5:15 p.m. Wednesday, March 17th

Marc Bellemare created the Arcade Learning Environment (the interface through which RL agents play Atari games) and used it to co-create deep reinforcement learning while at DeepMind. He'll be giving a talk on his recent Nature paper on controlling Loon balloons using RL, as well as on the history of Atari games in reinforcement learning.

Slides available upon request from speaker


Ilya Kuzovkin | OffWorld

5:00 p.m. — 6:15 p.m. Tuesday, March 2nd

Deep Reinforcement Learning for Real-World Robotics

OffWorld is developing a new generation of autonomous industrial robots to do the heavy lifting, first on Earth, then on the Moon, Mars, and asteroids. We see reinforcement learning as one of the major candidate technologies that could allow us to reach a high level of autonomy. While RL has achieved remarkable results in games and simulators, its adoption on real physical robots has been slow. In this talk we will go over a few projects we did at OffWorld on applying RL to real robots, and then make the case that there is an apparent gap between the RL community's aspiration to apply RL to real physical agents and its reluctance to move beyond simulators. To bridge this gap, we introduce OffWorld Gym, a free-access real physical environment and open-source library that allows anyone to deploy their algorithms on a real robot using the familiar OpenAI Gym ecosystem, without the burden of managing a real hardware system or needing any knowledge of robotics.

Slides and recording cannot be posted publicly


Jian Hu, Seth Austin Harding | National Taiwan University, Taipei

RIIT: Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning

10:00 a.m. — 11:00 a.m. Friday, February 26th

In recent years, Multi-Agent Deep Reinforcement Learning (MADRL) has been successfully applied to various complex scenarios such as playing computer games and coordinating robot swarms. In this talk, we investigate the impact of “implementation tricks” on state-of-the-art (SOTA) cooperative MADRL algorithms such as QMIX, and provide some suggestions for tuning. While investigating implementation settings and how they affect fairness in MADRL experiments, we reached some conclusions that contradict previous work; in particular, we discuss why QMIX’s monotonicity condition is critical for cooperative tasks. Finally, we propose RIIT, a new policy-based algorithm that achieves SOTA performance among policy-based algorithms.
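For readers unfamiliar with the monotonicity condition mentioned above: QMIX combines per-agent Q-values into a joint Q_tot through a mixing network whose weights are produced from the global state by hypernetworks and forced to be non-negative, so Q_tot can never decrease when any single agent's Q-value increases. The sketch below is a simplified, hypothetical illustration of that idea (linear hypernetworks, one hidden layer, NumPy instead of a deep learning framework, invented function and parameter names). It is not the speakers' or the QMIX authors' code.

```python
import numpy as np


def elu(x: np.ndarray) -> np.ndarray:
    # ELU is monotonically increasing, so it preserves monotonicity of the mix.
    return np.where(x > 0, x, np.expm1(np.minimum(x, 0.0)))


def monotonic_mix(agent_qs, state, hyper_w1, hyper_b1, hyper_w2, hyper_b2):
    """Combine per-agent Q-values into Q_tot, non-decreasing in each agent's Q.

    Hypothetical shapes (simplified linear hypernetworks):
      agent_qs: (n_agents,)          state:    (state_dim,)
      hyper_w1: (state_dim, n_agents * hidden)
      hyper_b1: (state_dim, hidden)  hyper_w2: (state_dim, hidden)
      hyper_b2: (state_dim,)
    """
    n_agents = agent_qs.shape[0]
    # Hypernetworks map the global state to mixing weights; abs() keeps them >= 0.
    w1 = np.abs(state @ hyper_w1).reshape(n_agents, -1)  # (n_agents, hidden)
    b1 = state @ hyper_b1                                 # (hidden,), may be negative
    w2 = np.abs(state @ hyper_w2)                         # (hidden,)
    b2 = float(state @ hyper_b2)                          # scalar bias, may be negative
    hidden = elu(agent_qs @ w1 + b1)                      # (hidden,)
    return float(hidden @ w2 + b2)
```

Because every weight applied to the agent Q-values is non-negative and ELU is increasing, each agent maximising its own Q-value is consistent with maximising Q_tot; the biases carry no such constraint because they do not multiply the agent Q-values.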


Ben Eysenbach, Julian Abhishek | Carnegie Mellon University, Google Brain

Diversity is All You Need: Learning Skills without a Reward Function

1:00 p.m. — 2:15 p.m. Wednesday, January 27th

Intelligent creatures can explore their environments and learn useful skills without supervision. In this talk, we will present a method, 'Diversity is All You Need' (DIAYN), for learning useful skills without a reward function. We show how pretrained skills can provide a good parameter initialization for downstream tasks, and can be composed hierarchically to solve complex, sparse reward tasks. We will then discuss a close connection between autonomous skill discovery and meta-learning. Whereas typical meta-reinforcement learning algorithms require a manually-designed family of reward functions, we show how to use DIAYN to propose tasks for meta-learning in an unsupervised manner, effectively resulting in an unsupervised meta-learning algorithm. While there has been considerable work in this area in the past few years, a number of algorithmic and theoretical questions remain open. We plan to highlight some of these challenges at the end.
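At the heart of DIAYN is a learned discriminator q(z | s) that tries to infer which skill z produced a state; the skill-conditioned policy is rewarded for reaching states that make its skill easy to identify. A minimal sketch of that pseudo-reward, log q(z | s) - log p(z) with a uniform prior p(z) over a fixed number of skills as in the DIAYN paper, is shown here. The function name and the assumption that the discriminator outputs unnormalised logits are hypothetical, and training the discriminator and the policy (DIAYN uses SAC) is omitted.

```python
import numpy as np


def diayn_pseudo_reward(discriminator_logits: np.ndarray, skill: int, num_skills: int) -> float:
    """Pseudo-reward log q(z|s) - log p(z) for one state (hypothetical helper).

    discriminator_logits: (num_skills,) unnormalised scores for each skill given s.
    skill: index of the skill z the policy was conditioned on.
    p(z) is assumed uniform over num_skills, as in DIAYN.
    """
    # Numerically stable log-softmax gives log q(z | s) for every skill.
    shifted = discriminator_logits - np.max(discriminator_logits)
    log_q = shifted - np.log(np.sum(np.exp(shifted)))
    # Uniform prior: log p(z) = -log(num_skills).
    log_p = -np.log(num_skills)
    return float(log_q[skill] - log_p)
```

With a discriminator that cannot tell skills apart, the reward is zero; it becomes positive when the state identifies the policy's own skill and negative when it points to a different one, which is what pushes skills apart without any task reward.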


Jordan Terry | University of Maryland, College Park

1:00 p.m. - 2:15 p.m. Wednesday, January 6th

Multi-Agent Reinforcement Learning: Systems for Evaluation

This talk discusses four papers and an ongoing project on designing software systems for multi-agent reinforcement learning that enable more productive and reproducible research, democratize multi-agent reinforcement learning research for university-level researchers, and introduce new problems that challenge reinforcement learning in important ways. All of these works center on the PettingZoo project. The talk is tailored to a very broad audience.

Slides available upon request; the talk was accidentally not recorded


Logan Engstrom | Massachusetts Institute of Technology

1:00 p.m. — 2:30 p.m. Wednesday, November 18th

A Closer Look at Deep Policy Gradient Algorithms

Deep reinforcement learning methods are behind some of the most publicized recent results in machine learning. Despite these successes, deep RL methods face a number of systemic issues: brittleness to small changes in hyperparameters, high reward variance across runs, and sensitivity to seemingly small algorithmic changes. In this talk we take a closer look at the potential roots of these issues. Specifically, we study how well the policy gradient primitives underlying popular deep RL algorithms reflect the principles that informed their development.


Jayesh Gupta | Stanford and Microsoft Research

1:30 p.m. — 2:15 p.m. Wednesday, October 21st

Deep Implicit Coordination Graphs for Multi-agent Reinforcement Learning

Multi-agent reinforcement learning (MARL) requires coordination to efficiently solve certain tasks. Fully centralized control is often infeasible in such domains because of the size of the joint action space. Coordination-graph-based formalizations allow reasoning about the joint action based on the structure of agent interactions; however, designing these graphs often requires domain expertise. In this talk, we will discuss the recently introduced deep implicit coordination graph (DICG) architecture for such scenarios. DICG consists of a module that infers a dynamic coordination graph structure, which is then used by a graph-neural-network-based module that learns to implicitly reason about joint actions or values. DICG learns the tradeoff between full centralization and decentralization via standard actor-critic methods, significantly improving coordination in domains with large numbers of agents.
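To make the two-module structure above concrete, here is a minimal NumPy sketch of one forward step: pairwise attention scores between agent embeddings are normalised into a soft, state-dependent adjacency matrix, and one round of graph convolution passes messages along it. Using bilinear attention for the graph-inference module is our assumption about how such a module can be built; the names, shapes, and single ReLU layer are hypothetical, and this is not the DICG authors' implementation.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def implicit_coordination_layer(embeddings: np.ndarray,
                                w_attn: np.ndarray,
                                w_gcn: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Infer a soft coordination graph, then run one graph-convolution step.

    Hypothetical shapes:
      embeddings: (n_agents, d)  per-agent observation embeddings
      w_attn:     (d, d)         bilinear attention weights
      w_gcn:      (d, d_out)     graph-convolution weights
    """
    # Graph inference: pairwise compatibility scores between agents i and j.
    scores = embeddings @ w_attn @ embeddings.T          # (n_agents, n_agents)
    # Row-normalised soft adjacency: how strongly agent i attends to agent j.
    adjacency = softmax(scores, axis=-1)
    # Message passing: aggregate neighbours' embeddings, project, apply ReLU.
    messages = np.maximum(adjacency @ embeddings @ w_gcn, 0.0)
    return messages, adjacency
```

Because the adjacency is recomputed from the current embeddings, the graph changes with the state, and in a full model its parameters would be trained end to end by the actor-critic loss rather than designed by hand.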


Chris Nota | University of Massachusetts, Amherst

2:00 p.m. — 3:15 p.m. Wednesday, September 23rd

Is the Policy Gradient a Gradient? 

Most popular policy gradient methods do not truly follow the gradient of the discounted objective. Is there another objective they optimize instead? In this talk, we examine the vector field followed by almost all popular policy gradient methods and prove that it is not the gradient of any objective. We discuss other properties of this vector field and examine why it may be effective in practice. We also discuss some other recent work on policy gradients.
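As background for the first claim in this abstract (our summary, not a reproduction of the talk's derivation): for the discounted objective, the policy gradient theorem weights the term at time t by the factor \gamma^t, but most popular implementations drop that factor. With G_t denoting the discounted return from time t:

```latex
\begin{aligned}
J(\theta) &= \mathbb{E}_{\pi_\theta}\Big[\sum_{t=0}^{\infty} \gamma^{t} r_t\Big],
\qquad G_t = \sum_{k=t}^{\infty} \gamma^{k-t} r_k, \\
\nabla_\theta J(\theta) &= \mathbb{E}_{\pi_\theta}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, G_t\, \nabla_\theta \log \pi_\theta(a_t \mid s_t)\Big], \\
\text{commonly used update direction:}\quad
&\mathbb{E}_{\pi_\theta}\Big[\sum_{t=0}^{\infty} G_t\, \nabla_\theta \log \pi_\theta(a_t \mid s_t)\Big].
\end{aligned}
```

The second expression is the vector field the abstract refers to; the talk's result is that it is not the gradient of any objective.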