PhD Proposal: Towards Principled AI-Agents with Decentralized and Asymmetric Information

Talk
Xiangyu Liu
Time: 06.02.2025, 12:00 to 14:00

Abstract:
AI models have been increasingly deployed to build "Autonomous Agents" for decision-making, with prominent applications including playing Go and video games, robotics, autonomous driving, healthcare, and human assistance. Most such success stories naturally involve multiple AI agents interacting dynamically with each other and with humans. More importantly, these agents often operate with asymmetric information in practice, both across different agents and across the training and testing phases. In this thesis, we aim to lay the theoretical foundations for principled AI agents operating under asymmetric and decentralized information.

First, we focus on Reinforcement Learning (RL) agents in multi-agent environments with partially observable and decentralized information. To circumvent known hardness results and the use of computationally intractable oracles, we advocate leveraging the potential information sharing among agents. We first establish several computational complexity results to justify the necessity of information sharing, as well as the observability assumption. Since planning in the ground-truth model remains inefficient, we then propose to further approximate the shared common information to construct an approximate model of the partially observable stochastic game (POSG), in which planning an approximate equilibrium is quasi-efficient under the aforementioned assumptions (see the first sketch below). Building on this, we develop a partially observable multi-agent RL algorithm that is both statistically and computationally quasi-efficient.

Second, we focus on RL agents in partially observable Markov decision processes (POMDPs) when privileged information is available during training, a common practice in robot learning and deep RL. We first revisit two major empirical paradigms, expert distillation (a.k.a. teacher-student learning) and asymmetric actor-critic (see the second sketch below), and demonstrate their pitfalls in finding near-optimal policies. We then develop a new principled algorithm with polynomial sample complexity and (quasi-)polynomial computational complexity, revealing the provable benefits of such privileged information.

Finally, we examine Large Language Model (LLM) agents, which use an LLM as the main controller for decision-making, and aim to understand and enhance their decision-making capability in canonical decentralized and multi-agent scenarios. In particular, we use the metric of regret, commonly studied in online learning and RL (see the third sketch below), to probe the limits of LLM agents' in-context decision-making through controlled experiments. Motivated by the observed pitfalls of existing LLM agents, we also propose a new fine-tuning loss that promotes no-regret behavior, both provably and experimentally.
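To make the common-information idea in the first part concrete, here is a minimal, hypothetical sketch, not the thesis's algorithm: agents act through a fictitious coordinator that conditions on the shared information, and the shared history is compressed to its most recent k entries to keep the coordinator's effective state space small. The class names and the one-step-delayed sharing pattern are illustrative assumptions.

```python
# A minimal, hypothetical sketch of acting via (approximate) common
# information -- illustrative names only, not the thesis's algorithm.
from collections import deque
import random

class TruncatedCommonInfo:
    """Compress the shared history to the most recent k jointly
    observed signals (a finite-memory approximation)."""
    def __init__(self, k):
        self.buffer = deque(maxlen=k)

    def update(self, joint_obs):
        self.buffer.append(joint_obs)

    def state(self):
        return tuple(self.buffer)

class Coordinator:
    """A fictitious coordinator that maps the approximate common
    information to a prescription: a rule each agent applies to its
    own private observation to select an action."""
    def __init__(self, actions):
        self.actions = actions
        self.table = {}  # common-info state -> prescription

    def prescription(self, common_state):
        # Unseen common-info states default to a uniform-random rule;
        # a planner would instead optimize these entries.
        if common_state not in self.table:
            self.table[common_state] = lambda private_obs: random.choice(self.actions)
        return self.table[common_state]

# Usage with one-step-delayed sharing: both agents apply the shared
# prescription to their private observations, then broadcast them.
common, coord = TruncatedCommonInfo(k=3), Coordinator(actions=[0, 1])
private_obs = (0, 1)
rule = coord.prescription(common.state())
actions = [rule(o) for o in private_obs]
common.update(private_obs)  # observations become common knowledge next step
```

Truncating the common information this way trades off model accuracy against the size of the coordinator's state space, which is what makes planning in the approximate model quasi-efficient under the abstract's observability assumptions.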
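The asymmetric actor-critic paradigm revisited in the second part can be summarized in a few lines: during training, the critic consumes the privileged simulator state, while the actor sees only the partial observation it will have at deployment. The sketch below is a generic, hypothetical instance in PyTorch; the network sizes and the one-step TD update are arbitrary choices, not the thesis's construction.

```python
# A generic, hypothetical asymmetric actor-critic step in PyTorch:
# the critic is trained on the privileged simulator state, while the
# actor conditions only on the partial observation available at test
# time. Sizes and the one-step TD target are arbitrary choices.
import torch
import torch.nn as nn

obs_dim, state_dim, act_dim = 8, 16, 4
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
critic = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam([*actor.parameters(), *critic.parameters()], lr=3e-4)

def update(obs, state, action, reward, next_state, done, gamma=0.99):
    """One update; note the asymmetry: critic(state) vs. actor(obs)."""
    with torch.no_grad():
        target = reward + gamma * (1.0 - done) * critic(next_state).squeeze(-1)
    value = critic(state).squeeze(-1)
    critic_loss = (target - value).pow(2).mean()
    logp = torch.distributions.Categorical(logits=actor(obs)).log_prob(action)
    actor_loss = -(logp * (target - value).detach()).mean()
    opt.zero_grad()
    (critic_loss + actor_loss).backward()
    opt.step()
```

Expert distillation is even simpler to sketch: a privileged teacher policy is trained on the state, and a student that sees only observations is trained to imitate it. The talk demonstrates when both paradigms fail to find near-optimal policies under partial observability.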
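Finally, the regret metric used to evaluate LLM agents in the third part is easy to state: the cumulative loss of the actions actually chosen, minus the loss of the best fixed action in hindsight. Below is a toy computation with made-up losses and a random stand-in for the agent's choices; in the controlled experiments, the choices would instead come from prompting an LLM.

```python
# Toy illustration of (external) regret: cumulative loss of the
# chosen actions minus that of the best fixed action in hindsight.
# The loss table and the random "agent" are placeholders.
import numpy as np

rng = np.random.default_rng(0)
T, n_actions = 100, 3
losses = rng.uniform(size=(T, n_actions))  # loss of each action per round
chosen = rng.integers(n_actions, size=T)   # stand-in for an LLM agent's picks

agent_loss = losses[np.arange(T), chosen].sum()
best_fixed = losses.sum(axis=0).min()      # best single action in hindsight
regret = agent_loss - best_fixed
print(f"regret after {T} rounds: {regret:.2f}")
# "No-regret" behavior means regret grows sublinearly in T (regret/T -> 0).
```

An agent exhibits no-regret behavior when this quantity grows sublinearly in the horizon T; the fine-tuning loss proposed in the thesis is designed to push models toward that regime.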