PhD Proposal: Incorporating Prosocial Constraints and Exploiting Problem Structure in Sequential Decision-making and Model Evaluation
Sequential decision-making tasks canonically feature an agent who must explore the environment and exploit the knowledge it gains to maximize total expected reward over some time horizon. However, when algorithms are used to make decisions and induce behaviors over time in high-stakes domains, it is often necessary to trade-off reward maximization against competing objectives, such as individual or group fairness, cooperation, or risk mitigation. Additionally, when the decisions to be made are combinatorial, careful use of the structural information which characterizes or connects our decision points may facilitate our search for efficient solutions.In this proposal, we consider a constrained resource allocation task characterized by: (1) the presence of multiple objectives; and (2) the need to exploit different types of structure contained within the problem instances in order to ensure tractability and exploit externalities. We specifically consider the restless bandit setting, where a decision-maker is tasked with determining which subset of individuals (referred to as arms) should receive a beneficial intervention at each timestep, subject to the satisfaction of a budget constraint. Each restless arm is formalized as a Markov decision process (MDP), and receipt of the intervention results in an increased probability of a favorable state transition at the next timestep, relative to lack of receipt.It is PSPACE-hard to pre-compute the optimal policy for a given set of restless arms in the general case. However, as conjectured by Whittle and proven by Weber and Weiss,when each arm is indexable, a tractable solution exists that is provably asymptotically optimal: we can decouple the arms and consider a Lagrangian relaxation of the originalproblem. Our core contributions include the introduction of novel algorithms to address two limitations of Whittle index-based policies, including (1) the lack of distributive fairness guarantees; and (2) the inability to exploit externalities when resources are allocated within a community.
Dr. John Dickerson
Dr. Hal Daumé
Dr. Aravind Srinivasan