Kianté Brantley
Kianté Brantley is a postdoctoral scholar at Cornell University working with Thorsten Joachims. He completed his Ph.D. in computer science at the University of Maryland, College Park (UMD), advised by Professor Hal Daumé III. Brantley designs algorithms that efficiently integrate domain knowledge into sequential decision-making problems. He is most excited about imitation learning and interactive learning, or, more broadly, settings that involve a feedback loop between a machine learning agent and the input that agent sees. Before coming to UMD in 2016, Brantley attended the University of Maryland, Baltimore County (UMBC), where he earned his bachelor's and master's degrees in computer science, advised by Tim Oates. He also worked as a data scientist for the U.S. Department of Defense from 2010 to 2017. In his free time, Brantley enjoys playing sports; his favorite sport at the moment is powerlifting. He is a member of the UMD CLIP Lab, the UMBC CORAL Lab, and the NYU CILVR Lab.
Email / CV / Google Scholar / Semantic Scholar / GitHub / Twitter
Research
I'm interested in designing algorithms that efficiently integrate domain knowledge into sequential decision-making problems (e.g., reinforcement learning, imitation learning, and structured prediction for natural language processing).
Publications
Proceedings of the First Workshop on Interactive Learning for Natural Language Processing
Kianté Brantley,
Soham Dan,
Iryna Gurevych,
Ji-Ung Lee,
Filip Radlinski,
Hinrich Schütze,
Edwin Simpson,
Lili Yu
Association for Computational Linguistics, 2021
[abstract]
Motivation: A key aspect of human learning is the ability to learn continuously from various sources of feedback. In contrast, much of the recent success of deep learning for NLP relies on large datasets and extensive compute resources to train and fine-tune models, which then remain fixed. This leaves a research gap for systems that adapt to the changing needs of individual users or allow users to continually correct errors as they emerge. Learning from user interaction is crucial for tasks that require a high degree of personalization and for rapidly changing or complex, multi-step tasks where collecting and annotating large datasets is not feasible, but an informed user can provide guidance. What is interactive NLP?: Interactive learning for NLP means training, fine-tuning, or otherwise adapting an NLP model to inputs from a human user or teacher. Relevant approaches range from active learning with a human in the loop to training with implicit user feedback (e.g., clicks), dialogue systems that adapt to user utterances, and training with new forms of human input. Interactive learning is the converse of learning from datasets collected offline with no human input during the training process.
Successor Feature Sets: Generalizing Successor Representations Across Policies
Kianté Brantley,
Soroush Mehri,
Geoffrey J. Gordon
Association for the Advancement of Artificial Intelligence (AAAI), 2021
[abstract]
[poster]
[slides]
Successor-style representations have many advantages for reinforcement learning: for example, they can help an agent generalize from past experience to new goals, and they have been proposed as explanations of behavioral and neural data from human and animal learners. They also form a natural bridge between model-based and model-free RL methods: like the former they make predictions about future experiences, and like the latter they allow efficient prediction of total discounted rewards. However, successor-style representations are not optimized to generalize across policies: typically, we maintain a limited-length list of policies and share information among them by representation learning or generalized policy improvement (GPI). Successor-style representations also typically make no provision for gathering information or reasoning about latent variables. To address these limitations, we bring together ideas from predictive state representations, belief space value iteration, and convex analysis: we develop a new, general successor-style representation, together with a Bellman equation that connects multiple sources of information within this representation, including different latent states, observations, policies, and reward functions. The new representation is highly expressive: for example, it lets us efficiently read off an optimal policy for a new reward function, or a policy that imitates a demonstration. For this paper, we focus on exact computation of the new representation in small, known environments, since even this restricted setting offers plenty of interesting questions. Our implementation does not scale to large, unknown environments, nor would we expect it to, since it generalizes POMDP value iteration, which is difficult to scale. However, we believe that future work will allow us to extend our ideas to approximate reasoning in large, unknown environments. We conduct experiments to explore which of the potential barriers to scaling are most pressing.
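For background on the starting point this paper generalizes, here is a minimal sketch of the classic successor representation for a single fixed policy in a toy tabular MDP; the transition matrix and reward vector are made-up values, and the paper's successor feature sets go well beyond this.

```python
import numpy as np

# Classic successor representation (SR) for one fixed policy in a small
# tabular MDP. The paper generalizes across policies; this only shows
# why SR-style objects transfer across reward functions.

n_states, gamma = 4, 0.9
# P_pi[s, s'] = probability of moving s -> s' under the fixed policy (toy values).
P_pi = np.array([
    [0.1, 0.9, 0.0, 0.0],
    [0.0, 0.1, 0.9, 0.0],
    [0.0, 0.0, 0.1, 0.9],
    [0.0, 0.0, 0.0, 1.0],
])

# SR: expected discounted future state occupancy, M = (I - gamma * P_pi)^(-1).
M = np.linalg.inv(np.eye(n_states) - gamma * P_pi)

# Given any new reward vector, state values follow from one matrix product.
r = np.array([0.0, 0.0, 0.0, 1.0])
v = M @ r
print(v)
```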
Constrained Episodic Reinforcement Learning in Concave-Convex and Knapsack Settings
Kianté Brantley,
Miroslav Dudik,
Thodoris Lykouris,
Sobhan Miryoosefi,
Max Simchowitz,
Aleksandrs Slivkins,
Wen Sun
Conference on Neural Information Processing Systems (NeurIPS), 2020
[abstract]
[code]
[poster]
We propose an algorithm for tabular episodic reinforcement learning with constraints. We provide a modular analysis with strong theoretical guarantees for settings with concave rewards and convex constraints, and for settings with hard constraints (knapsacks). Most of the previous work in constrained reinforcement learning is limited to linear constraints, and the remaining work focuses on either the feasibility question or settings with a single episode. Our experiments demonstrate that the proposed algorithm significantly outperforms these approaches in existing constrained episodic environments.
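To make the hard-constraint (knapsack) setting concrete, here is a toy episodic loop in which a shared resource budget may never be exceeded across episodes; the per-episode costs and rewards are hypothetical, and this illustrates only the setting, not the paper's algorithm or its analysis.

```python
import random

# Toy knapsack setting: a fixed resource budget is consumed across episodes,
# and the agent must stop (or switch to a zero-cost policy) before exceeding it.

budget = 10.0          # total resource the agent may consume over all episodes
consumed = 0.0
total_reward = 0.0

for episode in range(100):
    # Stand-in for running one episode with the current policy.
    episode_cost = random.uniform(0.0, 0.5)
    episode_reward = random.uniform(0.0, 1.0)

    # Hard constraint: never let cumulative consumption exceed the budget.
    if consumed + episode_cost > budget:
        break

    consumed += episode_cost
    total_reward += episode_reward

print(f"episodes: {episode}, reward: {total_reward:.2f}, budget used: {consumed:.2f}")
```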
Active Imitation Learning with Noisy Guidance
Kianté Brantley,
Amr Sharaf,
Hal Daumé III
Association for Computational Linguistics (ACL), 2020
[abstract]
[code]
[poster]
[slides]
[video]
Imitation learning algorithms provide state-of-the-art results on many structured prediction tasks by learning near-optimal search policies. Such algorithms assume training-time access to an expert that can provide the optimal action at any queried state; unfortunately, the number of such queries is often prohibitive, frequently rendering these approaches impractical. To combat this query complexity, we consider an active learning setting in which the learning algorithm has additional access to a much cheaper noisy heuristic that provides noisy guidance. Our algorithm, LEAQI, learns a difference classifier that predicts when the expert is likely to disagree with the heuristic, and queries the expert only when necessary. We apply LEAQI to three sequence labeling tasks, demonstrating significantly fewer queries to the expert and comparable (or better) accuracies over a passive approach.
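A schematic of the query rule the abstract describes might look like the sketch below: a difference classifier predicts whether the expert would disagree with the cheap heuristic, and the expert is queried only when disagreement seems likely. The names `featurize`, `heuristic`, and `expert` are hypothetical stand-ins for task-specific components, and this is a simplified reading of LEAQI, not its implementation.

```python
from sklearn.linear_model import LogisticRegression

# Difference classifier: predicts P(expert disagrees with heuristic | state).
diff_clf = LogisticRegression()
X_diff, y_diff = [], []              # observed (features, disagreement) pairs

def annotate(state, featurize, heuristic, expert, threshold=0.5):
    """Return a training label for `state`, querying the expert only when needed."""
    x = featurize(state)
    h = heuristic(state)                          # cheap, noisy label
    if hasattr(diff_clf, "classes_"):
        p_disagree = diff_clf.predict_proba([x])[0][1]
    else:
        p_disagree = 1.0                          # no model yet: always query
    if p_disagree >= threshold:
        e = expert(state)                         # expensive expert query
        X_diff.append(x); y_diff.append(int(e != h))
        if len(set(y_diff)) > 1:                  # need both classes to fit
            diff_clf.fit(X_diff, y_diff)
        return e
    return h                                      # trust the heuristic, no query
```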
Disagreement-Regularized Imitation Learning
Kianté Brantley,
Wen Sun,
Mikael Henaff
International Conference on Learning Representations (ICLR), 2020 (Spotlight)
[abstract]
[code]
[poster]
[slides]
[video]
We present a simple and effective algorithm designed to address the covariate shift problem in imitation learning. It operates by training an ensemble of policies on the expert demonstration data, and using the variance of their predictions as a cost which is minimized with RL together with a supervised behavioral cloning cost. Unlike adversarial imitation methods, it uses a fixed reward function which is easy to optimize. We prove a regret bound for the algorithm that is linear in the time horizon multiplied by a coefficient which we show to be low for certain problems on which behavioral cloning fails. We evaluate our algorithm empirically across multiple pixel-based Atari environments and continuous control tasks, and show that it matches or significantly outperforms behavioral cloning and generative adversarial imitation learning.
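The disagreement cost at the heart of this method is easy to sketch: behavior-clone an ensemble on the expert data, then penalize states where the clones disagree. The snippet below assumes an already-fitted ensemble of models with a `predict` method and continuous actions; it is an illustration of the idea, not the paper's code.

```python
import numpy as np

def disagreement_cost(ensemble, state):
    """Variance of the ensemble's action predictions at `state`."""
    actions = np.stack([policy.predict(state) for policy in ensemble])
    return actions.var(axis=0).sum()

# Intuition: on states the expert visited, the clones were trained on similar
# data and agree, so the cost is low; off-distribution states produce high
# disagreement, so minimizing this cost with RL (plus a behavioral cloning
# loss) pushes the learner back toward the expert's state distribution.
```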
Non-monotonic sequential text generation
Sean Welleck,
Kianté Brantley,
Hal Daumé III,
Kyunghyun Cho
International Conference on Machine Learning (ICML), 2019
[abstract]
[code]
[poster]
[slides]
[video]
Standard sequential generation methods assume a pre-specified generation order, such as text generation methods which generate words from left to right. In this work, we propose a framework for training models of text generation that operate in non-monotonic orders; the model directly learns good orders, without any additional annotation. Our framework operates by generating a word at an arbitrary position, and then recursively generating words to its left and then words to its right, yielding a binary tree. Learning is framed as imitation learning, including a coaching method which moves from imitating an oracle to reinforcing the policy's own preferences. Experimental results demonstrate that using the proposed method, it is possible to learn policies which generate text without pre-specifying a generation order, while achieving competitive performance with conventional left-to-right generation.
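The recursive order described in the abstract is compact enough to sketch: a policy emits a word for the current slot, then the subsequences to its left and right are generated recursively, and the in-order traversal of the resulting binary tree is the final text. The `policy` callable and the `"<end>"` token here are illustrative stand-ins, not the paper's model.

```python
# Non-monotonic generation sketch: each call fills one tree node, then
# recurses left and right; in-order traversal yields the sentence.

def generate(policy, context):
    word = policy(context)
    if word == "<end>":
        return []
    left = generate(policy, context + [("left", word)])
    right = generate(policy, context + [("right", word)])
    return left + [word] + right   # in-order traversal

# Example with a trivial scripted "policy" that spells out one fixed tree.
script = iter(["likes", "dog", "<end>", "<end>", "cake", "<end>", "<end>"])
print(generate(lambda ctx: next(script), []))   # ['dog', 'likes', 'cake']
```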
Reinforcement Learning with Convex Constraints
Sobhan Miryoosefi*,
Kianté Brantley*,
Hal Daumé III,
Miro Dudik,
Robert Schapire
Conference on Neural Information Processing Systems (NeurIPS), 2019
[abstract]
[code]
[poster]
[slides]
In standard reinforcement learning (RL), a learning agent seeks to optimize the overall reward. However, many key aspects of a desired behavior are more naturally expressed as constraints. For instance, the designer may want to limit the use of unsafe actions, increase the diversity of trajectories to enable exploration, or approximate expert trajectories when rewards are sparse. In this paper, we propose an algorithmic scheme that can handle a wide class of constraints in RL tasks, specifically, any constraints that require expected values of some vector measurements (such as the use of an action) to lie in a convex set. This captures previously studied constraints (such as safety and proximity to an expert), but also enables new classes of constraints (such as diversity). Our approach comes with rigorous theoretical guarantees and only relies on the ability to approximately solve standard RL tasks. As a result, it can be easily adapted to work with any model-free or model-based RL algorithm. In our experiments, we show that it matches previous algorithms that enforce safety via constraints, but can also enforce new properties that these algorithms cannot incorporate, such as diversity.
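The constraint type is easy to picture: the expected measurement vector must land inside a convex set. The toy below uses a box as the set and computes the direction from the current policy's average measurements into it; the paper's scheme repeatedly derives scalar rewards from such directions and hands them to a standard RL solver, but that outer loop is not reproduced here, and the numbers are made up.

```python
import numpy as np

# Convex constraint illustration: expected measurements must lie in a box.
lo, hi = np.array([0.0, 0.0]), np.array([1.0, 0.2])   # hypothetical box

def project_to_box(z):
    return np.clip(z, lo, hi)

avg_measurement = np.array([0.5, 0.6])    # e.g., (reward rate, unsafe-action rate)
proj = project_to_box(avg_measurement)
direction = proj - avg_measurement        # points from current behavior into the set
print(np.linalg.norm(direction))          # distance to feasibility: 0.4
```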
The UMD Neural Machine Translation Systems at WMT17 Bandit Learning Task
Amr Sharaf,
Shi Feng,
Khanh Nguyen,
Kianté Brantley,
Hal Daumé III
Second Conference on Machine Translation (WMT), 2017
[abstract]
[poster]
We describe the University of Maryland machine translation systems submitted to the WMT17 German-English Bandit Learning Task. The task is to adapt a translation system to a new domain, using only bandit feedback: the system receives a German sentence to translate, produces an English sentence, and only gets a scalar score as feedback. Targeting these two challenges (adaptation and bandit learning), we built a standard neural machine translation system and extended it in two ways: (1) robust reinforcement learning techniques to learn effectively from the bandit feedback, and (2) domain adaptation using data selection from a large corpus of parallel data.
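Learning from a single scalar score per translation is the classic bandit structured prediction setup, and a policy-gradient (REINFORCE-style) update is one standard technique such systems build on. The sketch below assumes a hypothetical seq2seq `model` exposing a `sample` method that returns an output and its log-probability; it is a generic illustration, not the submitted system.

```python
import torch

def bandit_step(model, optimizer, src_sentence, get_feedback, baseline=0.0):
    """One bandit update: sample a translation, observe a scalar score, reinforce."""
    translation, log_prob = model.sample(src_sentence)   # sample one output
    reward = get_feedback(translation)                   # scalar feedback only
    loss = -(reward - baseline) * log_prob               # REINFORCE with baseline
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```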
BCAP: An Artificial Neural Network Pruning Technique to Reduce Overfitting
Kianté Brantley
Master's Thesis, University of Maryland, Baltimore County, 2016
[abstract]
[slides]
Determining the optimal size of a neural network is complicated. Neural networks, with many free parameters, can be used to solve very complex problems. However, these neural networks are susceptible to overfitting. BCAP (Brantley-Clark Artificial Neural Network Pruning Technique) addresses overfitting by combining duplicate neurons in a neural network's hidden layer, thereby forcing the network to learn more distinct features. We compare hidden units using cosine similarity and combine those that are similar with each other within a threshold ϵ. Doing so reduces co-adaptation among the neurons in the network, because hidden units that are highly correlated (i.e., similar) are combined. In this paper we show evidence that BCAP succeeds in reducing network size while maintaining, or even improving, accuracy during and after training.
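A simplified reading of the merge step might look like the sketch below: compare hidden units by the cosine similarity of their incoming weight vectors, and fold each near-duplicate's outgoing weights into the unit it duplicates so the layer's function is approximately preserved. This is an illustration under those assumptions, not the thesis implementation.

```python
import numpy as np

def prune_layer(W_in, W_out, eps=0.95):
    """W_in: (hidden, inputs) incoming weights; W_out: (outputs, hidden)."""
    unit = W_in / np.linalg.norm(W_in, axis=1, keepdims=True)
    cos = unit @ unit.T                       # pairwise cosine similarity
    keep, merged_into = [], {}
    for i in range(W_in.shape[0]):
        for j in keep:
            if cos[i, j] >= eps:              # near-duplicate of a kept unit
                merged_into[i] = j
                break
        else:
            keep.append(i)
    W_out = W_out.copy()
    for i, j in merged_into.items():
        W_out[:, j] += W_out[:, i]            # reroute pruned unit's output
    return W_in[keep], W_out[:, keep]
```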
LDAExplore: Visualizing Topic Models Generated Using Latent Dirichlet Allocation
Ashwinkumar Ganesan,
Kianté Brantley,
Shimei Pan,
Jian Chen
TextVis Workshop, Intelligent User Interfaces (IUI), 2015
[abstract]
[code]
[slides]
We present LDAExplore, a tool to visualize topic distributions in a given document corpus that are generated using topic modeling methods. Latent Dirichlet Allocation (LDA) is one of the basic methods predominantly used to generate topics. One problem with methods like LDA is that users who apply them may not understand the topics that are generated. Users may also find it difficult to search for correlated topics and correlated documents. LDAExplore tries to alleviate these problems by visualizing topic and word distributions generated from the document corpus and allowing the user to interact with them. The system is designed for users who have minimal knowledge of LDA or topic modeling methods. To evaluate our design, we ran a pilot study that uses the abstracts of 322 Information Visualization papers, where every abstract is considered a document. The topics generated are then explored by users. The results show that users are able to find correlated documents and group them based on topics that are similar.
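The per-document topic distributions a tool like this visualizes can be produced in a few lines with scikit-learn; the tiny corpus below stands in for the 322 paper abstracts from the pilot study, and this is a generic example rather than LDAExplore's own pipeline.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Stand-in corpus: each string plays the role of one paper abstract.
docs = [
    "interactive visualization of topic models",
    "user study of a visual analytics tool",
    "graph layout algorithms for large networks",
]
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)    # rows: documents, cols: topic weights
print(doc_topics)
```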