no code implementations • 25 Apr 2024 • Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun
While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the workhorse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models.
2 code implementations • 12 Apr 2024 • Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Kianté Brantley, Dipendra Misra, Jason D. Lee, Wen Sun
Motivated by the fact that an offline preference dataset provides informative states (i.e., data that is preferred by the labelers), our new algorithm, Dataset Reset Policy Optimization (DR-PO), integrates the existing offline preference dataset into the online policy training procedure via dataset reset: it directly resets the policy optimizer to the states in the offline dataset, instead of always starting from the initial state distribution.
no code implementations • 12 Apr 2024 • Jonathan D. Chang, Dhruv Sreenivas, Yingbing Huang, Kianté Brantley, Wen Sun
In the weighted replay buffer, the contribution of the data from older policies is properly discounted, with the weight computed based on the boosting framework.
1 code implementation • 25 Mar 2024 • Owen Oertell, Jonathan D. Chang, Yiyi Zhang, Kianté Brantley, Wen Sun
To overcome this limitation, consistency models proposed learning a new class of generative models that directly map noise to data, resulting in a model that can generate an image in as few as one sampling iteration.
no code implementations • 26 Feb 2024 • Anne Wu, Kianté Brantley, Yoav Artzi
This study evaluates three state-of-the-art MLLMs -- GPT-4V, Gemini Pro, and the open-source model IDEFICS -- on the compositional natural language vision reasoning task NLVR.
no code implementations • 16 Feb 2024 • Zhaolin Gao, Kianté Brantley, Thorsten Joachims
In this paper, we envision a use case where authors can receive LLM-generated reviews that uncover weak points in the current draft.
no code implementations • 6 Oct 2023 • Ge Gao, Jonathan D. Chang, Claire Cardie, Kianté Brantley, Thorsten Joachims
Text retrieval plays a crucial role in incorporating factual knowledge for decision making into language processing pipelines, ranging from chat-based web search to question answering systems.
1 code implementation • 10 Jul 2023 • Kianté Brantley, Zhichong Fang, Sarah Dean, Thorsten Joachims
The feedback that users provide through their choices (e.g., clicks, purchases) is one of the most common types of data readily available for training search and recommendation algorithms.
no code implementations • 2 Mar 2023 • Felix Faltings, Michel Galley, Baolin Peng, Kianté Brantley, Weixin Cai, Yizhe Zhang, Jianfeng Gao, Bill Dolan
Unfortunately, this means most of the research on text, code, and image generation has focused on non-interactive settings, whereby the model is expected to get everything right without accounting for any input from a user who may be willing to help.
no code implementations • 3 Nov 2022 • Anne Wu, Kianté Brantley, Noriyuki Kojima, Yoav Artzi
We present lilGym, a new benchmark for language-conditioned reinforcement learning in visual environments.
3 code implementations • 3 Oct 2022 • Rajkumar Ramamurthy, Prithviraj Ammanabrolu, Kianté Brantley, Jack Hessel, Rafet Sifa, Christian Bauckhage, Hannaneh Hajishirzi, Yejin Choi
To help answer this, we first introduce an open-source modular library, RL4LMs (Reinforcement Learning for Language Models), for optimizing language generators with RL.
no code implementations • 3 Mar 2021 • Kianté Brantley, Soroush Mehri, Geoffrey J. Gordon
They also form a natural bridge between model-based and model-free RL methods: like the former they make predictions about future experiences, and like the latter they allow efficient prediction of total discounted rewards.
1 code implementation • NeurIPS 2020 • Kianté Brantley, Miroslav Dudik, Thodoris Lykouris, Sobhan Miryoosefi, Max Simchowitz, Aleksandrs Slivkins, Wen Sun
We propose an algorithm for tabular episodic reinforcement learning with constraints.
1 code implementation • ACL 2020 • Kianté Brantley, Amr Sharaf, Hal Daumé III
Imitation learning algorithms provide state-of-the-art results on many structured prediction tasks by learning near-optimal search policies.
1 code implementation • NeurIPS 2019 • Sobhan Miryoosefi, Kianté Brantley, Hal Daumé III, Miroslav Dudik, Robert Schapire
In standard reinforcement learning (RL), a learning agent seeks to optimize the overall reward.
1 code implementation • WS 2019 • Sean Welleck, Kianté Brantley, Hal Daumé III, Kyunghyun Cho
Standard sequential generation methods assume a pre-specified generation order, such as text generation methods which generate words from left to right.
no code implementations • WS 2017 • Amr Sharaf, Shi Feng, Khanh Nguyen, Kianté Brantley, Hal Daumé III
We describe the University of Maryland machine translation systems submitted to the WMT17 German-English Bandit Learning Task.