Search Results for author: Kianté Brantley

Found 17 papers, 8 papers with code

REBEL: Reinforcement Learning via Regressing Relative Rewards

no code implementations • 25 Apr 2024 • Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun

While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the work-horse of a variety of reinforcement learning (RL) applications including the fine-tuning of generative models.

Continuous Control • Image Generation • +3

Dataset Reset Policy Optimization for RLHF

2 code implementations • 12 Apr 2024 • Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Kianté Brantley, Dipendra Misra, Jason D. Lee, Wen Sun

Motivated by the fact that an offline preference dataset provides informative states (i.e., data that is preferred by the labelers), our new algorithm, Dataset Reset Policy Optimization (DR-PO), integrates the existing offline preference dataset into the online policy training procedure via dataset reset: it directly resets the policy optimizer to the states in the offline dataset, instead of always starting from the initial state distribution.

Reinforcement Learning (RL)

Adversarial Imitation Learning via Boosting

no code implementations • 12 Apr 2024 • Jonathan D. Chang, Dhruv Sreenivas, Yingbing Huang, Kianté Brantley, Wen Sun

In the weighted replay buffer, the contribution of the data from older policies is properly discounted, with the weight computed based on the boosting framework.

Imitation Learning

RL for Consistency Models: Faster Reward Guided Text-to-Image Generation

1 code implementation • 25 Mar 2024 • Owen Oertell, Jonathan D. Chang, Yiyi Zhang, Kianté Brantley, Wen Sun

To overcome this limitation, consistency models proposed learning a new class of generative models that directly map noise to data, resulting in a model that can generate an image in as few as one sampling iteration.

Instruction Following • reinforcement-learning • +2

A Surprising Failure? Multimodal LLMs and the NLVR Challenge

no code implementations • 26 Feb 2024 • Anne Wu, Kianté Brantley, Yoav Artzi

This study evaluates three state-of-the-art MLLMs -- GPT-4V, Gemini Pro, and the open-source model IDEFICS -- on the compositional natural language vision reasoning task NLVR.

Sentence

Reviewer2: Optimizing Review Generation Through Prompt Generation

no code implementations • 16 Feb 2024 • Zhaolin Gao, Kianté Brantley, Thorsten Joachims

In this paper, we envision a use case where authors can receive LLM-generated reviews that uncover weak points in the current draft.

Review Generation

Policy-Gradient Training of Language Models for Ranking

no code implementations • 6 Oct 2023 • Ge Gao, Jonathan D. Chang, Claire Cardie, Kianté Brantley, Thorsten Joachims

Text retrieval plays a crucial role in incorporating factual knowledge for decision making into language processing pipelines, ranging from chat-based web search to question answering systems.

Decision Making • Domain Generalization • +3

Ranking with Long-Term Constraints

1 code implementation • 10 Jul 2023 • Kianté Brantley, Zhichong Fang, Sarah Dean, Thorsten Joachims

The feedback that users provide through their choices (e.g., clicks, purchases) is one of the most common types of data readily available for training search and recommendation algorithms.

Fairness

Interactive Text Generation

no code implementations • 2 Mar 2023 • Felix Faltings, Michel Galley, Baolin Peng, Kianté Brantley, Weixin Cai, Yizhe Zhang, Jianfeng Gao, Bill Dolan

Unfortunately, this means most of the research on text, code, and image generation has focused on non-interactive settings, whereby the model is expected to get everything right without accounting for any input from a user who may be willing to help.

Image Generation • Imitation Learning • +1

lilGym: Natural Language Visual Reasoning with Reinforcement Learning

no code implementations • 3 Nov 2022 • Anne Wu, Kianté Brantley, Noriyuki Kojima, Yoav Artzi

We present lilGym, a new benchmark for language-conditioned reinforcement learning in visual environments.

reinforcement-learning • Reinforcement Learning (RL) • +1

Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization

3 code implementations • 3 Oct 2022 • Rajkumar Ramamurthy, Prithviraj Ammanabrolu, Kianté Brantley, Jack Hessel, Rafet Sifa, Christian Bauckhage, Hannaneh Hajishirzi, Yejin Choi

To help answer this, we first introduce an open-source modular library, RL4LMs (Reinforcement Learning for Language Models), for optimizing language generators with RL.

Decision Making • Policy Gradient Methods • +3

Successor Feature Sets: Generalizing Successor Representations Across Policies

no code implementations • 3 Mar 2021 • Kianté Brantley, Soroush Mehri, Geoffrey J. Gordon

They also form a natural bridge between model-based and model-free RL methods: like the former they make predictions about future experiences, and like the latter they allow efficient prediction of total discounted rewards.

Representation Learning

Constrained episodic reinforcement learning in concave-convex and knapsack settings

1 code implementation • NeurIPS 2020 • Kianté Brantley, Miroslav Dudik, Thodoris Lykouris, Sobhan Miryoosefi, Max Simchowitz, Aleksandrs Slivkins, Wen Sun

We propose an algorithm for tabular episodic reinforcement learning with constraints.

reinforcement-learning • Reinforcement Learning (RL)

Active Imitation Learning with Noisy Guidance

1 code implementation • ACL 2020 • Kianté Brantley, Amr Sharaf, Hal Daumé III

Imitation learning algorithms provide state-of-the-art results on many structured prediction tasks by learning near-optimal search policies.

Active Learning • Imitation Learning • +1

Reinforcement Learning with Convex Constraints

1 code implementation • NeurIPS 2019 • Sobhan Miryoosefi, Kianté Brantley, Hal Daumé III, Miroslav Dudik, Robert Schapire

In standard reinforcement learning (RL), a learning agent seeks to optimize the overall reward.

reinforcement-learning • Reinforcement Learning (RL)

Non-Monotonic Sequential Text Generation

1 code implementation • WS 2019 • Sean Welleck, Kianté Brantley, Hal Daumé III, Kyunghyun Cho

Standard sequential generation methods assume a pre-specified generation order, such as text generation methods which generate words from left to right.

Imitation Learning • Position • +1

The UMD Neural Machine Translation Systems at WMT17 Bandit Learning Task

no code implementations • WS 2017 • Amr Sharaf, Shi Feng, Khanh Nguyen, Kianté Brantley, Hal Daumé III

We describe the University of Maryland machine translation systems submitted to the WMT17 German-English Bandit Learning Task.

Domain Adaptation • Machine Translation • +4
