Search Results for author: Kianté Brantley

Found 17 papers, 8 with code

REBEL: Reinforcement Learning via Regressing Relative Rewards

no code implementations25 Apr 2024 Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun

While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the workhorse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models.

Continuous Control Image Generation +3

Dataset Reset Policy Optimization for RLHF

2 code implementations12 Apr 2024 Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Kianté Brantley, Dipendra Misra, Jason D. Lee, Wen Sun

Motivated by the fact that an offline preference dataset provides informative states (i.e., states preferred by the labelers), our new algorithm, Dataset Reset Policy Optimization (DR-PO), integrates the existing offline preference dataset into the online policy training procedure via dataset resets: instead of always starting from the initial state distribution, it directly resets the policy optimizer to states from the offline dataset.

Reinforcement Learning (RL)
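The reset mechanism described in the snippet above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the function name, the uniform sampling, and the 50/50 mixing probability are all assumptions.

```python
import random

def sample_start_state(offline_states, initial_states, reset_prob=0.5, rng=random):
    """With probability `reset_prob`, reset the rollout to a state drawn from
    the offline preference dataset (an informative, labeler-preferred state);
    otherwise draw from the usual initial state distribution."""
    if rng.random() < reset_prob:
        return rng.choice(offline_states)
    return rng.choice(initial_states)
```

Online training would then roll out the policy from whatever state this returns, rather than always from the environment's initial state distribution.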

Adversarial Imitation Learning via Boosting

no code implementations12 Apr 2024 Jonathan D. Chang, Dhruv Sreenivas, Yingbing Huang, Kianté Brantley, Wen Sun

In the weighted replay buffer, the contribution of data from older policies is properly discounted, with weights computed based on the boosting framework.

Imitation Learning
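The weighted-replay-buffer idea above can be sketched with a toy class. The geometric decay below is a hypothetical stand-in for the paper's boosting-derived weights; the class and method names are illustrative.

```python
class WeightedReplayBuffer:
    """Toy replay buffer: each policy iteration contributes one batch, and
    batches from older policies are geometrically down-weighted."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.batches = []  # batches[i] holds transitions collected by policy i

    def add_policy_batch(self, transitions):
        self.batches.append(list(transitions))

    def weights(self):
        # the newest batch gets weight 1.0; each step back multiplies by decay
        n = len(self.batches)
        return [self.decay ** (n - 1 - i) for i in range(n)]
```

Sampling for a gradient update would then draw transitions in proportion to these per-batch weights, so older policies' data still contributes but with diminishing influence.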

RL for Consistency Models: Faster Reward Guided Text-to-Image Generation

1 code implementation25 Mar 2024 Owen Oertell, Jonathan D. Chang, Yiyi Zhang, Kianté Brantley, Wen Sun

To overcome this limitation, consistency models were proposed as a new class of generative models that directly map noise to data, yielding a model that can generate an image in as few as one sampling iteration.

Instruction Following reinforcement-learning +2

A Surprising Failure? Multimodal LLMs and the NLVR Challenge

no code implementations26 Feb 2024 Anne Wu, Kianté Brantley, Yoav Artzi

This study evaluates three state-of-the-art MLLMs -- GPT-4V, Gemini Pro, and the open-source model IDEFICS -- on the compositional natural language vision reasoning task NLVR.

Sentence

Reviewer2: Optimizing Review Generation Through Prompt Generation

no code implementations16 Feb 2024 Zhaolin Gao, Kianté Brantley, Thorsten Joachims

In this paper, we envision a use case where authors can receive LLM-generated reviews that uncover weak points in the current draft.

Review Generation

Policy-Gradient Training of Language Models for Ranking

no code implementations6 Oct 2023 Ge Gao, Jonathan D. Chang, Claire Cardie, Kianté Brantley, Thorsten Joachims

Text retrieval plays a crucial role in incorporating factual knowledge for decision making into language processing pipelines, ranging from chat-based web search to question answering systems.

Decision Making Domain Generalization +3

Ranking with Long-Term Constraints

1 code implementation10 Jul 2023 Kianté Brantley, Zhichong Fang, Sarah Dean, Thorsten Joachims

The feedback that users provide through their choices (e.g., clicks, purchases) is one of the most common types of data readily available for training search and recommendation algorithms.

Fairness

Interactive Text Generation

no code implementations2 Mar 2023 Felix Faltings, Michel Galley, Baolin Peng, Kianté Brantley, Weixin Cai, Yizhe Zhang, Jianfeng Gao, Bill Dolan

Unfortunately, this means most of the research on text, code, and image generation has focused on non-interactive settings, in which the model is expected to get everything right without accounting for any input from a user who may be willing to help.

Image Generation Imitation Learning +1

Successor Feature Sets: Generalizing Successor Representations Across Policies

no code implementations3 Mar 2021 Kianté Brantley, Soroush Mehri, Geoffrey J. Gordon

They also form a natural bridge between model-based and model-free RL methods: like the former they make predictions about future experiences, and like the latter they allow efficient prediction of total discounted rewards.

Representation Learning
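The "efficient prediction of total discounted rewards" above rests on the standard successor-representation identity: if rewards are linear in state features, the dot product of the successor features with the reward weights recovers the discounted return. A minimal numeric check, on made-up feature vectors and weights (none of these numbers come from the paper):

```python
gamma = 0.9
# feature vectors phi(s_t) along a short, made-up trajectory
phi_trajectory = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
w = (0.5, 2.0)  # illustrative reward weights, so r(s_t) = phi(s_t) . w

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# successor features: the discounted sum of future feature vectors
psi = [sum((gamma ** t) * phi[d] for t, phi in enumerate(phi_trajectory))
       for d in range(2)]

q_value = dot(psi, w)                     # return predicted via successor features
returns = sum((gamma ** t) * dot(phi, w)  # return computed reward-by-reward
              for t, phi in enumerate(phi_trajectory))
```

The two quantities agree, which is what makes successor representations a bridge between model-based prediction and model-free value estimation.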

Active Imitation Learning with Noisy Guidance

1 code implementation ACL 2020 Kianté Brantley, Amr Sharaf, Hal Daumé III

Imitation learning algorithms provide state-of-the-art results on many structured prediction tasks by learning near-optimal search policies.

Active Learning Imitation Learning +1

Non-Monotonic Sequential Text Generation

1 code implementation WS 2019 Sean Welleck, Kianté Brantley, Hal Daumé III, Kyunghyun Cho

Standard sequential generation methods assume a pre-specified generation order, such as text generation methods which generate words from left to right.

Imitation Learning Position +1
