Search Results for author: Joey Hong

Found 19 papers, 1 paper with code

LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models

1 code implementation · 30 Nov 2023 · Marwa Abdulhai, Isadora White, Charlie Snell, Charles Sun, Joey Hong, Yuexiang Zhai, Kelvin Xu, Sergey Levine

Developing such algorithms requires tasks that can gauge progress on algorithm design, provide accessible and reproducible evaluations for multi-turn interactions, and cover a range of task properties and challenges in improving reinforcement learning algorithms.

reinforcement-learning Text Generation

Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations

no code implementations · 9 Nov 2023 · Joey Hong, Sergey Levine, Anca Dragan

LLMs trained with supervised fine-tuning or "single-step" RL, as with standard RLHF, might struggle with tasks that require such goal-directed behavior, since they are not trained to optimize for overall conversational outcomes after multiple turns of interaction.

Text Generation
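The gap the excerpt points to can be written out directly. As a schematic contrast (notation chosen here, not taken from the paper): single-step RLHF scores each response in isolation, while goal-directed dialogue optimizes the return of the whole conversation.

```latex
% Single-step RLHF: score each response y to a prompt x in isolation
\max_{\pi}\; \mathbb{E}_{x,\, y \sim \pi(\cdot \mid x)}\big[\, r(x, y) \,\big]

% Goal-directed dialogue: optimize the return over T turns, where each
% utterance a_t moves the conversational state s_t toward the goal
\max_{\pi}\; \mathbb{E}_{\pi}\Big[\, \sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t) \,\Big]
```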

Offline RL with Observation Histories: Analyzing and Improving Sample Complexity

no code implementations · 31 Oct 2023 · Joey Hong, Anca Dragan, Sergey Levine

Theoretically, we show that standard offline RL algorithms conditioned on observation histories suffer from poor sample complexity, in accordance with the above intuition.

Autonomous Navigation Offline RL +1

ExeDec: Execution Decomposition for Compositional Generalization in Neural Program Synthesis

no code implementations · 26 Jul 2023 · Kensen Shi, Joey Hong, Manzil Zaheer, Pengcheng Yin, Charles Sutton

When writing programs, people have the ability to tackle a new complex task by decomposing it into smaller and more familiar subtasks.

Program Synthesis

Multi-Task Off-Policy Learning from Bandit Feedback

no code implementations · 9 Dec 2022 · Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, Mohammad Ghavamzadeh

We prove per-task bounds on the suboptimality of the learned policies, which show a clear improvement over not using the hierarchical model.

Learning-To-Rank Recommendation Systems

On the Sensitivity of Reward Inference to Misspecified Human Models

no code implementations · 9 Dec 2022 · Joey Hong, Kush Bhatia, Anca Dragan

This raises the question: how accurate do these models need to be in order for the reward inference to be accurate?

Continuous Control

Confidence-Conditioned Value Functions for Offline Reinforcement Learning

no code implementations · 8 Dec 2022 · Joey Hong, Aviral Kumar, Sergey Levine

This approach can be implemented in practice by conditioning the Q-function from existing conservative algorithms on the confidence. We theoretically show that our learned value functions produce conservative estimates of the true value at any desired confidence.

Offline RL reinforcement-learning +1
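As a rough illustration of the conditioning idea in the excerpt, here is a minimal PyTorch sketch of a Q-function that takes the confidence level as an extra input. The architecture and sizes are assumptions made here for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ConfidenceConditionedQ(nn.Module):
    """Q(s, a, delta): one network that outputs value estimates for every
    confidence level delta in [0, 1], rather than a single fixed level
    of conservatism."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor, delta: torch.Tensor) -> torch.Tensor:
        # delta enters as an input feature, so the degree of conservatism
        # becomes something the agent can choose at deployment time
        return self.net(torch.cat([state, delta], dim=-1))

q = ConfidenceConditionedQ(state_dim=8, n_actions=4)
values = q(torch.randn(32, 8), torch.full((32, 1), 0.9))  # high-confidence estimates
```

During training, one would presumably pair this with a conservatism penalty whose weight grows with delta, so that higher-confidence estimates are pushed further below the naive ones; that coupling is an assumed analogue of the conditioning described above, not a detail taken from the paper.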

When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning?

no code implementations · 12 Apr 2022 · Aviral Kumar, Joey Hong, Anikait Singh, Sergey Levine

To answer this question, we characterize the properties of environments that allow offline RL methods to perform better than BC methods, even when only provided with expert data.

Atari Games Imitation Learning +3

Compositional Generalization and Decomposition in Neural Program Synthesis

no code implementations · 7 Apr 2022 · Kensen Shi, Joey Hong, Manzil Zaheer, Pengcheng Yin, Charles Sutton

We first characterize several different axes along which program synthesis methods would be desired to generalize, e.g., length generalization, or the ability to combine known subroutines in new ways that do not occur in the training data.

Program Synthesis

Deep Hierarchy in Bandits

no code implementations · 3 Feb 2022 · Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, Mohammad Ghavamzadeh

We use this exact posterior to analyze the Bayes regret of HierTS in Gaussian bandits.

Thompson Sampling
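For the two-level Gaussian case, the exact hyper-posterior referenced in the excerpt is a standard conjugate computation; the notation below is generic rather than the paper's.

```latex
% Hierarchy: mu ~ N(mu_0, sigma_0^2), task means theta_i | mu ~ N(mu, sigma^2),
% rewards in task i ~ N(theta_i, sigma_r^2), with n_i pulls and sample mean ybar_i.
% Marginalizing theta_i gives ybar_i | mu ~ N(mu, sigma^2 + sigma_r^2 / n_i),
% so the posterior over the shared hyper-parameter mu remains Gaussian:
\hat{\tau}^{-2} = \sigma_0^{-2} + \sum_i \left( \sigma^2 + \sigma_r^2 / n_i \right)^{-1},
\qquad
\hat{\mu} = \hat{\tau}^{2} \left( \frac{\mu_0}{\sigma_0^{2}}
  + \sum_i \frac{\bar{y}_i}{\sigma^2 + \sigma_r^2 / n_i} \right)
```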

Hierarchical Bayesian Bandits

no code implementations · 12 Nov 2021 · Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh

We provide a unified view of all these problems, as learning to act in a hierarchical Bayesian bandit.

Federated Learning Thompson Sampling
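A minimal NumPy sketch of acting in a hierarchical Bayesian bandit via Thompson sampling, in the Gaussian setting: sample the shared hyper-parameter from its exact posterior, then the task parameter, then act greedily on the sample. All variances and the round-robin environment are illustrative choices made here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tasks, n_arms, horizon = 5, 3, 2000
sigma0, sigma, sigma_r = 1.0, 0.5, 1.0  # hyper-prior, task-level, and reward noise scales

# Hidden environment: shared hyper-means per arm, then per-task arm means
mu_true = rng.normal(0.0, sigma0, n_arms)
theta_true = rng.normal(mu_true, sigma, (n_tasks, n_arms))

n = np.zeros((n_tasks, n_arms))  # pull counts
s = np.zeros((n_tasks, n_arms))  # reward sums

for t in range(horizon):
    task = t % n_tasks           # round-robin over tasks
    theta_sample = np.empty(n_arms)
    for k in range(n_arms):
        # Exact Gaussian hyper-posterior for arm k, marginalizing each task's mean
        mask = n[:, k] > 0
        var_i = sigma**2 + sigma_r**2 / n[mask, k]
        prec = 1.0 / sigma0**2 + np.sum(1.0 / var_i)
        mean = np.sum(s[mask, k] / n[mask, k] / var_i) / prec
        mu_k = rng.normal(mean, prec**-0.5)
        # Conditional posterior of this task's arm mean given the sampled mu_k
        prec_i = 1.0 / sigma**2 + n[task, k] / sigma_r**2
        mean_i = (mu_k / sigma**2 + s[task, k] / sigma_r**2) / prec_i
        theta_sample[k] = rng.normal(mean_i, prec_i**-0.5)
    arm = int(np.argmax(theta_sample))          # act on the joint posterior sample
    reward = rng.normal(theta_true[task, arm], sigma_r)
    n[task, arm] += 1.0
    s[task, arm] += reward
```

Note how data from every task flows into the hyper-posterior for each arm, which is what the per-task improvement over non-hierarchical models comes from.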

Should I Run Offline Reinforcement Learning or Behavioral Cloning?

no code implementations · ICLR 2022 · Aviral Kumar, Joey Hong, Anikait Singh, Sergey Levine

In this paper, our goal is to characterize environments and dataset compositions where offline RL leads to better performance than BC.

Atari Games Offline RL +3

Thompson Sampling with a Mixture Prior

no code implementations · 10 Jun 2021 · Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh, Craig Boutilier

We study Thompson sampling (TS) in online decision making, where the uncertain environment is sampled from a mixture distribution.

Decision Making Multi-Task Learning +3
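A minimal sketch of the mixture-prior idea for a Gaussian bandit: maintain a posterior over which mixture component generated the environment, sample a component, then run standard Thompson sampling under it. The two components and all noise scales below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_arms, horizon, sigma_r, tau = 3, 1000, 1.0, 0.3

# Mixture prior: each component is a per-arm Gaussian prior N(centers[k], tau^2)
centers = np.array([[1.0, 0.0, -1.0],   # hypothesized "type A" environment
                    [-1.0, 0.0, 1.0]])  # hypothesized "type B" environment
log_w = np.log(np.full(len(centers), 0.5))  # log posterior over components

theta_true = rng.normal(centers[1], tau)    # environment drawn from component 1

n = np.zeros(n_arms)
s = np.zeros(n_arms)
for t in range(horizon):
    w = np.exp(log_w - log_w.max())
    k = rng.choice(len(centers), p=w / w.sum())  # sample a mixture component
    # Conjugate per-arm posterior under component k, then a standard TS draw
    prec = 1.0 / tau**2 + n / sigma_r**2
    mean = (centers[k] / tau**2 + s / sigma_r**2) / prec
    arm = int(np.argmax(rng.normal(mean, prec**-0.5)))
    reward = rng.normal(theta_true[arm], sigma_r)
    # Reweight components by their predictive likelihood of the new reward
    for j in range(len(centers)):
        prec_j = 1.0 / tau**2 + n[arm] / sigma_r**2
        mean_j = (centers[j, arm] / tau**2 + s[arm] / sigma_r**2) / prec_j
        pred_var = 1.0 / prec_j + sigma_r**2
        log_w[j] += -0.5 * ((reward - mean_j) ** 2 / pred_var
                            + np.log(2 * np.pi * pred_var))
    n[arm] += 1.0
    s[arm] += reward
```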

Non-Stationary Latent Bandits

no code implementations · 1 Dec 2020 · Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed, Mohammad Ghavamzadeh, Craig Boutilier

The key idea is to frame this problem as a latent bandit, where the prototypical models of user behavior are learned offline and the latent state of the user is inferred online from their interactions with the models.

Recommendation Systems Thompson Sampling
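The offline/online split in the excerpt can be sketched as an HMM-style filter: per-state reward models are fixed (standing in for the offline-learned prototypes), and the agent maintains a belief over the latent state that updates on observed rewards and drifts through an assumed transition matrix, which supplies the non-stationarity. All numbers below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n_states, n_arms, horizon, sigma_r = 2, 3, 500, 0.5

# "Learned offline": mean reward of each arm under each latent user state
mu = np.array([[1.0, 0.2, 0.0],   # state 0: prefers arm 0
               [0.0, 0.2, 1.0]])  # state 1: prefers arm 2

# Assumed slow drift between latent states (the source of non-stationarity)
P = np.array([[0.99, 0.01],
              [0.01, 0.99]])

belief = np.full(n_states, 1.0 / n_states)
state = 0
for t in range(horizon):
    state = rng.choice(n_states, p=P[state])  # environment drifts
    z = rng.choice(n_states, p=belief)        # Thompson sample of the latent state
    arm = int(np.argmax(mu[z]))               # best arm under the sampled state
    reward = rng.normal(mu[state, arm], sigma_r)
    # Bayes update of the belief from the observed reward, then propagate drift
    like = np.exp(-0.5 * ((reward - mu[:, arm]) / sigma_r) ** 2)
    belief = belief * like
    belief = P.T @ (belief / belief.sum())
```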

Latent Programmer: Discrete Latent Codes for Program Synthesis

no code implementations · 1 Dec 2020 · Joey Hong, David Dohan, Rishabh Singh, Charles Sutton, Manzil Zaheer

The latent codes are learned using a self-supervised learning principle, in which first a discrete autoencoder is trained on the output sequences, and then the resulting latent codes are used as intermediate targets for the end-to-end sequence prediction task.

Document Summarization Program Synthesis +1
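The two-stage recipe in the excerpt, a discrete autoencoder over output sequences whose codes then serve as intermediate prediction targets, might look roughly like the PyTorch sketch below. Module sizes, the downsampling factor, and the omission of a straight-through estimator are all simplifications made here, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DiscreteAutoencoder(nn.Module):
    """Stage 1: compress output token sequences into short discrete code
    sequences, which later serve as intermediate targets for synthesis."""

    def __init__(self, vocab: int, n_codes: int = 16, d: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.enc = nn.GRU(d, d, batch_first=True)
        self.codebook = nn.Embedding(n_codes, d)  # the discrete latent space
        self.dec = nn.GRU(d, d, batch_first=True)
        self.out = nn.Linear(d, vocab)

    def encode(self, y: torch.Tensor) -> torch.Tensor:  # y: (B, T) token ids
        h, _ = self.enc(self.embed(y))
        h = h[:, ::4]                              # one code per 4 output tokens
        # nearest codebook entry (straight-through gradient omitted for brevity)
        dist = (h.unsqueeze(2) - self.codebook.weight).pow(2).sum(-1)
        return dist.argmin(-1)                     # (B, ceil(T/4)) code ids

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        z = self.codebook(self.encode(y))
        z = z.repeat_interleave(4, dim=1)[:, : y.size(1)]  # back to token rate
        h, _ = self.dec(z)
        return self.out(h)                         # logits; train to reconstruct y
```

Stage 2 would then train one sequence model to predict these code ids from the specification and another to decode the program conditioned on both, which is the "intermediate targets" role the excerpt describes.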

Latent Bandits Revisited

no code implementations · NeurIPS 2020 · Joey Hong, Branislav Kveton, Manzil Zaheer, Yin-Lam Chow, Amr Ahmed, Craig Boutilier

A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.

Recommendation Systems Thompson Sampling

Non-Stationary Off-Policy Optimization

no code implementations · 15 Jun 2020 · Joey Hong, Branislav Kveton, Manzil Zaheer, Yin-Lam Chow, Amr Ahmed

This approach is practical and analyzable, and we provide guarantees on both the quality of off-policy optimization and the regret during online deployment.

Multi-Armed Bandits

Ensemble Maximum Entropy Classification and Linear Regression for Author Age Prediction

no code implementations · 4 Oct 2016 · Joey Hong, Chris Mattmann, Paul Ramirez

The evolution of the internet has created an abundance of unstructured data on the web, a significant part of which is textual.

Classification General Classification +2
