1 code implementation • 30 Nov 2023 • Marwa Abdulhai, Isadora White, Charlie Snell, Charles Sun, Joey Hong, Yuexiang Zhai, Kelvin Xu, Sergey Levine
Developing such algorithms requires tasks that can gauge progress on algorithm design, provide accessible and reproducible evaluations of multi-turn interaction, and cover a range of task properties and challenges relevant to improving reinforcement learning algorithms.
no code implementations • 9 Nov 2023 • Joey Hong, Sergey Levine, Anca Dragan
LLMs trained with supervised fine-tuning or "single-step" RL, as in standard RLHF, may struggle with tasks that require such goal-directed behavior, since they are not trained to optimize for overall conversational outcomes after multiple turns of interaction.
no code implementations • 31 Oct 2023 • Joey Hong, Anca Dragan, Sergey Levine
Theoretically, we show that standard offline RL algorithms conditioned on observation histories suffer from poor sample complexity, in accordance with the above intuition.
no code implementations • 26 Jul 2023 • Kensen Shi, Joey Hong, Manzil Zaheer, Pengcheng Yin, Charles Sutton
When writing programs, people can tackle a new complex task by decomposing it into smaller, more familiar subtasks.
no code implementations • 9 Dec 2022 • Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, Mohammad Ghavamzadeh
We prove per-task bounds on the suboptimality of the learned policies, which show a clear improvement over not using the hierarchical model.
no code implementations • 9 Dec 2022 • Joey Hong, Kush Bhatia, Anca Dragan
This raises the question: how accurate do these models need to be in order for the reward inference to be accurate?
no code implementations • 8 Dec 2022 • Joey Hong, Aviral Kumar, Sergey Levine
This approach can be implemented in practice by conditioning the Q-function of existing conservative algorithms on the confidence level. We theoretically show that our learned value functions produce conservative estimates of the true value at any desired confidence.
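A minimal sketch of this idea in a tabular setting (the function name, the count-based penalty form, and the worst-case initialization are illustrative assumptions, not the paper's construction): the requested confidence level scales a pessimism term subtracted from the fitted Q-values, so higher confidence yields more conservative estimates.

```python
# Illustrative tabular sketch: fitted Q-iteration on offline data with a
# pessimism penalty scaled by the requested confidence level `delta`.
# The penalty sqrt(log(1/(1-delta)) / n) is an assumption for this sketch.
import numpy as np

def confidence_conditioned_q(dataset, n_states, n_actions, delta,
                             gamma=0.99, iters=100):
    counts = np.zeros((n_states, n_actions))
    for s, a, _, _ in dataset:
        counts[s, a] += 1
    # Count-based penalty: grows with confidence, shrinks with more data.
    bonus = np.sqrt(np.log(1.0 / (1.0 - delta)) / np.maximum(counts, 1.0))
    q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        backup = np.zeros_like(q)
        for s, a, r, s_next in dataset:
            backup[s, a] += r + gamma * q[s_next].max()
        q = backup / np.maximum(counts, 1.0) - bonus
        q[counts == 0] = -1.0 / (1.0 - gamma)  # unseen pairs: worst case
    return q

# Higher confidence should give lower (more conservative) value estimates.
data = [(0, 0, 1.0, 1), (0, 0, 0.5, 1), (1, 1, 0.0, 0)]
q90 = confidence_conditioned_q(data, n_states=2, n_actions=2, delta=0.90)
q99 = confidence_conditioned_q(data, n_states=2, n_actions=2, delta=0.99)
assert q99[0, 0] <= q90[0, 0]
```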
no code implementations • 12 Apr 2022 • Aviral Kumar, Joey Hong, Anikait Singh, Sergey Levine
To answer this question, we characterize the properties of environments that allow offline RL methods to perform better than BC methods, even when only provided with expert data.
no code implementations • 7 Apr 2022 • Kensen Shi, Joey Hong, Manzil Zaheer, Pengcheng Yin, Charles Sutton
We first characterize several axes along which program synthesis methods should generalize, e.g., length generalization, or the ability to combine known subroutines in new ways that do not occur in the training data.
no code implementations • 3 Feb 2022 • Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, Mohammad Ghavamzadeh
We use this exact posterior to analyze the Bayes regret of HierTS in Gaussian bandits.
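A compact simulation of the underlying model (all scales and the per-arm diagonal structure are assumptions for illustration; the paper's analysis of HierTS is more general): per-task arm means are drawn around a shared hyper-mean, and Gaussian conjugacy gives the exact hyper-posterior that the sampler draws from.

```python
# Sketch of hierarchical Thompson sampling in Gaussian bandits: sample the
# shared hyper-mean from its exact posterior, then sample this task's arm
# means conditioned on it, and play the argmax arm. All widths illustrative.
import numpy as np

rng = np.random.default_rng(0)
K, n_tasks, horizon = 5, 20, 200
sigma_q, sigma_0, sigma = 1.0, 0.5, 1.0  # hyper-prior, task, and noise widths

mu_star = rng.normal(0.0, sigma_q, K)    # unknown shared hyper-mean
sums, pulls = np.zeros((n_tasks, K)), np.zeros((n_tasks, K))

for s in range(n_tasks):
    theta = rng.normal(mu_star, sigma_0)  # this task's true arm means
    for t in range(horizon):
        # Exact hyper-posterior per arm: a (task, arm) cell with n pulls and
        # sample mean ybar contributes precision 1 / (sigma_0^2 + sigma^2/n).
        prec = np.full(K, 1.0 / sigma_q**2)
        acc = np.zeros(K)
        for s2 in range(s + 1):
            n = pulls[s2]
            mask = n > 0
            w = 1.0 / (sigma_0**2 + sigma**2 / np.maximum(n, 1.0))
            prec[mask] += w[mask]
            acc[mask] += w[mask] * sums[s2, mask] / n[mask]
        mu = rng.normal(acc / prec, 1.0 / np.sqrt(prec))
        # Conditional posterior of this task's means given the sampled mu.
        n = pulls[s]
        p = 1.0 / sigma_0**2 + n / sigma**2
        m = (mu / sigma_0**2 + sums[s] / sigma**2) / p
        a = int(np.argmax(rng.normal(m, 1.0 / np.sqrt(p))))
        r = rng.normal(theta[a], sigma)
        sums[s, a] += r
        pulls[s, a] += 1
```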
no code implementations • 12 Nov 2021 • Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh
We provide a unified view of all of these problems as learning to act in a hierarchical Bayesian bandit.
no code implementations • ICLR 2022 • Aviral Kumar, Joey Hong, Anikait Singh, Sergey Levine
In this paper, our goal is to characterize environments and dataset compositions where offline RL leads to better performance than BC.
no code implementations • 10 Jun 2021 • Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh, Craig Boutilier
We study Thompson sampling (TS) in online decision making, where the uncertain environment is sampled from a mixture distribution.
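One way to make the mixture setting concrete (a toy sketch assuming a two-component Gaussian-mixture prior over arm means; the paper treats general mixtures): the posterior stays a mixture, so each component's Gaussian posterior updates in closed form while the component weights are re-weighted by predictive likelihood.

```python
# Thompson sampling under a Gaussian-mixture prior (toy numbers throughout).
import numpy as np

rng = np.random.default_rng(1)
K, M, sigma, tau = 3, 2, 1.0, 0.3
mix_w = np.array([0.5, 0.5])
mix_mu = np.array([[1.0, 0.0, 0.0],     # component 0: arm 0 is best
                   [0.0, 0.0, 1.0]])    # component 1: arm 2 is best

true_m = rng.choice(M, p=mix_w)
theta = rng.normal(mix_mu[true_m], tau)  # environment drawn from the mixture

log_w = np.log(mix_w)
post_mean, post_var = mix_mu.copy(), np.full((M, K), tau**2)

for t in range(500):
    w = np.exp(log_w - log_w.max())
    m = rng.choice(M, p=w / w.sum())                     # sample a component
    a = int(np.argmax(rng.normal(post_mean[m], np.sqrt(post_var[m]))))
    r = rng.normal(theta[a], sigma)
    # Re-weight components by the predictive likelihood of the reward.
    pred_var = post_var[:, a] + sigma**2
    log_w += -0.5 * ((r - post_mean[:, a])**2 / pred_var + np.log(pred_var))
    # Conjugate (Kalman-style) update of arm a inside every component.
    gain = post_var[:, a] / pred_var
    post_mean[:, a] += gain * (r - post_mean[:, a])
    post_var[:, a] *= 1.0 - gain
```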
no code implementations • 1 Dec 2020 • Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed, Mohammad Ghavamzadeh, Craig Boutilier
The key idea is to frame this problem as a latent bandit, where the prototypical models of user behavior are learned offline and the latent state of the user is inferred online from interactions with these models.
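A toy version of that online phase (the model table and noise scale are made up for illustration): with the prototypical models fixed, the agent Thompson-samples a latent user state from its belief, plays that state's best item, and updates the belief by Bayes' rule.

```python
# Latent-bandit sketch: offline-learned user models + online state inference.
import numpy as np

rng = np.random.default_rng(2)
# Rows = latent user types (learned offline), cols = mean reward per item.
models = np.array([[0.9, 0.2, 0.1],
                   [0.1, 0.8, 0.3],
                   [0.2, 0.1, 0.9]])
sigma = 0.5
true_state = rng.integers(len(models))
belief = np.ones(len(models)) / len(models)  # uniform prior over states

for t in range(100):
    s = rng.choice(len(models), p=belief)    # Thompson-sample a latent state
    a = int(np.argmax(models[s]))            # best item under that state
    r = rng.normal(models[true_state, a], sigma)
    # Bayes update of the belief using the Gaussian reward likelihood.
    belief *= np.exp(-0.5 * ((r - models[:, a]) / sigma) ** 2)
    belief /= belief.sum()
```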
no code implementations • 1 Dec 2020 • Joey Hong, David Dohan, Rishabh Singh, Charles Sutton, Manzil Zaheer
The latent codes are learned using a self-supervised learning principle, in which first a discrete autoencoder is trained on the output sequences, and then the resulting latent codes are used as intermediate targets for the end-to-end sequence prediction task.
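A compressed PyTorch sketch of that two-stage recipe (the shapes, the single-code bottleneck, and the straight-through estimator are simplifications; the paper's architecture differs):

```python
# Stage 1: train a discrete autoencoder on output sequences; its codes later
# serve as intermediate targets for the end-to-end sequence model (stage 2).
import torch
import torch.nn as nn

V, D, C, L = 100, 64, 16, 8   # vocab, hidden size, codebook size, seq length

class DiscreteAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Embedding(V, D), nn.Flatten(),
                                 nn.Linear(L * D, C))   # logits over codes
        self.dec = nn.Linear(C, L * V)

    def forward(self, y):
        soft = self.enc(y).softmax(-1)
        # Straight-through discretization: hard one-hot forward, soft backward.
        hard = torch.zeros_like(soft).scatter_(
            -1, soft.argmax(-1, keepdim=True), 1.0)
        code = hard + soft - soft.detach()
        return code, self.dec(code).view(-1, L, V)

ae = DiscreteAE()
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
y = torch.randint(0, V, (32, L))          # a batch of output sequences
for step in range(100):                   # self-supervised reconstruction
    code, recon = ae(y)
    loss = nn.functional.cross_entropy(recon.transpose(1, 2), y)
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2 (outline): freeze the autoencoder, label each training pair with its
# code index ae(y)[0].argmax(-1), and train the end-to-end model to predict
# that code first, then decode the final output sequence.
```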
no code implementations • NeurIPS 2020 • Joey Hong, Branislav Kveton, Manzil Zaheer, Yin-Lam Chow, Amr Ahmed, Craig Boutilier
A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.
no code implementations • 15 Jun 2020 • Joey Hong, Branislav Kveton, Manzil Zaheer, Yin-Lam Chow, Amr Ahmed
This approach is practical and analyzable, and we provide guarantees on both the quality of off-policy optimization and the regret during online deployment.
no code implementations • CVPR 2019 • Joey Hong, Benjamin Sapp, James Philbin
We focus on the problem of predicting future states of entities in complex, real-world driving scenarios.
no code implementations • 4 Oct 2016 • Joey Hong, Chris Mattmann, Paul Ramirez
The evolution of the internet has created an abundance of unstructured data on the web, a significant part of which is textual.