Search Results for author: Joey Hejna

Found 6 papers, 4 papers with code

From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function

no code implementations • 18 Apr 2024 • Rafael Rafailov, Joey Hejna, Ryan Park, Chelsea Finn

Standard RLHF deploys reinforcement learning in a specific token-level MDP, while DPO is derived as a bandit problem in which the whole response of the model is treated as a single arm.
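
For background on the bandit-level formulation referenced above, the standard DPO objective treats the whole response $y$ as a single action and scores a preference pair $(y_w, y_l)$ through log-ratios against a reference policy:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta) = -\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right].
$$

The paper's token-level reading decomposes these sequence-level log-ratios into per-token terms of the underlying MDP, which is where the implicit Q-function in the title comes from.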

Language Modelling • Q-Learning • +1

Contrastive Preference Learning: Learning from Human Feedback without RL

1 code implementation • 20 Oct 2023 • Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh

Thus, learning a reward function from feedback is not only based on a flawed assumption of human preference, but also leads to unwieldy optimization challenges that stem from policy gradients or bootstrapping in the RL phase.
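
As a sketch of what preference optimization without an intermediate reward function can look like (the discount $\gamma$ and temperature $\alpha$ below are assumptions, not a verbatim statement of the paper's objective), a contrastive loss can compare a preferred segment $\sigma^+$ to a dispreferred segment $\sigma^-$ directly through the policy's log-likelihoods:

$$
\mathcal{L}(\pi_\theta) = -\mathbb{E}_{(\sigma^+, \sigma^-)} \left[ \log \frac{\exp \sum_t \gamma^t \alpha \log \pi_\theta(a_t^+ \mid s_t^+)}{\exp \sum_t \gamma^t \alpha \log \pi_\theta(a_t^+ \mid s_t^+) + \exp \sum_t \gamma^t \alpha \log \pi_\theta(a_t^- \mid s_t^-)} \right].
$$

Because this is a purely supervised objective over the policy itself, it avoids the policy-gradient and bootstrapping machinery mentioned above.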

reinforcement-learning • Reinforcement Learning (RL)

Improving Long-Horizon Imitation Through Instruction Prediction

1 code implementation • 21 Jun 2023 • Joey Hejna, Pieter Abbeel, Lerrel Pinto

Complex, long-horizon planning and its combinatorial nature pose steep challenges for learning-based agents.

Distance Weighted Supervised Learning for Offline Interaction Data

1 code implementation • 26 Apr 2023 • Joey Hejna, Jensen Gao, Dorsa Sadigh

To bridge the gap between IL and RL, we introduce Distance Weighted Supervised Learning or DWSL, a supervised method for learning goal-conditioned policies from offline data.

Imitation Learning • Reinforcement Learning (RL)

Extreme Q-Learning: MaxEnt RL without Entropy

3 code implementations • 5 Jan 2023 • Divyansh Garg, Joey Hejna, Matthieu Geist, Stefano Ermon

Using Extreme Value Theory (EVT), we derive our \emph{Extreme Q-Learning} framework and, from it, online and, for the first time, offline MaxEnt Q-learning algorithms that do not explicitly require access to a policy or its entropy.
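
As a minimal sketch of the extreme-value idea, assuming the standard Gumbel-regression form of the soft-value objective (the paper's exact estimator may differ), the value $V$ is regressed toward a soft maximum of sampled $Q$-values with no policy samples or entropy term:

$$
\mathcal{L}(V) = \mathbb{E}_{(s,a) \sim \mathcal{D}} \left[ e^{z(s,a)} - z(s,a) - 1 \right], \qquad z(s,a) = \frac{Q(s,a) - V(s)}{\beta},
$$

where $\beta$ plays the role of the MaxEnt temperature.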

D4RL • Offline RL • +2

Few-Shot Preference Learning for Human-in-the-Loop RL

no code implementations • 6 Dec 2022 • Joey Hejna, Dorsa Sadigh

Contrary to most works that focus on query selection to \emph{minimize} the amount of data required for learning reward functions, we take an opposite approach: \emph{expanding} the pool of available data by viewing human-in-the-loop RL through the more flexible lens of multi-task learning.

Meta-Learning • Multi-Task Learning • +1
