Search Results for author: Joey Hejna

Found 14 papers, 6 papers with code

Efficiently Generating Expressive Quadruped Behaviors via Language-Guided Preference Learning

no code implementations • 6 Feb 2025 • Jaden Clark, Joey Hejna, Dorsa Sadigh

Expressive robotic behavior is essential for the widespread acceptance of robots in social environments.

Vision Language Models are In-Context Value Learners

no code implementations • 7 Nov 2024 • Yecheng Jason Ma, Joey Hejna, Ayzaan Wahid, Chuyuan Fu, Dhruv Shah, Jacky Liang, Zhuo Xu, Sean Kirmani, Peng Xu, Danny Driess, Ted Xiao, Jonathan Tompson, Osbert Bastani, Dinesh Jayaraman, Wenhao Yu, Tingnan Zhang, Dorsa Sadigh, Fei Xia

Instead, GVL poses value estimation as a temporal ordering problem over shuffled video frames; this seemingly more challenging task encourages VLMs to more fully exploit their underlying semantic and temporal grounding capabilities to differentiate frames based on their perceived task progress, consequently producing significantly better value predictions.
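
As a rough illustration of the shuffled-frame idea described above (a sketch under assumptions, not the paper's implementation; the query_vlm callable and its prompt are hypothetical stand-ins for whatever VLM interface GVL actually uses):

    import random

    def estimate_values(frames, task_description, query_vlm):
        # Sketch of value estimation as temporal ordering over shuffled frames.
        # `query_vlm` is a hypothetical callable that returns one task-progress
        # score per image it is shown; it is not from the paper.
        order = list(range(len(frames)))
        random.shuffle(order)
        shuffled = [frames[i] for i in order]

        # The VLM sees the frames out of order, so it must judge progress from
        # frame content rather than from position in the clip.
        shuffled_scores = query_vlm(
            images=shuffled,
            prompt=f"Task: {task_description}. Rate each frame's task progress from 0 to 1.",
        )

        # Undo the shuffle so the scores line up with the original timeline.
        values = [0.0] * len(frames)
        for score, original_index in zip(shuffled_scores, order):
            values[original_index] = score
        return values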

In-Context Learning • World Knowledge

So You Think You Can Scale Up Autonomous Robot Data Collection?

no code implementations • 4 Nov 2024 • Suvir Mirchandani, Suneel Belkhale, Joey Hejna, Evelyn Choi, Md Sazzad Islam, Dorsa Sadigh

Our work suggests a negative result: scaling up autonomous data collection for learning robot policies for real-world tasks is more challenging and impractical than prior work suggests.

Imitation Learning • Reinforcement Learning (RL)

MotIF: Motion Instruction Fine-tuning

no code implementations • 16 Sep 2024 • Minyoung Hwang, Joey Hejna, Dorsa Sadigh, Yonatan Bisk

MotIF assesses the success of robot motion given the image observation of the trajectory, task instruction, and motion description.

Re-Mix: Optimizing Data Mixtures for Large Scale Imitation Learning

1 code implementation • 26 Aug 2024 • Joey Hejna, Chethan Bhateja, Yichen Jian, Karl Pertsch, Dorsa Sadigh

Increasingly large imitation learning datasets are being collected with the goal of training foundation models for robotics.

Imitation Learning • Robot Manipulation

Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms

no code implementations • 5 Jun 2024 • Rafael Rafailov, Yaswanth Chittepu, Ryan Park, Harshit Sikchi, Joey Hejna, Bradley Knox, Chelsea Finn, Scott Niekum

Reinforcement Learning from Human Feedback (RLHF) has been crucial to the recent success of Large Language Models (LLMs); however, it is often a complex and brittle process.

Reinforcement Learning (RL)

Show, Don't Tell: Aligning Language Models with Demonstrated Feedback

1 code implementation • 2 Jun 2024 • Omar Shaikh, Michelle Lam, Joey Hejna, Yijia Shao, Michael Bernstein, Diyi Yang

Across our benchmarks and user study, we find that win rates for DITTO outperform few-shot prompting, supervised fine-tuning, and other self-play methods by an average of 19 percentage points.

Imitation Learning • Language Modeling +1

Octo: An Open-Source Generalist Robot Policy

no code implementations • 20 May 2024 • Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Tobias Kreiman, Charles Xu, Jianlan Luo, You Liang Tan, Lawrence Yunliang Chen, Pannag Sanketi, Quan Vuong, Ted Xiao, Dorsa Sadigh, Chelsea Finn, Sergey Levine

In experiments across 9 robotic platforms, we demonstrate that Octo serves as a versatile policy initialization that can be effectively finetuned to new observation and action spaces.

Robot Manipulation

From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function

no code implementations • 18 Apr 2024 • Rafael Rafailov, Joey Hejna, Ryan Park, Chelsea Finn

Standard RLHF deploys reinforcement learning in a specific token-level MDP, while DPO is derived as a bandit problem in which the whole response of the model is treated as a single arm.
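
For context, the bandit-level objective that the snippet contrasts with the token-level MDP view is the standard DPO loss from the original DPO formulation; a minimal sketch follows, assuming the sequence-level log-probabilities are precomputed (nothing here is specific to this paper's token-level analysis):

    import torch.nn.functional as F

    def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
        # Bandit-level DPO: each whole response is treated as a single "arm",
        # scored by its total log-probability under the policy relative to a
        # frozen reference model.
        chosen_reward = beta * (logp_chosen - ref_logp_chosen)
        rejected_reward = beta * (logp_rejected - ref_logp_rejected)
        return -F.logsigmoid(chosen_reward - rejected_reward).mean()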

Language Modeling • Language Modelling +2

Contrastive Preference Learning: Learning from Human Feedback without RL

1 code implementation • 20 Oct 2023 • Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh

Thus, learning a reward function from feedback is not only based on a flawed assumption of human preference, but also leads to unwieldy optimization challenges that stem from policy gradients or bootstrapping in the RL phase.

reinforcement-learning • Reinforcement Learning (RL)

Improving Long-Horizon Imitation Through Instruction Prediction

1 code implementation • 21 Jun 2023 • Joey Hejna, Pieter Abbeel, Lerrel Pinto

Complex, long-horizon planning and its combinatorial nature pose steep challenges for learning-based agents.

Prediction

Distance Weighted Supervised Learning for Offline Interaction Data

1 code implementation • 26 Apr 2023 • Joey Hejna, Jensen Gao, Dorsa Sadigh

To bridge the gap between IL and RL, we introduce Distance Weighted Supervised Learning or DWSL, a supervised method for learning goal-conditioned policies from offline data.
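
One hedged reading of the distance-weighted idea, sketched as goal-conditioned behavior cloning with an advantage-style weight; the distance_fn estimator, the exponential weighting, and the temperature are illustrative assumptions rather than the paper's exact recipe (the released code is the authoritative reference):

    import torch
    import torch.nn.functional as F

    def distance_weighted_bc_loss(policy, distance_fn, states, next_states, actions, goals, temperature=1.0):
        # Supervised, goal-conditioned policy learning from offline data: each
        # (state, action) pair is up-weighted when the transition reduces the
        # estimated number of steps remaining to the goal.
        with torch.no_grad():
            # Advantage-like quantity: positive when the logged action brings
            # the agent closer to the goal than one step of "no progress" would.
            adv = distance_fn(states, goals) - 1.0 - distance_fn(next_states, goals)
            weights = torch.exp(adv / temperature).clamp(max=20.0)

        logits = policy(states, goals)  # per-state action logits
        bc_loss = F.cross_entropy(logits, actions, reduction="none")
        return torch.mean(weights * bc_loss)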

Imitation Learning • Reinforcement Learning (RL) +1

Extreme Q-Learning: MaxEnt RL without Entropy

4 code implementations • 5 Jan 2023 • Divyansh Garg, Joey Hejna, Matthieu Geist, Stefano Ermon

Using Extreme Value Theory (EVT), we derive our Extreme Q-Learning framework and, consequently, online and (for the first time) offline MaxEnt Q-learning algorithms that do not explicitly require access to a policy or its entropy.
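
As a loose sketch of the extreme-value idea (not taken from the released implementations), a Gumbel-style regression objective fits V toward the soft value beta * log E[exp(Q / beta)] using dataset Q-values alone, so no policy samples or entropy term are needed; shapes and the clipping constant below are assumptions:

    import torch

    def gumbel_regression_loss(q_values, v_values, beta=1.0, clip=7.0):
        # Minimizing E[exp(z) - z - 1] with z = (Q - V) / beta drives V toward
        # beta * log E[exp(Q / beta)], the MaxEnt soft value, without ever
        # sampling from the policy or computing its entropy.
        z = (q_values - v_values) / beta
        z = torch.clamp(z, max=clip)  # keep exp() from overflowing
        return torch.mean(torch.exp(z) - z - 1.0)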

D4RL • Deep Reinforcement Learning +3

Few-Shot Preference Learning for Human-in-the-Loop RL

no code implementations • 6 Dec 2022 • Joey Hejna, Dorsa Sadigh

Contrary to most works that focus on query selection to minimize the amount of data required for learning reward functions, we take an opposite approach: expanding the pool of available data by viewing human-in-the-loop RL through the more flexible lens of multi-task learning.

Meta-Learning • Multi-Task Learning +1
