Search Results for author: Karl Pertsch

Found 29 papers, 7 papers with code

FAST: Efficient Action Tokenization for Vision-Language-Action Models

no code implementations • 16 Jan 2025 • Karl Pertsch, Kyle Stachowicz, Brian Ichter, Danny Driess, Suraj Nair, Quan Vuong, Oier Mees, Chelsea Finn, Sergey Levine

However, such models require us to choose a tokenization of our continuous action signals, which determines how the discrete symbols predicted by the model map to continuous robot actions.
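
The mapping described above can be illustrated with the simplest common scheme, uniform per-dimension binning: each continuous action dimension is clipped to a known range and mapped to one of N discrete bins, and decoding returns the bin center. This is a generic sketch of the concept, not the FAST tokenizer itself (FAST proposes a more efficient compression-based scheme); the function names and ranges here are illustrative assumptions.

```python
import numpy as np

def tokenize(actions, low, high, n_bins=256):
    """Map continuous actions in [low, high] to integer bin indices in [0, n_bins)."""
    actions = np.clip(actions, low, high)
    norm = (actions - low) / (high - low)                 # normalize to [0, 1]
    return np.minimum((norm * n_bins).astype(int), n_bins - 1)

def detokenize(tokens, low, high, n_bins=256):
    """Map bin indices back to continuous values at bin centers."""
    centers = (tokens + 0.5) / n_bins
    return low + centers * (high - low)

# Round-trip a 3-DoF action; reconstruction error is bounded by the bin width.
action = np.array([-0.8, 0.0, 0.73])
tokens = tokenize(action, low=-1.0, high=1.0)
recon = detokenize(tokens, low=-1.0, high=1.0)
```

The bin count trades off precision against vocabulary size: coarser bins lose fine motor detail, while very fine bins produce long, hard-to-predict token sequences, which is the tension such tokenization work addresses.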

Re-Mix: Optimizing Data Mixtures for Large Scale Imitation Learning

1 code implementation • 26 Aug 2024 • Joey Hejna, Chethan Bhateja, Yichen Jian, Karl Pertsch, Dorsa Sadigh

Increasingly large imitation learning datasets are being collected with the goal of training foundation models for robotics.

Imitation Learning • Robot Manipulation

Affordance-Guided Reinforcement Learning via Visual Prompting

no code implementations • 14 Jul 2024 • Olivia Y. Lee, Annie Xie, Kuan Fang, Karl Pertsch, Chelsea Finn

On real-world manipulation tasks specified by natural language descriptions, KAGI improves the sample efficiency of autonomous RL and enables successful task completion in 20K online fine-tuning steps.

Reinforcement Learning +3

Robotic Control via Embodied Chain-of-Thought Reasoning

no code implementations • 11 Jul 2024 • Michał Zawalski, William Chen, Karl Pertsch, Oier Mees, Chelsea Finn, Sergey Levine

Recent works on vision-language-action models (VLAs) have shown that the use of large, internet pre-trained vision-language models as the backbone of learned robot policies can substantially improve their robustness and generalization ability.

OpenVLA: An Open-Source Vision-Language-Action Model

2 code implementations • 13 Jun 2024 • Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Foster, Grace Lam, Pannag Sanketi, Quan Vuong, Thomas Kollar, Benjamin Burchfiel, Russ Tedrake, Dorsa Sadigh, Sergey Levine, Percy Liang, Chelsea Finn

Large policies pretrained on a combination of Internet-scale vision-language data and diverse robot demonstrations have the potential to change how we teach robots new skills: rather than training new behaviors from scratch, we can fine-tune such vision-language-action (VLA) models to obtain robust, generalizable policies for visuomotor control.

Imitation Learning • Language Modelling +2

Octo: An Open-Source Generalist Robot Policy

no code implementations • 20 May 2024 • Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Tobias Kreiman, Charles Xu, Jianlan Luo, You Liang Tan, Lawrence Yunliang Chen, Pannag Sanketi, Quan Vuong, Ted Xiao, Dorsa Sadigh, Chelsea Finn, Sergey Levine

In experiments across 9 robotic platforms, we demonstrate that Octo serves as a versatile policy initialization that can be effectively finetuned to new observation and action spaces.

Robot Manipulation

Yell At Your Robot: Improving On-the-Fly from Language Corrections

no code implementations • 19 Mar 2024 • Lucy Xiaoyang Shi, Zheyuan Hu, Tony Z. Zhao, Archit Sharma, Karl Pertsch, Jianlan Luo, Sergey Levine, Chelsea Finn

In this paper, we make the following observation: high-level policies that index into sufficiently rich and expressive low-level language-conditioned skills can be readily supervised with human feedback in the form of language corrections.

LiFT: Unsupervised Reinforcement Learning with Foundation Models as Teachers

no code implementations • 14 Dec 2023 • Taewook Nam, Juyong Lee, Jesse Zhang, Sung Ju Hwang, Joseph J. Lim, Karl Pertsch

We propose a framework that leverages foundation models as teachers, guiding a reinforcement learning agent to acquire semantically meaningful behavior without human feedback.

Language Modelling +3

Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance

no code implementations • 16 Oct 2023 • Jesse Zhang, Jiahui Zhang, Karl Pertsch, Ziyi Liu, Xiang Ren, Minsuk Chang, Shao-Hua Sun, Joseph J. Lim

Instead, our approach BOSS (BOotStrapping your own Skills) learns to accomplish new tasks by performing "skill bootstrapping," where an agent with a set of primitive skills interacts with the environment to practice new skills without receiving reward feedback for tasks outside of the initial skill set.

Language Modelling +1

SPRINT: Scalable Policy Pre-Training via Language Instruction Relabeling

no code implementations • 20 Jun 2023 • Jesse Zhang, Karl Pertsch, Jiahui Zhang, Joseph J. Lim

Pre-training robot policies with a rich set of skills can substantially accelerate the learning of downstream tasks.

Cross-Domain Transfer via Semantic Skill Imitation

no code implementations • 14 Dec 2022 • Karl Pertsch, Ruta Desai, Vikash Kumar, Franziska Meier, Joseph J. Lim, Dhruv Batra, Akshara Rai

We propose an approach for semantic imitation, which uses demonstrations from a source domain, e.g. human videos, to accelerate reinforcement learning (RL) in a different target domain, e.g. a robotic manipulator in a simulated kitchen.

Reinforcement Learning (RL) • Robot Manipulation

PATO: Policy Assisted TeleOperation for Scalable Robot Data Collection

no code implementations • 9 Dec 2022 • Shivin Dass, Karl Pertsch, Hejia Zhang, Youngwoon Lee, Joseph J. Lim, Stefanos Nikolaidis

Large-scale data is an essential component of machine learning as demonstrated in recent advances in natural language processing and computer vision research.

Task-Induced Representation Learning

no code implementations • ICLR 2022 • Jun Yamada, Karl Pertsch, Anisha Gunjal, Joseph J. Lim

We investigate the effectiveness of unsupervised and task-induced representation learning approaches on four visually complex environments, from Distracting DMControl to the CARLA driving simulator.

Contrastive Learning • Imitation Learning +2

Skill-based Meta-Reinforcement Learning

no code implementations • ICLR 2022 • Taewook Nam, Shao-Hua Sun, Karl Pertsch, Sung Ju Hwang, Joseph J. Lim

While deep reinforcement learning methods have shown impressive results in robot learning, their sample inefficiency makes the learning of complex, long-horizon behaviors with real robot systems infeasible.

Continuous Control +5

Demonstration-Guided Reinforcement Learning with Learned Skills

no code implementations • ICLR Workshop SSL-RL 2021 • Karl Pertsch, Youngwoon Lee, Yue Wu, Joseph J. Lim

Prior approaches for demonstration-guided RL treat every new task as an independent learning problem and attempt to follow the provided demonstrations step-by-step, akin to a human trying to imitate a completely unseen behavior by following the demonstrator's exact muscle movements.

Reinforcement Learning +2

Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments

no code implementations • 22 Oct 2020 • Jun Yamada, Youngwoon Lee, Gautam Salhotra, Karl Pertsch, Max Pflueger, Gaurav S. Sukhatme, Joseph J. Lim, Peter Englert

In contrast, motion planners use explicit models of the agent and environment to plan collision-free paths to faraway goals, but suffer from inaccurate models in tasks that require contacts with the environment.

Deep Reinforcement Learning • Reinforcement Learning +2

Accelerating Reinforcement Learning with Learned Skill Priors

2 code implementations • 22 Oct 2020 • Karl Pertsch, Youngwoon Lee, Joseph J. Lim

We validate our approach, SPiRL (Skill-Prior RL), on complex navigation and robotic manipulation tasks and show that learned skill priors are essential for effective skill transfer from rich datasets.

Reinforcement Learning +1

Long-Horizon Visual Planning with Goal-Conditioned Hierarchical Predictors

1 code implementation • NeurIPS 2020 • Karl Pertsch, Oleh Rybkin, Frederik Ebert, Chelsea Finn, Dinesh Jayaraman, Sergey Levine

In this work we propose a framework for visual prediction and planning that is able to overcome both of these limitations.

Prediction

Keyframing the Future: Discovering Temporal Hierarchy with Keyframe-Inpainter Prediction

no code implementations • 25 Sep 2019 • Karl Pertsch, Oleh Rybkin, Jingyun Yang, Konstantinos G. Derpanis, Kostas Daniilidis, Joseph J. Lim, Andrew Jaegle

To flexibly and efficiently reason about temporal sequences, abstract representations that compactly represent the important information in the sequence are needed.

Temporal Sequences

Goal-Conditioned Video Prediction

no code implementations • 25 Sep 2019 • Oleh Rybkin, Karl Pertsch, Frederik Ebert, Dinesh Jayaraman, Chelsea Finn, Sergey Levine

Prior work on video generation largely focuses on prediction models that only observe frames from the beginning of the video.

Imitation Learning • Prediction +2

Learning what you can do before doing anything

no code implementations • ICLR 2019 • Oleh Rybkin, Karl Pertsch, Konstantinos G. Derpanis, Kostas Daniilidis, Andrew Jaegle

We introduce a loss term that encourages the network to capture the composability of visual sequences and show that it leads to representations that disentangle the structure of actions.

Video Prediction

iPose: Instance-Aware 6D Pose Estimation of Partly Occluded Objects

no code implementations • 5 Dec 2017 • Omid Hosseini Jafari, Siva Karthik Mustikovela, Karl Pertsch, Eric Brachmann, Carsten Rother

We address the task of 6D pose estimation of known rigid objects from single input images in scenarios where the objects are partly occluded.

6D Pose Estimation • 6D Pose Estimation using RGB +4