Search Results for author: Pieter Abbeel

Found 380 papers, 198 papers with code

Responsive Safety in Reinforcement Learning by PID Lagrangian Methods

no code implementations ICML 2020 Adam Stooke, Joshua Achiam, Pieter Abbeel

This intuition leads to our introduction of PID control for the Lagrange multiplier in constrained RL, which we cast as a dynamical system.

reinforcement-learning Reinforcement Learning +2
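
As a concrete illustration of treating the Lagrange multiplier update as a dynamical system, here is a minimal sketch of a PID-controlled multiplier. This is not the authors' released code: the gains, the cost limit, and the `episode_cost` signal are illustrative assumptions.

```python
# Hypothetical sketch of a PID-controlled Lagrange multiplier for constrained RL.
class PIDLagrangian:
    def __init__(self, kp=0.1, ki=0.01, kd=0.01, cost_limit=25.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.cost_limit = cost_limit
        self.integral = 0.0       # accumulated constraint violation
        self.prev_error = 0.0     # previous violation, for the derivative term

    def update(self, episode_cost):
        """Return a non-negative multiplier from the current constraint violation."""
        error = episode_cost - self.cost_limit
        self.integral = max(0.0, self.integral + error)
        derivative = error - self.prev_error
        self.prev_error = error
        return max(0.0, self.kp * error + self.ki * self.integral + self.kd * derivative)

pid = PIDLagrangian()
lam = pid.update(episode_cost=30.0)  # penalize the policy loss with lam * cost term
```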

Hierarchically Decoupled Imitation for Morphological Transfer

no code implementations ICML 2020 Donald Hejna, Lerrel Pinto, Pieter Abbeel

Learning long-range behaviors on complex high-dimensional agents is a fundamental problem in robot learning.

CURL: Contrastive Unsupervised Representations for Reinforcement Learning

1 code implementation ICML 2020 Michael Laskin, Pieter Abbeel, Aravind Srinivas

CURL extracts high level features from raw pixels using a contrastive learning objective and performs off-policy control on top of the extracted features.

Contrastive Learning reinforcement-learning +3
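
A minimal sketch of the kind of contrastive objective described above, assuming a bilinear similarity between encodings of two augmentations of the same frames, with the other batch entries acting as negatives. Shapes and the matrix `W` are assumptions for illustration, not the released implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(anchors, positives, W):
    # (B, B) similarity matrix; matching pairs lie on the diagonal
    logits = anchors @ W @ positives.t()
    logits = logits - logits.max(dim=1, keepdim=True).values  # numerical stability
    labels = torch.arange(logits.size(0))
    return F.cross_entropy(logits, labels)

B, D = 32, 50
W = torch.randn(D, D, requires_grad=True)  # learned bilinear similarity
loss = contrastive_loss(torch.randn(B, D), torch.randn(B, D), W)
```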

Object-centric 3D Motion Field for Robot Learning from Human Videos

no code implementations 4 Jun 2025 Zhao-Heng Yin, Sherry Yang, Pieter Abbeel

However, how to extract action knowledge (or action representations) from videos for policy learning remains a key challenge.

Denoising Motion Estimation

Feel the Force: Contact-Driven Learning from Humans

no code implementations 2 Jun 2025 Ademi Adeniji, Zhuoran Chen, Vincent Liu, Venkatesh Pattabiraman, Raunaq Bhirangi, Siddhant Haldar, Pieter Abbeel, Lerrel Pinto

Using a tactile glove to measure contact forces and a vision-based model to estimate hand pose, we train a closed-loop policy that continuously predicts the forces needed for manipulation.

Diffusion Guidance Is a Controllable Policy Improvement Operator

1 code implementation 29 May 2025 Kevin Frans, Seohong Park, Pieter Abbeel, Sergey Levine

The resulting framework, CFGRL, is trained with the simplicity of supervised learning, yet can further improve on the policies in the data.

Offline RL

FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control

no code implementations 28 May 2025 Younggyo Seo, Carmelo Sferrazza, Haoran Geng, Michal Nauman, Zhao-Heng Yin, Pieter Abbeel

Reinforcement learning (RL) has driven significant progress in robotics, but its complexity and long training times remain major bottlenecks.

Humanoid Control MuJoCo +1

EgoZero: Robot Learning from Smart Glasses

no code implementations 26 May 2025 Vincent Liu, Ademi Adeniji, Haotian Zhan, Siddhant Haldar, Raunaq Bhirangi, Pieter Abbeel, Lerrel Pinto

Despite recent progress in general purpose robotics, robot policies still lag far behind basic human capabilities in the real world.

DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy

1 code implementation 16 May 2025 Yuran Wang, Ruihai Wu, Yue Chen, Jiarui Wang, Jiaqi Liang, Ziyu Zhu, Haoran Geng, Jitendra Malik, Pieter Abbeel, Hao Dong

To improve generalization across diverse garment shapes and deformations, we propose a Hierarchical gArment-manipuLation pOlicy (HALO).

Reinforcement Learning (RL)

Geometric Retargeting: A Principled, Ultrafast Neural Hand Retargeting Algorithm

no code implementations 10 Mar 2025 Zhao-Heng Yin, Changhao Wang, Luis Pineda, Krishna Bodduluri, Tingfan Wu, Pieter Abbeel, Mustafa Mukadam

We introduce Geometric Retargeting (GeoRT), an ultrafast, and principled neural hand retargeting algorithm for teleoperation, developed as part of our recent Dexterity Gen (DexGen) system.

Video2Policy: Scaling up Manipulation Tasks in Simulation through Internet Videos

no code implementations 14 Feb 2025 Weirui Ye, Fangchen Liu, Zheng Ding, Yang Gao, Oleh Rybkin, Pieter Abbeel

Finally, we show that the generated simulation data can be scaled up for training a general policy, and it can be transferred back to the real robot in a Real2Sim2Real way.

Value-Based Deep RL Scales Predictably

no code implementations 6 Feb 2025 Oleh Rybkin, Michal Nauman, Preston Fu, Charlie Snell, Pieter Abbeel, Sergey Levine, Aviral Kumar

Third, this scaling behavior is enabled by first estimating predictable relationships between hyperparameters, which is used to manage effects of overfitting and plasticity loss unique to RL.

OpenAI Gym

Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding

no code implementations 8 Jan 2025 Joshua Jones, Oier Mees, Carmelo Sferrazza, Kyle Stachowicz, Pieter Abbeel, Sergey Levine

However, state-of-the-art generalist robot policies are typically trained on large datasets to predict robot actions solely from visual and proprioceptive observations.

Robot Manipulation Text Generation +1

Efficient Long Video Tokenization via Coordinate-based Patch Reconstruction

no code implementations CVPR 2025 Huiwon Jang, Sihyun Yu, Jinwoo Shin, Pieter Abbeel, Younggyo Seo

Our experiments show that CoordTok can drastically reduce the number of tokens for encoding long video clips.

Coarse-to-fine Q-Network with Action Sequence for Data-Efficient Robot Learning

no code implementations 19 Nov 2024 Younggyo Seo, Pieter Abbeel

Motivated by this, we introduce Coarse-to-fine Q-Network with Action Sequence (CQN-AS), a novel value-based RL algorithm that learns a critic network that outputs Q-values over a sequence of actions, i.e., explicitly training the value function to learn the consequence of executing action sequences.

Humanoid Control reinforcement-learning +2
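
To make the idea of a critic over action sequences concrete, a toy sketch follows: the network scores a whole sequence, emitting one Q-value per step. Sizes and architecture are assumptions for illustration, not the CQN-AS implementation.

```python
import torch
import torch.nn as nn

class ActionSequenceCritic(nn.Module):
    """Toy critic that outputs one Q-value per step of an action sequence."""
    def __init__(self, obs_dim, act_dim, seq_len, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + seq_len * act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, seq_len),  # Q-value for each step of the sequence
        )

    def forward(self, obs, action_seq):
        flat = action_seq.flatten(start_dim=1)            # (B, seq_len * act_dim)
        return self.net(torch.cat([obs, flat], dim=-1))   # (B, seq_len)

critic = ActionSequenceCritic(obs_dim=10, act_dim=4, seq_len=5)
q_values = critic(torch.randn(8, 10), torch.randn(8, 5, 4))
```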

Prioritized Generative Replay

no code implementations 23 Oct 2024 Renhao Wang, Kevin Frans, Pieter Abbeel, Sergey Levine, Alexei A. Efros

In this work, we instead propose a prioritized, parametric version of an agent's memory, using generative models to capture online experience.

Cliqueformer: Model-Based Optimization with Structured Transformers

1 code implementation 17 Oct 2024 Jakub Grudzien Kuba, Pieter Abbeel, Sergey Levine

Large neural networks excel at prediction tasks, but their application to design problems, such as protein engineering or materials discovery, requires solving offline model-based optimization (MBO) problems.

model

One Step Diffusion via Shortcut Models

2 code implementations 16 Oct 2024 Kevin Frans, Danijar Hafner, Sergey Levine, Pieter Abbeel

We introduce shortcut models, a family of generative models that use a single network and training phase to produce high-quality samples in a single or multiple sampling steps.

Denoising Scheduling
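
The core self-consistency idea can be sketched as follows: a step of size 2d should agree with two consecutive steps of size d. This is a conceptual sketch under assumed conventions (a network `model(x, t, d)` predicting a velocity), not the released code.

```python
import torch

def self_consistency_target(model, x_t, t, d):
    s1 = model(x_t, t, d)            # first half-step prediction
    x_mid = x_t + d * s1             # advance the sample by step size d
    s2 = model(x_mid, t + d, d)      # second half-step prediction
    return 0.5 * (s1 + s2)           # target for model(x_t, t, 2 * d)

model = lambda x, t, d: torch.tanh(x) * (1 - t)   # stand-in network
x = torch.randn(4, 8)
target = self_consistency_target(model, x, t=0.25, d=0.125)
loss = ((model(x, 0.25, 0.25) - target.detach()) ** 2).mean()
```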

ElasticTok: Adaptive Tokenization for Image and Video

no code implementations 10 Oct 2024 Wilson Yan, Volodymyr Mnih, Aleksandra Faust, Matei Zaharia, Pieter Abbeel, Hao Liu

Efficient video tokenization remains a key bottleneck in learning general purpose vision models that are capable of processing long video sequences.

Body Transformer: Leveraging Robot Embodiment for Policy Learning

no code implementations 12 Aug 2024 Carmelo Sferrazza, Dun-Ming Huang, Fangchen Liu, Jongmin Lee, Pieter Abbeel

In recent years, the transformer architecture has become the de facto standard for machine learning algorithms applied to natural language processing and computer vision.

Computational Efficiency Inductive Bias

Semi-Supervised One-Shot Imitation Learning

no code implementations 9 Aug 2024 Philipp Wu, Kourosh Hakhamaneshi, Yuqing Du, Igor Mordatch, Aravind Rajeswaran, Pieter Abbeel

We utilize this embedding space and the clustering it supports to self-generate pairings between trajectories in the large unpaired dataset.

Few-Shot Learning Imitation Learning

Offline Imitation Learning Through Graph Search and Retrieval

no code implementations 22 Jul 2024 Zhao-Heng Yin, Pieter Abbeel

As a result, a robot has to learn skills from suboptimal demonstrations and unstructured interactions, which remains a key challenge.

Deep Reinforcement Learning Imitation Learning +2

Chip Placement with Diffusion Models

1 code implementation 17 Jul 2024 Vint Lee, Minh Nguyen, Leena Elzeiny, Chun Deng, Pieter Abbeel, John Wawrzynek

Macro placement is a vital step in digital circuit design that defines the physical location of large collections of components, known as macros, on a 2D chip.

Dataset Generation Denoising +1

Visual Representation Learning with Stochastic Frame Prediction

no code implementations 11 Jun 2024 Huiwon Jang, Dongyoung Kim, Junsu Kim, Jinwoo Shin, Pieter Abbeel, Younggyo Seo

To tackle this challenge, in this paper, we revisit the idea of stochastic video generation that learns to capture uncertainty in frame prediction and explore its effectiveness for representation learning.

Decoder Pose Tracking +6

From LLMs to Actions: Latent Codes as Bridges in Hierarchical Robot Control

no code implementations 8 May 2024 Yide Shentu, Philipp Wu, Aravind Rajeswaran, Pieter Abbeel

This enables LLMs to flexibly communicate goals in the task plan without being entirely constrained by language limitations.

Robot Manipulation

HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation

1 code implementation 15 Mar 2024 Carmelo Sferrazza, Dun-Ming Huang, Xingyu Lin, Youngwoon Lee, Pieter Abbeel

Humanoid robots hold great promise in assisting humans in diverse environments and tasks, due to their flexibility and adaptability leveraging human-like morphology.

Closing the Visual Sim-to-Real Gap with Object-Composable NeRFs

1 code implementation 7 Mar 2024 Nikhil Mishra, Maximilian Sieb, Pieter Abbeel, Xi Chen

Deep learning methods for perception are the cornerstone of many robotic systems.

NeRF

MOKA: Open-World Robotic Manipulation through Mark-Based Visual Prompting

no code implementations 5 Mar 2024 Fangchen Liu, Kuan Fang, Pieter Abbeel, Sergey Levine

Central to our approach is a compact point-based representation of affordance, which bridges the VLM's predictions on observed images and the robot's actions in the physical world.

In-Context Learning Object Rearrangement +3

Twisting Lids Off with Two Hands

no code implementations 4 Mar 2024 Toru Lin, Zhao-Heng Yin, Haozhi Qi, Pieter Abbeel, Jitendra Malik

Manipulating objects with two multi-fingered hands has been a long-standing challenge in robotics, due to the contact-rich nature of many manipulation tasks and the complexity inherent in coordinating a high-dimensional bimanual system.

Deep Reinforcement Learning reinforcement-learning +1

Video as the New Language for Real-World Decision Making

no code implementations 27 Feb 2024 Sherry Yang, Jacob Walker, Jack Parker-Holder, Yilun Du, Jake Bruce, Andre Barreto, Pieter Abbeel, Dale Schuurmans

Moreover, we demonstrate how, like language models, video generation can serve as planners, agents, compute engines, and environment simulators through techniques such as in-context learning, planning and reinforcement learning.

Decision Making In-Context Learning +2

Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings

1 code implementation 27 Feb 2024 Kevin Frans, Seohong Park, Pieter Abbeel, Sergey Levine

Can we pre-train a generalist agent from a large amount of unlabeled offline trajectories such that it can be immediately adapted to any new downstream tasks in a zero-shot manner?

Diversity Offline RL +2

A StrongREJECT for Empty Jailbreaks

2 code implementations 15 Feb 2024 Alexandra Souly, Qingyuan Lu, Dillon Bowen, Tu Trinh, Elvis Hsieh, Sana Pandey, Pieter Abbeel, Justin Svegliato, Scott Emmons, Olivia Watkins, Sam Toyer

To create a benchmark, researchers must choose a dataset of forbidden prompts to which a victim model will respond, along with an evaluation method that scores the harmfulness of the victim model's responses.

MMLU

World Model on Million-Length Video And Language With Blockwise RingAttention

1 code implementation 13 Feb 2024 Hao Liu, Wilson Yan, Matei Zaharia, Pieter Abbeel

To address these challenges, we curate a large dataset of diverse videos and books, utilize the Blockwise RingAttention technique to scalably train on long sequences, and gradually increase context size from 4K to 1M tokens.

4k Video Understanding

Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control

no code implementations 30 Jan 2024 Zhongyu Li, Xue Bin Peng, Pieter Abbeel, Sergey Levine, Glen Berseth, Koushil Sreenath

Going beyond focusing on a single locomotion skill, we develop a general control solution that can be used for a range of dynamic bipedal skills, from periodic walking and running to aperiodic jumping and standing.

Deep Reinforcement Learning reinforcement-learning +1

Functional Graphical Models: Structure Enables Offline Data-Driven Optimization

no code implementations 8 Jan 2024 Jakub Grudzien Kuba, Masatoshi Uehara, Pieter Abbeel, Sergey Levine

This kind of data-driven optimization (DDO) presents a range of challenges beyond those in standard prediction problems, since we need models that successfully predict the performance of new designs that are better than the best designs seen in the training set.

Any-point Trajectory Modeling for Policy Learning

1 code implementation 28 Dec 2023 Chuan Wen, Xingyu Lin, John So, Kai Chen, Qi Dou, Yang Gao, Pieter Abbeel

Learning from demonstration is a powerful method for teaching robots new skills, and having more demonstration data often improves policy learning.

Trajectory Modeling Transfer Learning

Learning a Diffusion Model Policy from Rewards via Q-Score Matching

1 code implementation 18 Dec 2023 Michael Psenka, Alejandro Escontrela, Pieter Abbeel, Yi Ma

However, previous works fail to exploit the score-based structure of diffusion models, and instead utilize a simple behavior cloning term to train the actor, limiting their ability in the actor-critic setting.

Denoising reinforcement-learning +1

DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing

no code implementations 2 Nov 2023 Vint Lee, Pieter Abbeel, Youngwoon Lee

Model-based reinforcement learning (MBRL) has gained much attention for its ability to learn complex behaviors in a sample-efficient way: planning actions by generating imaginary trajectories with predicted rewards.

Model-based Reinforcement Learning reinforcement-learning +1

The Power of the Senses: Generalizable Manipulation from Vision and Touch through Masked Multimodal Learning

no code implementations 2 Nov 2023 Carmelo Sferrazza, Younggyo Seo, Hao Liu, Youngwoon Lee, Pieter Abbeel

For tasks requiring object manipulation, we seamlessly and effectively exploit the complementarity of our senses of vision and touch.

Scalable Diffusion for Materials Generation

no code implementations 18 Oct 2023 Sherry Yang, KwangHwan Cho, Amil Merchant, Pieter Abbeel, Dale Schuurmans, Igor Mordatch, Ekin Dogus Cubuk

Lastly, we show that conditional generation with UniMat can scale to previously established crystal datasets with up to millions of crystal structures, outperforming random structure search (the current leading method for structure discovery) in discovering new stable materials.

Formation Energy

Video Language Planning

no code implementations 16 Oct 2023 Yilun Du, Mengjiao Yang, Pete Florence, Fei Xia, Ayzaan Wahid, Brian Ichter, Pierre Sermanet, Tianhe Yu, Pieter Abbeel, Joshua B. Tenenbaum, Leslie Kaelbling, Andy Zeng, Jonathan Tompson

We are interested in enabling visual planning for complex long-horizon tasks in the space of generated videos and language, leveraging recent advances in large generative models pretrained on Internet-scale data.

Object Rearrangement

Interactive Task Planning with Language Models

no code implementations 16 Oct 2023 Boyi Li, Philipp Wu, Pieter Abbeel, Jitendra Malik

To tackle this, we propose a simple framework that achieves interactive task planning with language models by incorporating both high-level planning and low-level skill execution through function calling, leveraging pretrained vision models to ground the scene in language.

Language Modeling Language Modelling +3

Exploration with Principles for Diverse AI Supervision

no code implementations 13 Oct 2023 Hao Liu, Matei Zaharia, Pieter Abbeel

Training large transformers using next-token prediction has given rise to groundbreaking advancements in AI.

Reinforcement Learning (RL) Unsupervised Reinforcement Learning

Learning Interactive Real-World Simulators

no code implementations 9 Oct 2023 Sherry Yang, Yilun Du, Kamyar Ghasemipour, Jonathan Tompson, Leslie Kaelbling, Dale Schuurmans, Pieter Abbeel

Applications of a real-world simulator range from controllable content creation in games and movies, to training embodied agents purely in simulation that can be directly deployed in the real world.

Video Captioning

Reinforcement Learning with Foundation Priors: Let the Embodied Agent Efficiently Learn on Its Own

no code implementations 4 Oct 2023 Weirui Ye, Yunsheng Zhang, Haoyang Weng, Xianfan Gu, Shengjie Wang, Tong Zhang, Mengchen Wang, Pieter Abbeel, Yang Gao

We propose Reinforcement Learning with Foundation Priors (RLFP) to utilize guidance and feedback from policy, value, and success-reward foundation models.

Quantization reinforcement-learning +1

Ring Attention with Blockwise Transformers for Near-Infinite Context

8 code implementations 3 Oct 2023 Hao Liu, Matei Zaharia, Pieter Abbeel

Transformers have emerged as the architecture of choice for many state-of-the-art AI models, showcasing exceptional performance across a wide range of AI applications.

Language Modeling Language Modelling
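
The blockwise computation underlying this line of work can be illustrated with an online-softmax accumulation over key/value blocks; in Ring Attention each device would hold one block and pass it around a ring. Below is a toy single-device sketch with assumed shapes, not the released implementation.

```python
import torch

def blockwise_attention(q, k_blocks, v_blocks, scale):
    """Attention over a sequence split into KV blocks, using an online softmax."""
    num = torch.zeros_like(q)                         # running numerator
    den = torch.zeros(q.shape[0], 1)                  # running denominator
    m = torch.full((q.shape[0], 1), float("-inf"))    # running max, for stability
    for k, v in zip(k_blocks, v_blocks):
        s = (q @ k.t()) * scale                       # scores against this block
        m_new = torch.maximum(m, s.max(dim=1, keepdim=True).values)
        num = num * torch.exp(m - m_new) + torch.exp(s - m_new) @ v
        den = den * torch.exp(m - m_new) + torch.exp(s - m_new).sum(dim=1, keepdim=True)
        m = m_new
    return num / den

q = torch.randn(4, 16)
ks = [torch.randn(8, 16) for _ in range(3)]
vs = [torch.randn(8, 16) for _ in range(3)]
out = blockwise_attention(q, ks, vs, scale=16 ** -0.5)
```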

Language-Conditioned Path Planning

1 code implementation 31 Aug 2023 Amber Xie, Youngwoon Lee, Pieter Abbeel, Stephen James

Contact is at the core of robotic manipulation.

Language Reward Modulation for Pretraining Reinforcement Learning

1 code implementation 23 Aug 2023 Ademi Adeniji, Amber Xie, Carmelo Sferrazza, Younggyo Seo, Stephen James, Pieter Abbeel

Using learned reward functions (LRFs) as a means to solve sparse-reward reinforcement learning (RL) tasks has yielded some steady progress in task-complexity through the years.

reinforcement-learning Reinforcement Learning +2

Convolutional Occupancy Models for Dense Packing of Complex, Novel Objects

1 code implementation 31 Jul 2023 Nikhil Mishra, Pieter Abbeel, Xi Chen, Maximilian Sieb

Dense packing in pick-and-place systems is an important feature in many warehouse and logistics applications.

Learning to Model the World with Language

no code implementations 31 Jul 2023 Jessy Lin, Yuqing Du, Olivia Watkins, Danijar Hafner, Pieter Abbeel, Dan Klein, Anca Dragan

While current agents can learn to execute simple language instructions, we aim to build agents that leverage diverse language -- language like "this button turns on the TV" or "I put the bowls away" -- that conveys general knowledge, describes the state of the world, provides interactive feedback, and more.

Future prediction General Knowledge +2

SpawnNet: Learning Generalizable Visuomotor Skills from Pre-trained Networks

no code implementations 7 Jul 2023 Xingyu Lin, John So, Sashwat Mahalingam, Fangchen Liu, Pieter Abbeel

In this work, we present a focused study of the generalization capabilities of the pre-trained visual representations at the categorical level.

Imitation Learning

Improving Long-Horizon Imitation Through Instruction Prediction

1 code implementation 21 Jun 2023 Joey Hejna, Pieter Abbeel, Lerrel Pinto

Complex, long-horizon planning and its combinatorial nature pose steep challenges for learning-based agents.

Prediction

ALP: Action-Aware Embodied Learning for Perception

no code implementations 16 Jun 2023 Xinran Liang, Anthony Han, Wilson Yan, Aditi Raghunathan, Pieter Abbeel

In addition, we show that by training on actively collected data more relevant to the environment and task, our method generalizes more robustly to downstream tasks compared to models pre-trained on fixed datasets such as ImageNet.

Benchmarking object-detection +3

Probabilistic Adaptation of Text-to-Video Models

no code implementations 2 Jun 2023 Mengjiao Yang, Yilun Du, Bo Dai, Dale Schuurmans, Joshua B. Tenenbaum, Pieter Abbeel

Large text-to-video models trained on internet-scale data have demonstrated exceptional capabilities in generating high-fidelity videos from arbitrary textual descriptions.

Language Modelling Large Language Model

Train Offline, Test Online: A Real Robot Learning Benchmark

1 code implementation 1 Jun 2023 Gaoyue Zhou, Victoria Dean, Mohan Kumar Srirama, Aravind Rajeswaran, Jyothish Pari, Kyle Hatch, Aryan Jain, Tianhe Yu, Pieter Abbeel, Lerrel Pinto, Chelsea Finn, Abhinav Gupta

Three challenges limit the progress of robot learning research: robots are expensive (few labs can participate), everyone uses different robots (findings do not generalize across labs), and we lack internet-scale robotics data.

Accelerating Reinforcement Learning with Value-Conditional State Entropy Exploration

no code implementations NeurIPS 2023 Dongyoung Kim, Jinwoo Shin, Pieter Abbeel, Younggyo Seo

A promising technique for exploration is to maximize the entropy of the visited state distribution, i.e., state entropy, by encouraging uniform coverage of the visited state space.

reinforcement-learning Reinforcement Learning
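
State-entropy methods typically estimate entropy with a k-nearest-neighbor estimator and use it as an intrinsic reward. A minimal sketch of that estimator is below; the constant offset and the choice of k are assumptions, and the value-conditioning introduced by this paper is omitted.

```python
import torch

def knn_entropy_reward(states, k=5):
    """Intrinsic reward that grows with the distance to the k-th nearest neighbor."""
    dists = torch.cdist(states, states)                    # pairwise distances
    kth = dists.topk(k + 1, largest=False).values[:, -1]   # k-th neighbor (skip self)
    return torch.log(kth + 1.0)                            # sparser regions => larger reward

rewards = knn_entropy_reward(torch.randn(128, 16))
```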

Blockwise Parallel Transformer for Large Context Models

3 code implementations 30 May 2023 Hao Liu, Pieter Abbeel

Transformers have emerged as the cornerstone of state-of-the-art natural language processing models, showcasing exceptional performance across a wide range of AI applications.

Language Modeling Language Modelling

Emergent Agentic Transformer from Chain of Hindsight Experience

no code implementations 26 May 2023 Hao Liu, Pieter Abbeel

Our method consists of relabelling the target return of each trajectory to the maximum total reward within its sequence of trajectories, and training an autoregressive model to predict actions conditioned on past states, actions, rewards, target returns, and task-completion tokens. The resulting model, the Agentic Transformer (AT), learns to improve upon itself at both training and test time.

D4RL Imitation Learning +3
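
The relabelling step described above is simple to state in code. Here is a sketch under an assumed data layout (each trajectory a dict with a "rewards" list); the keys are hypothetical, not the paper's schema.

```python
def relabel_chain(trajectories):
    """Set every trajectory's target return to the best total reward in its chain."""
    best = max(sum(t["rewards"]) for t in trajectories)
    for t in trajectories:
        t["target_return"] = best
    return trajectories

chain = relabel_chain([{"rewards": [1.0, 0.5]}, {"rewards": [2.0, 1.0]}])
```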

DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models

2 code implementations 25 May 2023 Ying Fan, Olivia Watkins, Yuqing Du, Hao Liu, MoonKyung Ryu, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Kangwook Lee, Kimin Lee

We focus on diffusion models, defining the fine-tuning task as an RL problem, and updating the pre-trained text-to-image diffusion models using policy gradient to maximize the feedback-trained reward.

reinforcement-learning Reinforcement Learning (RL)

The False Promise of Imitating Proprietary LLMs

1 code implementation 25 May 2023 Arnav Gudibande, Eric Wallace, Charlie Snell, Xinyang Geng, Hao Liu, Pieter Abbeel, Sergey Levine, Dawn Song

This approach looks to cheaply imitate the proprietary model's capabilities using a weaker open-source model.

Language Modelling

Self-Supervised Instance Segmentation by Grasping

no code implementations 10 May 2023 Yuxuan Liu, Xi Chen, Pieter Abbeel

Leveraging this insight, we learn a grasp segmentation model to segment the grasped object from before and after grasp images.

Instance Segmentation Robotic Grasping +2

Distributional Instance Segmentation: Modeling Uncertainty and High Confidence Predictions with Latent-MaskRCNN

no code implementations 3 May 2023 Yuxuan Liu, Nikhil Mishra, Pieter Abbeel, Xi Chen

Existing state-of-the-art methods are often unable to capture meaningful uncertainty in challenging or ambiguous scenes, and as such can cause critical errors in high-performance applications.

Instance Segmentation Object Recognition +2

Foundation Models for Decision Making: Problems, Methods, and Opportunities

no code implementations 7 Mar 2023 Sherry Yang, Ofir Nachum, Yilun Du, Jason Wei, Pieter Abbeel, Dale Schuurmans

In response to these developments, new paradigms are emerging for training foundation models to interact with other agents and perform long-term reasoning.

Autonomous Driving Decision Making +1

Preference Transformer: Modeling Human Preferences using Transformers for RL

1 code implementation 2 Mar 2023 Changyeon Kim, Jongjin Park, Jinwoo Shin, Honglak Lee, Pieter Abbeel, Kimin Lee

In this paper, we present Preference Transformer, a neural architecture that models human preferences using transformers.

Decision Making Reinforcement Learning (RL)

Robust and Versatile Bipedal Jumping Control through Reinforcement Learning

no code implementations 19 Feb 2023 Zhongyu Li, Xue Bin Peng, Pieter Abbeel, Sergey Levine, Glen Berseth, Koushil Sreenath

This work aims to push the limits of agility for bipedal robots by enabling a torque-controlled bipedal robot to perform robust and versatile dynamic jumps in the real world.

reinforcement-learning Reinforcement Learning +1

Controllability-Aware Unsupervised Skill Discovery

3 code implementations 10 Feb 2023 Seohong Park, Kimin Lee, Youngwoon Lee, Pieter Abbeel

One of the key capabilities of intelligent agents is the ability to discover useful skills without external supervision.

The Wisdom of Hindsight Makes Language Models Better Instruction Followers

1 code implementation 10 Feb 2023 Tianjun Zhang, Fangchen Liu, Justin Wong, Pieter Abbeel, Joseph E. Gonzalez

In this paper, we consider an alternative approach: converting feedback to instruction by relabeling the original one and training the model for better alignment in a supervised manner.

Decision Making Language Modeling +4

Chain of Hindsight Aligns Language Models with Feedback

3 code implementations 6 Feb 2023 Hao Liu, Carmelo Sferrazza, Pieter Abbeel

Applying our method to large language models, we observed that Chain of Hindsight significantly surpasses previous methods in aligning language models with human preferences.

Multi-View Masked World Models for Visual Robotic Manipulation

1 code implementation 5 Feb 2023 Younggyo Seo, Junsu Kim, Stephen James, Kimin Lee, Jinwoo Shin, Pieter Abbeel

In this paper, we investigate how to learn good representations with multi-view data and utilize them for visual robotic manipulation.

Camera Calibration Representation Learning

Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment

1 code implementation NeurIPS 2023 Hao Liu, Wilson Yan, Pieter Abbeel

Recent progress in scaling up large language models has shown impressive capabilities in performing few-shot learning across a wide range of text-based tasks.

Attribute Few-Shot Image Classification +5

Masked Autoencoding for Scalable and Generalizable Decision Making

1 code implementation 23 Nov 2022 Fangchen Liu, Hao Liu, Aditya Grover, Pieter Abbeel

We are interested in learning scalable agents for reinforcement learning that can learn from large-scale, diverse sequential data similar to current large vision and language models.

Decision Making Offline RL +3

Multi-Environment Pretraining Enables Transfer to Action Limited Datasets

no code implementations 23 Nov 2022 David Venuto, Sherry Yang, Pieter Abbeel, Doina Precup, Igor Mordatch, Ofir Nachum

Using massive datasets to train large-scale models has emerged as a dominant approach for broad generalization in natural language and vision applications.

Decision Making Sequential Decision Making

VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models

no code implementations CVPR 2023 Ajay Jain, Amber Xie, Pieter Abbeel

We show that a text-conditioned diffusion model trained on pixel representations of images can be used to generate SVG-exportable vector graphics.

Image Generation Text to 3D +1

StereoPose: Category-Level 6D Transparent Object Pose Estimation from Stereo Images via Back-View NOCS

no code implementations 3 Nov 2022 Kai Chen, Stephen James, Congying Sui, Yun-Hui Liu, Pieter Abbeel, Qi Dou

To further improve the performance of the stereo framework, StereoPose is equipped with a parallax attention module for stereo feature fusion and an epipolar loss for improving the stereo-view consistency of network predictions.

Object Pose Estimation +1

Sim-to-Real via Sim-to-Seg: End-to-end Off-road Autonomous Driving Without Real Data

1 code implementation 25 Oct 2022 John So, Amber Xie, Sunggoo Jung, Jeffrey Edlund, Rohan Thakker, Ali Agha-mohammadi, Pieter Abbeel, Stephen James

In this paper, we address this challenge by presenting Sim2Seg, a re-imagining of RCAN that crosses the visual reality gap for off-road autonomous driving, without using any real-world data.

Autonomous Driving Reinforcement Learning (RL) +2

Instruction-Following Agents with Multimodal Transformer

1 code implementation 24 Oct 2022 Hao Liu, Lisa Lee, Kimin Lee, Pieter Abbeel

Our method consists of a multimodal transformer that encodes visual observations and language instructions, and a transformer-based policy that predicts actions based on encoded representations.

Instruction Following Visual Grounding

Towards Better Few-Shot and Finetuning Performance with Forgetful Causal Language Models

no code implementations 24 Oct 2022 Hao Liu, Xinyang Geng, Lisa Lee, Igor Mordatch, Sergey Levine, Sharan Narang, Pieter Abbeel

Large language models (LLM) trained using the next-token-prediction objective, such as GPT3 and PaLM, have revolutionized natural language processing in recent years by showing impressive zero-shot and few-shot capabilities across a wide range of tasks.

Language Modeling Language Modelling +2

Dichotomy of Control: Separating What You Can Control from What You Cannot

1 code implementation 24 Oct 2022 Mengjiao Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum

While return-conditioning is at the heart of popular algorithms such as decision transformer (DT), these methods tend to perform poorly in highly stochastic environments, where an occasional high return can arise from randomness in the environment rather than the actions themselves.

Reinforcement Learning (RL)

Spending Thinking Time Wisely: Accelerating MCTS with Virtual Expansions

no code implementations 23 Oct 2022 Weirui Ye, Pieter Abbeel, Yang Gao

This paper proposes the Virtual MCTS (V-MCTS), a variant of MCTS that spends more search time on harder states and less search time on simpler states adaptively.

Atari Games Board Games

CLUTR: Curriculum Learning via Unsupervised Task Representation Learning

1 code implementation 19 Oct 2022 Abdus Salam Azad, Izzeddin Gur, Jasper Emhoff, Nathaniel Alexis, Aleksandra Faust, Pieter Abbeel, Ion Stoica

Recently, Unsupervised Environment Design (UED) emerged as a new paradigm for zero-shot generalization by simultaneously learning a task distribution and agent policies on the generated tasks.

Reinforcement Learning (RL) Representation Learning +1

Skill-Based Reinforcement Learning with Intrinsic Reward Matching

1 code implementation 14 Oct 2022 Ademi Adeniji, Amber Xie, Pieter Abbeel

While unsupervised skill discovery has shown promise in autonomously acquiring behavioral primitives, there is still a large methodological disconnect between task-agnostic skill pretraining and downstream, task-aware finetuning.

reinforcement-learning Reinforcement Learning +3

Autoregressive Uncertainty Modeling for 3D Bounding Box Prediction

no code implementations 13 Oct 2022 Yuxuan Liu, Nikhil Mishra, Maximilian Sieb, Yide Shentu, Pieter Abbeel, Xi Chen

3D bounding boxes are a widespread intermediate representation in many computer vision applications.

Prediction

Real-World Robot Learning with Masked Visual Pre-training

1 code implementation 6 Oct 2022 Ilija Radosavovic, Tete Xiao, Stephen James, Pieter Abbeel, Jitendra Malik, Trevor Darrell

Finally, we train a 307M parameter vision transformer on a massive collection of 4.5M images from the Internet and egocentric videos, and demonstrate clearly the benefits of scaling visual pre-training for robot learning.

Temporally Consistent Transformers for Video Generation

2 code implementations 5 Oct 2022 Wilson Yan, Danijar Hafner, Stephen James, Pieter Abbeel

To generate accurate videos, algorithms have to understand the spatial and temporal dependencies in the world.

Minecraft Video Generation +1

Reducing Variance in Temporal-Difference Value Estimation via Ensemble of Deep Networks

1 code implementation 16 Sep 2022 Litian Liang, Yaosheng Xu, Stephen McAleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox

On a set of 26 benchmark Atari environments, MeanQ outperforms all tested baselines, including the best available baseline, SUNRISE, at 100K interaction steps in 16/26 environments, and by 68% on average.
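
The variance-reduction mechanism here amounts to averaging an ensemble's target Q-values before bootstrapping. A minimal sketch with assumed shapes (discrete actions) and stand-in networks, not the released code:

```python
import torch

def mean_q_target(target_nets, reward, next_obs, gamma=0.99):
    """TD target built from the mean of an ensemble of target Q-networks."""
    with torch.no_grad():
        q_next = torch.stack([net(next_obs) for net in target_nets]).mean(0)
        return reward + gamma * q_next.max(dim=-1).values

nets = [torch.nn.Linear(4, 3) for _ in range(5)]  # stand-in Q-networks
target = mean_q_target(nets, reward=torch.zeros(2), next_obs=torch.randn(2, 4))
```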

Multi-Objective Policy Gradients with Topological Constraints

no code implementations 15 Sep 2022 Kyle Hollins Wray, Stas Tiomkin, Mykel J. Kochenderfer, Pieter Abbeel

Multi-objective optimization models that encode ordered sequential constraints provide a solution to model various challenging problems including encoding preferences, modeling a curriculum, and enforcing measures of safety.

Deep Reinforcement Learning

HARP: Autoregressive Latent Video Prediction with High-Fidelity Image Generator

no code implementations 15 Sep 2022 Younggyo Seo, Kimin Lee, Fangchen Liu, Stephen James, Pieter Abbeel

Video prediction is an important yet challenging problem; burdened with the tasks of generating future frames and learning environment dynamics.

Data Augmentation Prediction +2

AdaCat: Adaptive Categorical Discretization for Autoregressive Models

1 code implementation 3 Aug 2022 Qiyang Li, Ajay Jain, Pieter Abbeel

Autoregressive generative models can estimate complex continuous data distributions, like trajectory rollouts in an RL environment, image intensities, and audio.

Density Estimation Offline RL

Fleet-DAgger: Interactive Robot Fleet Learning with Scalable Human Supervision

1 code implementation 29 Jun 2022 Ryan Hoque, Lawrence Yunliang Chen, Satvik Sharma, Karthik Dharmarajan, Brijen Thananjeyan, Pieter Abbeel, Ken Goldberg

With continual learning, interventions from the remote pool of humans can also be used to improve the robot fleet control policy over time.

Continual Learning

Masked World Models for Visual Control

no code implementations 28 Jun 2022 Younggyo Seo, Danijar Hafner, Hao Liu, Fangchen Liu, Stephen James, Kimin Lee, Pieter Abbeel

Yet the current approaches typically train a single model end-to-end for learning both visual representations and dynamics, making it difficult to accurately model the interaction between robots and small objects.

Model-based Reinforcement Learning Reinforcement Learning (RL) +1

DayDreamer: World Models for Physical Robot Learning

1 code implementation 28 Jun 2022 Philipp Wu, Alejandro Escontrela, Danijar Hafner, Ken Goldberg, Pieter Abbeel

Learning a world model to predict the outcomes of potential actions enables planning in imagination, reducing the amount of trial and error needed in the real environment.

Deep Reinforcement Learning Navigate +2

Patch-based Object-centric Transformers for Efficient Video Generation

1 code implementation 8 Jun 2022 Wilson Yan, Ryo Okumura, Stephen James, Pieter Abbeel

In this work, we present Patch-based Object-centric Video Transformer (POVT), a novel region-based video generation architecture that leverages object-centric information to efficiently model temporal dynamics in videos.

Object Video Editing +2

Deep Hierarchical Planning from Pixels

1 code implementation 8 Jun 2022 Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel

Despite operating in latent space, the decisions are interpretable because the world model can decode goals into images for visualization.

Atari Games Hierarchical Reinforcement Learning

On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning

no code implementations 7 Jun 2022 Zhao Mandi, Pieter Abbeel, Stephen James

From these findings, we advocate for evaluating future meta-RL methods on more challenging tasks and including multi-task pretraining with fine-tuning as a simple, yet strong baseline.

Meta-Learning Meta Reinforcement Learning +5

Multimodal Masked Autoencoders Learn Transferable Representations

3 code implementations 27 May 2022 Xinyang Geng, Hao Liu, Lisa Lee, Dale Schuurmans, Sergey Levine, Pieter Abbeel

We provide an empirical study of M3AE trained on a large-scale image-text dataset, and find that M3AE is able to learn generalizable representations that transfer well to downstream tasks.

Contrastive Learning

Reward Uncertainty for Exploration in Preference-based Reinforcement Learning

2 code implementations ICLR 2022 Xinran Liang, Katherine Shu, Kimin Lee, Pieter Abbeel

Our intuition is that disagreement in learned reward model reflects uncertainty in tailored human feedback and could be useful for exploration.

reinforcement-learning Reinforcement Learning +2

Chain of Thought Imitation with Procedure Cloning

1 code implementation 22 May 2022 Mengjiao Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum

Imitation learning aims to extract high-performance policies from logged demonstrations of expert behavior.

Imitation Learning Robot Manipulation

An Empirical Investigation of Representation Learning for Imitation

2 code implementations 16 May 2022 Xin Chen, Sam Toyer, Cody Wild, Scott Emmons, Ian Fischer, Kuang-Huei Lee, Neel Alex, Steven H Wang, Ping Luo, Stuart Russell, Pieter Abbeel, Rohin Shah

We propose a modular framework for constructing representation learning algorithms, then use our framework to evaluate the utility of representation learning for imitation across several environment suites.

image-classification Image Classification +2

Coarse-to-fine Q-attention with Tree Expansion

1 code implementation 26 Apr 2022 Stephen James, Pieter Abbeel

Coarse-to-fine Q-attention enables sample-efficient robot manipulation by discretizing the translation space in a coarse-to-fine manner, where the resolution gradually increases at each layer in the hierarchy.

Robot Manipulation

Sim-to-Real 6D Object Pose Estimation via Iterative Self-training for Robotic Bin Picking

no code implementations 14 Apr 2022 Kai Chen, Rui Cao, Stephen James, Yichuan Li, Yun-Hui Liu, Pieter Abbeel, Qi Dou

To continuously improve the quality of pseudo labels, we iterate the above steps by taking the trained student model as a new teacher and re-labeling real data using the refined teacher model.

6D Pose Estimation using RGB Robotic Grasping

Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning

no code implementations 7 Apr 2022 Carl Qi, Pieter Abbeel, Aditya Grover

The goal of imitation learning is to mimic expert behavior from demonstrations, without access to an explicit reward signal.

Imitation Learning reinforcement-learning +3

Coarse-to-Fine Q-attention with Learned Path Ranking

1 code implementation 4 Apr 2022 Stephen James, Pieter Abbeel

We propose Learned Path Ranking (LPR), a method that accepts an end-effector goal pose, and learns to rank a set of goal-reaching paths generated from an array of path generating methods, including: path planning, Bezier curve sampling, and a learned policy.

Benchmarking

Pretraining Graph Neural Networks for few-shot Analog Circuit Modeling and Design

1 code implementation 29 Mar 2022 Kourosh Hakhamaneshi, Marcel Nassar, Mariano Phielipp, Pieter Abbeel, Vladimir Stojanović

We show that pretraining GNNs on prediction of output node voltages can encourage learning representations that can be adapted to new unseen topologies or prediction of new circuit level properties with up to 10x more sample efficiency compared to a randomly initialized model.

Adversarial Motion Priors Make Good Substitutes for Complex Reward Functions

no code implementations 28 Mar 2022 Alejandro Escontrela, Xue Bin Peng, Wenhao Yu, Tingnan Zhang, Atil Iscen, Ken Goldberg, Pieter Abbeel

We also demonstrate that an effective style reward can be learned from a few seconds of motion capture data gathered from a German Shepherd and leads to energy-efficient locomotion strategies with natural gait transitions.

Reinforcement Learning with Action-Free Pre-Training from Videos

2 code implementations 25 Mar 2022 Younggyo Seo, Kimin Lee, Stephen James, Pieter Abbeel

Our framework consists of two phases: we pre-train an action-free latent video prediction model, and then utilize the pre-trained representations for efficiently learning action-conditional world models on unseen environments.

Prediction reinforcement-learning +4

Teachable Reinforcement Learning via Advice Distillation

1 code implementation NeurIPS 2021 Olivia Watkins, Trevor Darrell, Pieter Abbeel, Jacob Andreas, Abhishek Gupta

Training automated agents to complete complex tasks in interactive environments is challenging: reinforcement learning requires careful hand-engineering of reward functions, imitation learning requires specialized infrastructure and access to a human expert, and learning from intermediate forms of supervision (like binary preferences) is time-consuming and extracts little information from each human intervention.

Imitation Learning reinforcement-learning +2

It Takes Four to Tango: Multiagent Selfplay for Automatic Curriculum Generation

no code implementations 22 Feb 2022 Yuqing Du, Pieter Abbeel, Aditya Grover

Training such agents efficiently requires automatic generation of a goal curriculum.

Bingham Policy Parameterization for 3D Rotations in Reinforcement Learning

1 code implementation 8 Feb 2022 Stephen James, Pieter Abbeel

We propose a new policy parameterization for representing 3D rotations during reinforcement learning.

continuous-control Continuous Control +4

CIC: Contrastive Intrinsic Control for Unsupervised Skill Discovery

1 code implementation 1 Feb 2022 Michael Laskin, Hao Liu, Xue Bin Peng, Denis Yarats, Aravind Rajeswaran, Pieter Abbeel

We introduce Contrastive Intrinsic Control (CIC), an algorithm for unsupervised skill discovery that maximizes the mutual information between state-transitions and latent skill vectors.

Contrastive Learning Diversity +3

Explaining Reinforcement Learning Policies through Counterfactual Trajectories

1 code implementation 29 Jan 2022 Julius Frost, Olivia Watkins, Eric Weiner, Pieter Abbeel, Trevor Darrell, Bryan Plummer, Kate Saenko

In order for humans to confidently decide where to employ RL agents for real-world tasks, a human developer must validate that the agent will perform well at test-time.

counterfactual Decision Making +3

Target Entropy Annealing for Discrete Soft Actor-Critic

no code implementations 6 Dec 2021 Yaosheng Xu, Dailin Hu, Litian Liang, Stephen McAleer, Pieter Abbeel, Roy Fox

Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm in continuous action space settings.

Atari Games Scheduling

Zero-Shot Text-Guided Object Generation with Dream Fields

4 code implementations CVPR 2022 Ajay Jain, Ben Mildenhall, Jonathan T. Barron, Pieter Abbeel, Ben Poole

Our method, Dream Fields, can generate the geometry and color of a wide range of objects without 3D supervision.

Neural Rendering Object

Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL

no code implementations NeurIPS 2021 Charles Packer, Pieter Abbeel, Joseph E. Gonzalez

Meta-reinforcement learning (meta-RL) has proven to be a successful framework for leveraging experience from prior tasks to rapidly learn new related tasks; however, current meta-RL approaches struggle to learn in sparse reward environments.

Meta Reinforcement Learning

Count-Based Temperature Scheduling for Maximum Entropy Reinforcement Learning

no code implementations 28 Nov 2021 Dailin Hu, Pieter Abbeel, Roy Fox

Maximum Entropy Reinforcement Learning (MaxEnt RL) algorithms such as Soft Q-Learning (SQL) and Soft Actor-Critic trade off reward and policy entropy, which has the potential to improve training stability and robustness.

Q-Learning reinforcement-learning +3
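
For reference, the standard entropy-regularized objective that SQL and SAC optimize, with temperature $\alpha$ mediating the reward-entropy trade-off (the quantity this paper schedules via state visitation counts):

\[
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\left[ r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]
\]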

Generalization in Dexterous Manipulation via Geometry-Aware Multi-Task Learning

no code implementations 4 Nov 2021 Wenlong Huang, Igor Mordatch, Pieter Abbeel, Deepak Pathak

We show that a single generalist policy can perform in-hand manipulation of over 100 geometrically-diverse real-world objects and generalize to new objects with unseen shape or size.

Multi-Task Learning Object +3

B-Pref: Benchmarking Preference-Based Reinforcement Learning

1 code implementation 4 Nov 2021 Kimin Lee, Laura Smith, Anca Dragan, Pieter Abbeel

However, it is difficult to quantify the progress in preference-based RL due to the lack of a commonly adopted benchmark.

Benchmarking reinforcement-learning +2

Mastering Atari Games with Limited Data

3 code implementations NeurIPS 2021 Weirui Ye, Shaohuai Liu, Thanard Kurutach, Pieter Abbeel, Yang Gao

Recently, there has been significant progress in sample efficient image-based RL algorithms; however, consistent human-level performance on the Atari game benchmark remains an elusive goal.

Atari Games Atari Games 100k

URLB: Unsupervised Reinforcement Learning Benchmark

1 code implementation 28 Oct 2021 Michael Laskin, Denis Yarats, Hao Liu, Kimin Lee, Albert Zhan, Kevin Lu, Catherine Cang, Lerrel Pinto, Pieter Abbeel

Deep Reinforcement Learning (RL) has emerged as a powerful paradigm to solve a range of complex yet specific control tasks.

continuous-control Continuous Control +4

Temporal-Difference Value Estimation via Uncertainty-Guided Soft Updates

no code implementations 28 Oct 2021 Litian Liang, Yaosheng Xu, Stephen McAleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox

Under the belief that $\beta$ is closely related to the (state dependent) model uncertainty, Entropy Regularized Q-Learning (EQL) further introduces a principled scheduling of $\beta$ by maintaining a collection of the model parameters that characterizes model uncertainty.

Q-Learning Scheduling +1

Towards More Generalizable One-shot Visual Imitation Learning

no code implementations 26 Oct 2021 Zhao Mandi, Fangchen Liu, Kimin Lee, Pieter Abbeel

We then study the multi-task setting, where multi-task training is followed by (i) one-shot imitation on variations within the training tasks, (ii) one-shot imitation on new tasks, and (iii) fine-tuning on new tasks.

Contrastive Learning Imitation Learning +2

It Takes Four to Tango: Multiagent Self Play for Automatic Curriculum Generation

1 code implementation ICLR 2022 Yuqing Du, Pieter Abbeel, Aditya Grover

We are interested in training general-purpose reinforcement learning agents that can solve a wide variety of goals.

Improving Long-Horizon Imitation Through Language Prediction

no code implementations 29 Sep 2021 Donald Joseph Hejna III, Pieter Abbeel, Lerrel Pinto

Complex, long-horizon planning and its combinatorial nature pose steep challenges for learning-based agents.

Prediction

Semi-supervised Offline Reinforcement Learning with Pre-trained Decision Transformers

no code implementations 29 Sep 2021 Catherine Cang, Kourosh Hakhamaneshi, Ryan Rudes, Igor Mordatch, Aravind Rajeswaran, Pieter Abbeel, Michael Laskin

In this paper, we investigate how we can leverage large reward-free (i.e., task-agnostic) offline datasets of prior interactions to pre-train agents that can then be fine-tuned using a small reward-annotated dataset.

D4RL Offline RL +3

Autoregressive Latent Video Prediction with High-Fidelity Image Generator

no code implementations 29 Sep 2021 Younggyo Seo, Kimin Lee, Fangchen Liu, Stephen James, Pieter Abbeel

Video prediction is an important yet challenging problem; burdened with the tasks of generating future frames and learning environment dynamics.

Data Augmentation Prediction +2

Skill Preferences: Learning to Extract and Execute Robotic Skills from Human Feedback

no code implementations 11 Aug 2021 Xiaofei Wang, Kimin Lee, Kourosh Hakhamaneshi, Pieter Abbeel, Michael Laskin

A promising approach to solving challenging long-horizon tasks has been to extract behavior priors (skills) by fitting generative models to large offline datasets of demonstrations.

Playful Interactions for Representation Learning

no code implementations 19 Jul 2021 Sarah Young, Jyothish Pari, Pieter Abbeel, Lerrel Pinto

In this work, we propose to use playful interactions in a self-supervised manner to learn visual representations for downstream tasks.

Imitation Learning Representation Learning

Hierarchical Few-Shot Imitation with Skill Transition Models

1 code implementation ICML Workshop URL 2021 Kourosh Hakhamaneshi, Ruihan Zhao, Albert Zhan, Pieter Abbeel, Michael Laskin

To this end, we present Few-shot Imitation with Skill Transition Models (FIST), an algorithm that extracts skills from offline data and utilizes them to generalize to unseen tasks given a few downstream demonstrations.

The MineRL BASALT Competition on Learning from Human Feedback

no code implementations 5 Jul 2021 Rohin Shah, Cody Wild, Steven H. Wang, Neel Alex, Brandon Houghton, William Guss, Sharada Mohanty, Anssi Kanervisto, Stephanie Milani, Nicholay Topin, Pieter Abbeel, Stuart Russell, Anca Dragan

Rather than training AI systems using a predefined reward function or using a labeled dataset with a predefined set of categories, we instead train the AI system using a learning signal derived from some form of human feedback, which can evolve over time as the understanding of the task changes, or as the capabilities of the AI system improve.

Imitation Learning Minecraft

Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble

1 code implementation 1 Jul 2021 SeungHyun Lee, Younggyo Seo, Kimin Lee, Pieter Abbeel, Jinwoo Shin

Recent advance in deep offline reinforcement learning (RL) has made it possible to train strong robotic agents from offline datasets.

Offline RL reinforcement-learning +1

Scenic4RL: Programmatic Modeling and Generation of Reinforcement Learning Environments

no code implementations 18 Jun 2021 Abdus Salam Azad, Edward Kim, Qiancheng Wu, Kimin Lee, Ion Stoica, Pieter Abbeel, Sanjit A. Seshia

To showcase the benefits, we interfaced SCENIC to an existing RTS environment, the Google Research Football (GRF) simulator, and introduced a benchmark consisting of 32 realistic scenarios, encoded in SCENIC, to train RL agents and test their generalization capabilities.

reinforcement-learning Reinforcement Learning +1

Unsupervised Learning of Visual 3D Keypoints for Control

1 code implementation 14 Jun 2021 Boyuan Chen, Pieter Abbeel, Deepak Pathak

Prior works show that structured latent space such as visual keypoints often outperforms unstructured representations for robotic control.

Data-Efficient Exploration with Self Play for Atari

no code implementations ICML Workshop URL 2021 Michael Laskin, Catherine Cang, Ryan Rudes, Pieter Abbeel

To alleviate the reliance on reward engineering it is important to develop RL algorithms capable of efficiently acquiring skills with no rewards extrinsic to the agent.

Efficient Exploration Reinforcement Learning (RL)

PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training

2 code implementations 9 Jun 2021 Kimin Lee, Laura Smith, Pieter Abbeel

We also show that our method is able to utilize real-time human feedback to effectively prevent reward exploitation and learn new behaviors that are difficult to specify with standard reward functions.

reinforcement-learning Reinforcement Learning (RL) +1

JUMBO: Scalable Multi-task Bayesian Optimization using Offline Data

1 code implementation 2 Jun 2021 Kourosh Hakhamaneshi, Pieter Abbeel, Vladimir Stojanovic, Aditya Grover

Such a decomposition can dynamically control the reliability of information derived from the online and offline data and the use of pretrained neural networks permits scalability to large offline datasets.

Bayesian Optimization Gaussian Processes

VideoGPT: Video Generation using VQ-VAE and Transformers

3 code implementations 20 Apr 2021 Wilson Yan, Yunzhi Zhang, Pieter Abbeel, Aravind Srinivas

We present VideoGPT: a conceptually simple architecture for scaling likelihood based generative modeling to natural videos.

Position Video Generation

Auto-Tuned Sim-to-Real Transfer

1 code implementation 15 Apr 2021 Yuqing Du, Olivia Watkins, Trevor Darrell, Pieter Abbeel, Deepak Pathak

Policies trained in simulation often fail when transferred to the real world due to the 'reality gap', where the simulator is unable to accurately capture the dynamics and visual properties of the real world.

Learning What To Do by Simulating the Past

1 code implementation ICLR 2021 David Lindner, Rohin Shah, Pieter Abbeel, Anca Dragan

Since reward functions are hard to specify, recent work has focused on learning policies from human feedback.

MuJoCo

GEM: Group Enhanced Model for Learning Dynamical Control Systems

no code implementations 7 Apr 2021 Philippe Hansen-Estruch, Wenling Shang, Lerrel Pinto, Pieter Abbeel, Stas Tiomkin

In this work, we take advantage of these structures to build effective dynamical models that are amenable to sample-based learning.

continuous-control Continuous Control +1

AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control

4 code implementations 5 Apr 2021 Xue Bin Peng, Ze Ma, Pieter Abbeel, Sergey Levine, Angjoo Kanazawa

Our system produces high-quality motions that are comparable to those achieved by state-of-the-art tracking-based techniques, while also being able to easily accommodate large datasets of unstructured motion clips.

Imitation Learning Reinforcement Learning (RL)

Mutual Information State Intrinsic Control

2 code implementations ICLR 2021 Rui Zhao, Yang Gao, Pieter Abbeel, Volker Tresp, Wei Xu

Reinforcement learning has been shown to be highly successful at many challenging tasks.

Pretrained Transformers as Universal Computation Engines

4 code implementations 9 Mar 2021 Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch

We investigate the capability of a transformer pretrained on natural language to generalize to other modalities with minimal finetuning -- in particular, without finetuning of the self-attention and feedforward layers of the residual blocks.
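
That recipe can be sketched in a few lines: freeze the attention and feedforward weights and leave only the layer norms (plus any new input/output layers, omitted here) trainable. Module names follow PyTorch's TransformerEncoderLayer as an illustrative stand-in for a pretrained language model, not the paper's code.

```python
import torch.nn as nn

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4), num_layers=2
)
for name, param in encoder.named_parameters():
    # Only LayerNorm parameters stay trainable; attention and FFN weights stay frozen.
    param.requires_grad = "norm" in name
```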

Task-Agnostic Morphology Evolution

1 code implementation ICLR 2021 Donald J. Hejna III, Pieter Abbeel, Lerrel Pinto

Deep reinforcement learning primarily focuses on learning behavior, usually overlooking the fact that an agent's function is largely determined by form.

Deep Reinforcement Learning

MSA Transformer

1 code implementation 13 Feb 2021 Roshan Rao, Jason Liu, Robert Verkuil, Joshua Meier, John F. Canny, Pieter Abbeel, Tom Sercu, Alexander Rives

Unsupervised protein language models trained across millions of diverse sequences learn structure and function of proteins.

Language Modeling Masked Language Modeling +2

Bottleneck Transformers for Visual Recognition

13 code implementations CVPR 2021 Aravind Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, Ashish Vaswani

Finally, we present a simple adaptation of the BoTNet design for image classification, resulting in models that achieve a strong performance of 84.7% top-1 accuracy on the ImageNet benchmark while being up to 1.64x faster in compute time than the popular EfficientNet models on TPU-v3 hardware.

image-classification Image Classification +4
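
The architectural change is easy to sketch: the 3x3 spatial convolution in a ResNet bottleneck is replaced by global multi-head self-attention over spatial positions. The sketch below omits the relative position encodings the paper uses; shapes and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

class BottleneckSelfAttention(nn.Module):
    """Stand-in for the conv layer it replaces: global attention over H*W positions."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                        # x: (B, C, H, W)
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)       # (B, H*W, C)
        out, _ = self.attn(seq, seq, seq)        # all-to-all spatial attention
        return out.transpose(1, 2).reshape(b, c, h, w)

y = BottleneckSelfAttention(64)(torch.randn(2, 64, 8, 8))
```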

Compute- and Memory-Efficient Reinforcement Learning with Latent Experience Replay

no code implementations 1 Jan 2021 Lili Chen, Kimin Lee, Aravind Srinivas, Pieter Abbeel

In this paper, we present Latent Vector Experience Replay (LeVER), a simple modification of existing off-policy RL methods, to address these computational and memory requirements without sacrificing the performance of RL agents.

Atari Games Deep Reinforcement Learning +3

Benefits of Assistance over Reward Learning

no code implementations 1 Jan 2021 Rohin Shah, Pedro Freire, Neel Alex, Rachel Freedman, Dmitrii Krasheninnikov, Lawrence Chan, Michael D Dennis, Pieter Abbeel, Anca Dragan, Stuart Russell

By merging reward learning and control, assistive agents can reason about the impact of control actions on reward learning, leading to several advantages over agents based on reward learning.

Weighted Bellman Backups for Improved Signal-to-Noise in Q-Updates

no code implementations 1 Jan 2021 Kimin Lee, Michael Laskin, Aravind Srinivas, Pieter Abbeel

Furthermore, since our weighted Bellman backups rely on maintaining an ensemble, we investigate how weighted Bellman backups interact with other benefits previously derived from ensembles: (a) Bootstrap; (b) UCB Exploration.

Deep Reinforcement Learning Q-Learning +1

Discrete Predictive Representation for Long-horizon Planning

no code implementations 1 Jan 2021 Thanard Kurutach, Julia Peng, Yang Gao, Stuart Russell, Pieter Abbeel

Discrete representations have been key in enabling robots to plan at more abstract levels and solve temporally-extended tasks more efficiently for decades.

Deep Reinforcement Learning Object +1

Addressing Distribution Shift in Online Reinforcement Learning with Offline Datasets

no code implementations 1 Jan 2021 SeungHyun Lee, Younggyo Seo, Kimin Lee, Pieter Abbeel, Jinwoo Shin

As it turns out, fine-tuning offline RL agents is a non-trivial challenge, due to distribution shift – the agent encounters out-of-distribution samples during online interaction, which may cause bootstrapping error in Q-learning and instability during fine-tuning.

D4RL MuJoCo +5

VideoGen: Generative Modeling of Videos using VQ-VAE and Transformers

no code implementations 1 Jan 2021 Yunzhi Zhang, Wilson Yan, Pieter Abbeel, Aravind Srinivas

We present VideoGen: a conceptually simple architecture for scaling likelihood based generative modeling to natural videos.

Position Video Generation

Unsupervised Active Pre-Training for Reinforcement Learning

no code implementations 1 Jan 2021 Hao Liu, Pieter Abbeel

On DMControl suite, APT beats all baselines in terms of asymptotic performance and data efficiency and dramatically improves performance on tasks that are extremely difficult for training from scratch.

Atari Games Contrastive Learning +4

Robust Imitation via Decision-Time Planning

no code implementations 1 Jan 2021 Carl Qi, Pieter Abbeel, Aditya Grover

The goal of imitation learning is to mimic expert behavior from demonstrations, without access to an explicit reward signal.

Imitation Learning reinforcement-learning +3

R-LAtte: Attention Module for Visual Control via Reinforcement Learning

no code implementations 1 Jan 2021 Mandi Zhao, Qiyang Li, Aravind Srinivas, Ignasi Clavera, Kimin Lee, Pieter Abbeel

Attention mechanisms are generic inductive biases that have played a critical role in improving the state-of-the-art in supervised learning, unsupervised pre-training and generative modeling for multiple domains including vision, language and speech.

reinforcement-learning Reinforcement Learning +2

Learning Visual Robotic Control Efficiently with Contrastive Pre-training and Data Augmentation

no code implementations 14 Dec 2020 Albert Zhan, Ruihan Zhao, Lerrel Pinto, Pieter Abbeel, Michael Laskin

We present Contrastive Pre-training and Data Augmentation for Efficient Robotic Learning (CoDER), a method that utilizes data augmentation and unsupervised learning to achieve sample-efficient training of real-robot arm policies from sparse rewards.

Data Augmentation reinforcement-learning +3

Reset-Free Lifelong Learning with Skill-Space Planning

1 code implementation ICLR 2021 Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch

We propose Lifelong Skill Planning (LiSP), an algorithmic framework for non-episodic lifelong RL based on planning in an abstract space of higher-order skills.

Lifelong learning MuJoCo +1

Parallel Training of Deep Networks with Local Updates

1 code implementation 7 Dec 2020 Michael Laskin, Luke Metz, Seth Nabarro, Mark Saroufim, Badreddine Noune, Carlo Luschi, Jascha Sohl-Dickstein, Pieter Abbeel

Deep learning models trained on large data sets have been widely successful in both vision and language domains.

LaND: Learning to Navigate from Disengagements

1 code implementation 9 Oct 2020 Gregory Kahn, Pieter Abbeel, Sergey Levine

However, we believe that these disengagements not only show where the system fails, which is useful for troubleshooting, but also provide a direct learning signal by which the robot can learn to navigate.

Autonomous Navigation Imitation Learning +4

Decoupling Representation Learning from Reinforcement Learning

3 code implementations 14 Sep 2020 Adam Stooke, Kimin Lee, Pieter Abbeel, Michael Laskin

In an effort to overcome limitations of reward-driven feature learning in deep reinforcement learning (RL) from images, we propose decoupling representation learning from policy learning.

Data Augmentation Deep Reinforcement Learning +3

Visual Imitation Made Easy

no code implementations 11 Aug 2020 Sarah Young, Dhiraj Gandhi, Shubham Tulsiani, Abhinav Gupta, Pieter Abbeel, Lerrel Pinto

We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.

Imitation Learning

Robust Reinforcement Learning using Adversarial Populations

1 code implementation 4 Aug 2020 Eugene Vinitsky, Yuqing Du, Kanaad Parvate, Kathy Jang, Pieter Abbeel, Alexandre Bayen

Reinforcement Learning (RL) is an effective tool for controller design but can struggle with issues of robustness, failing catastrophically when the underlying system dynamics are perturbed.

Out-of-Distribution Generalization reinforcement-learning +2

Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning

no code implementations 3 Aug 2020 Xingyu Lu, Kimin Lee, Pieter Abbeel, Stas Tiomkin

Despite the significant progress of deep reinforcement learning (RL) in solving sequential decision making problems, RL agents often overfit to training environments and struggle to adapt to new, unseen environments.

Decision Making Deep Reinforcement Learning +3

Hybrid Discriminative-Generative Training via Contrastive Learning

1 code implementation 17 Jul 2020 Hao Liu, Pieter Abbeel

In this paper we show that through the perspective of hybrid discriminative-generative training of energy-based models we can make a direct connection between contrastive learning and supervised learning.

Contrastive Learning Out-of-Distribution Detection

Efficient Empowerment Estimation for Unsupervised Stabilization

no code implementations ICLR 2021 Ruihan Zhao, Kevin Lu, Pieter Abbeel, Stas Tiomkin

We demonstrate our solution for sample-based unsupervised stabilization on different dynamical control systems and show the advantages of our method by comparing it to the existing VLB approaches.

Variable Skipping for Autoregressive Range Density Estimation

1 code implementation ICML 2020 Eric Liang, Zongheng Yang, Ion Stoica, Pieter Abbeel, Yan Duan, Xi Chen

In this paper, we explore a technique, variable skipping, for accelerating range density estimation over deep autoregressive models.

Data Augmentation Density Estimation
