no code implementations • ICLR 2019 • Rohin Shah, Noah Gundotra, Pieter Abbeel, Anca Dragan
Our goal is to infer reward functions from demonstrations.
1 code implementation • ICML 2020 • Michael Laskin, Pieter Abbeel, Aravind Srinivas
CURL extracts high level features from raw pixels using a contrastive learning objective and performs off-policy control on top of the extracted features.
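The contrastive objective here is the standard InfoNCE loss with a bilinear similarity and a momentum-updated key encoder. A minimal PyTorch sketch, assuming `query_encoder`, `key_encoder`, and the learned matrix `W` are supplied; the names are illustrative, not CURL's exact code:

```python
import torch
import torch.nn.functional as F

def curl_infonce_loss(query_encoder, key_encoder, W, obs_anchor, obs_positive):
    """InfoNCE over two augmentations of the same observation batch;
    matching pairs sit on the diagonal of the similarity matrix."""
    z_a = query_encoder(obs_anchor)            # queries, shape (B, D)
    with torch.no_grad():                      # keys come from a momentum encoder
        z_pos = key_encoder(obs_positive)      # shape (B, D)
    logits = z_a @ W @ z_pos.t()               # bilinear similarities, (B, B)
    logits = logits - logits.max(dim=1, keepdim=True).values  # numerical stability
    labels = torch.arange(logits.shape[0], device=logits.device)
    return F.cross_entropy(logits, labels)
```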
no code implementations • ICML 2020 • Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak
To solve complex tasks, intelligent agents first need to explore their environments.
no code implementations • ICML 2020 • Adam Stooke, Joshua Achiam, Pieter Abbeel
This intuition leads to our introduction of PID control for the Lagrange multiplier in constrained RL, which we cast as a dynamical system.
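A minimal sketch of that dynamical-system view, assuming episode-level cost estimates; the gains and the non-negativity clamps are illustrative choices rather than the paper's exact configuration:

```python
class PIDLagrangian:
    """Sketch: update the Lagrange multiplier with a PID controller on the
    constraint violation instead of pure integral (gradient) ascent."""

    def __init__(self, cost_limit, kp=0.1, ki=0.01, kd=0.1):
        self.cost_limit = cost_limit
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_cost = 0.0

    def update(self, episode_cost):
        error = episode_cost - self.cost_limit           # constraint violation
        self.integral = max(0.0, self.integral + error)  # anti-windup clamp
        derivative = max(0.0, episode_cost - self.prev_cost)
        self.prev_cost = episode_cost
        # the multiplier itself is kept non-negative
        return max(0.0, self.kp * error + self.ki * self.integral
                   + self.kd * derivative)
```

The proportional and derivative terms damp the oscillation and overshoot that a purely integral multiplier update exhibits.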
no code implementations • ICML 2020 • Donald Hejna, Lerrel Pinto, Pieter Abbeel
Learning long-range behaviors on complex high-dimensional agents is a fundamental problem in robot learning.
no code implementations • 7 Mar 2023 • Sherry Yang, Ofir Nachum, Yilun Du, Jason Wei, Pieter Abbeel, Dale Schuurmans
In response to these developments, new paradigms are emerging for training foundation models to interact with other agents and perform long-term reasoning.
1 code implementation • 2 Mar 2023 • Changyeon Kim, Jongjin Park, Jinwoo Shin, Honglak Lee, Pieter Abbeel, Kimin Lee
In this paper, we present Preference Transformer, a neural architecture that models human preferences using transformers.
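Underlying the architecture is the standard Bradley-Terry preference model, in which the probability of preferring one trajectory segment over another is a softmax over their predicted returns. A hedged sketch of that loss (the paper's weighted, non-Markovian reward head is not reproduced; `reward_model` is assumed to return per-step rewards of shape (B, T)):

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_model, seg_a, seg_b, prefers_b):
    """Bradley-Terry loss: P(segment b preferred) is a softmax over the
    two segments' summed predicted rewards."""
    r_a = reward_model(seg_a).sum(dim=1)      # (B,) return of each segment a
    r_b = reward_model(seg_b).sum(dim=1)      # (B,)
    logits = torch.stack([r_a, r_b], dim=1)   # (B, 2)
    return F.cross_entropy(logits, prefers_b.long())
```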
no code implementations • 23 Feb 2023 • Kimin Lee, Hao Liu, MoonKyung Ryu, Olivia Watkins, Yuqing Du, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Shixiang Shane Gu
Our results demonstrate the potential for learning from human feedback to significantly improve text-to-image models.
no code implementations • 19 Feb 2023 • Zhongyu Li, Xue Bin Peng, Pieter Abbeel, Sergey Levine, Glen Berseth, Koushil Sreenath
Such robustness in the proposed multi-task policy enables Cassie to succeed in completing a variety of challenging jump tasks in the real world, such as standing long jumps, jumping onto elevated platforms, and multi-axis jumps.
no code implementations • 13 Feb 2023 • Yuqing Du, Olivia Watkins, Zihan Wang, Cédric Colas, Trevor Darrell, Pieter Abbeel, Abhishek Gupta, Jacob Andreas
Reinforcement learning algorithms typically struggle in the absence of a dense, well-shaped reward function.
1 code implementation • 10 Feb 2023 • Seohong Park, Kimin Lee, Youngwoon Lee, Pieter Abbeel
One of the key capabilities of intelligent agents is the ability to discover useful skills without external supervision.
1 code implementation • 10 Feb 2023 • Tianjun Zhang, Fangchen Liu, Justin Wong, Pieter Abbeel, Joseph E. Gonzalez
In this paper, we consider an alternative approach: converting feedback to instruction by relabeling the original one and training the model for better alignment in a supervised manner.
2 code implementations • 6 Feb 2023 • Hao Liu, Carmelo Sferrazza, Pieter Abbeel
Applying our method to large language models, we observed that Chain of Hindsight significantly surpasses previous methods in aligning language models with human preferences.
1 code implementation • 5 Feb 2023 • Younggyo Seo, Junsu Kim, Stephen James, Kimin Lee, Jinwoo Shin, Pieter Abbeel
In this paper, we investigate how to learn good representations with multi-view data and utilize them for visual robotic manipulation.
1 code implementation • 2 Feb 2023 • Hao Liu, Wilson Yan, Pieter Abbeel
Recent progress in scaling up large language models has shown impressive capabilities in performing few-shot learning across a wide range of text-based tasks.
no code implementations • 31 Jan 2023 • Yilun Du, Mengjiao Yang, Bo Dai, Hanjun Dai, Ofir Nachum, Joshua B. Tenenbaum, Dale Schuurmans, Pieter Abbeel
The proposed policy-as-video formulation can further represent environments with different state and action spaces in a unified space of images, which, for example, enables learning and generalization across a variety of robot manipulation tasks.
1 code implementation • 23 Nov 2022 • Fangchen Liu, Hao Liu, Aditya Grover, Pieter Abbeel
We are interested in learning scalable agents for reinforcement learning that can learn from large-scale, diverse sequential data similar to current large vision and language models.
no code implementations • 23 Nov 2022 • David Venuto, Sherry Yang, Pieter Abbeel, Doina Precup, Igor Mordatch, Ofir Nachum
Using massive datasets to train large-scale models has emerged as a dominant approach for broad generalization in natural language and vision applications.
no code implementations • 21 Nov 2022 • Ajay Jain, Amber Xie, Pieter Abbeel
We show that a text-conditioned diffusion model trained on pixel representations of images can be used to generate SVG-exportable vector graphics.
no code implementations • 3 Nov 2022 • Kai Chen, Stephen James, Congying Sui, Yun-Hui Liu, Pieter Abbeel, Qi Dou
To further improve the performance of the stereo framework, StereoPose is equipped with a parallax attention module for stereo feature fusion and an epipolar loss for improving the stereo-view consistency of network predictions.
1 code implementation • 25 Oct 2022 • John So, Amber Xie, Sunggoo Jung, Jeffrey Edlund, Rohan Thakker, Ali Agha-mohammadi, Pieter Abbeel, Stephen James
In this paper, we address this challenge by presenting Sim2Seg, a re-imagining of RCAN that crosses the visual reality gap for off-road autonomous driving, without using any real-world data.
1 code implementation • 24 Oct 2022 • Hao Liu, Lisa Lee, Kimin Lee, Pieter Abbeel
Our method consists of a multimodal transformer that encodes visual observations and language instructions, and a transformer-based policy that predicts actions based on the encoded representations.
1 code implementation • 24 Oct 2022 • Mengjiao Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum
While return-conditioning is at the heart of popular algorithms such as decision transformer (DT), these methods tend to perform poorly in highly stochastic environments, where an occasional high return can arise from randomness in the environment rather than the actions themselves.
no code implementations • 24 Oct 2022 • Hao Liu, Xinyang Geng, Lisa Lee, Igor Mordatch, Sergey Levine, Sharan Narang, Pieter Abbeel
Large language models (LLMs) trained using the next-token-prediction objective, such as GPT-3 and PaLM, have revolutionized natural language processing in recent years by showing impressive zero-shot and few-shot capabilities across a wide range of tasks.
no code implementations • 23 Oct 2022 • Weirui Ye, Pieter Abbeel, Yang Gao
This paper proposes the Virtual MCTS (V-MCTS), a variant of MCTS that spends more search time on harder states and less search time on simpler states adaptively.
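One simple way to realize an adaptive budget (not necessarily the paper's exact termination rule) is to stop the search once the root value estimate stops changing; a sketch assuming `run_one_simulation` and `root_value` callbacks:

```python
def adaptive_search(run_one_simulation, root_value,
                    max_sims=400, check_every=16, tol=1e-2):
    """Spend simulations until the root value estimate stabilizes, so easy
    states terminate early and hard states use the full budget."""
    prev = root_value()
    for sim in range(1, max_sims + 1):
        run_one_simulation()
        if sim % check_every == 0:
            cur = root_value()
            if abs(cur - prev) < tol:
                return sim        # value converged: stop searching early
            prev = cur
    return max_sims
```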
1 code implementation • 19 Oct 2022 • Abdus Salam Azad, Izzeddin Gur, Jasper Emhoff, Nathaniel Alexis, Aleksandra Faust, Pieter Abbeel, Ion Stoica
Recently, Unsupervised Environment Design (UED) emerged as a new paradigm for zero-shot generalization by simultaneously learning a task distribution and agent policies on the generated tasks.
1 code implementation • 14 Oct 2022 • Ademi Adeniji, Amber Xie, Pieter Abbeel
However, often the most concise yet complete description of a task is the reward function itself, and skill learning methods learn an intrinsic reward function via the discriminator that corresponds to the skill policy.
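A common form of that discriminator-derived intrinsic reward (DIAYN-style) scores how identifiable the active skill is from the visited state; a sketch assuming a categorical skill space and a PyTorch `discriminator`:

```python
import math
import torch

def skill_intrinsic_reward(discriminator, state, skill, num_skills):
    """DIAYN-style intrinsic reward r = log q(z|s) - log p(z), where q is
    the skill discriminator and p(z) is a uniform prior over skills."""
    log_q = torch.log_softmax(discriminator(state), dim=1)    # (B, K)
    log_q_z = log_q.gather(1, skill.unsqueeze(1)).squeeze(1)  # (B,)
    return log_q_z + math.log(num_skills)                     # - log(1/K)
```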
no code implementations • 13 Oct 2022 • Yuxuan Liu, Nikhil Mishra, Maximilian Sieb, Yide Shentu, Pieter Abbeel, Xi Chen
3D bounding boxes are a widespread intermediate representation in many computer vision applications.
1 code implementation • 6 Oct 2022 • Ilija Radosavovic, Tete Xiao, Stephen James, Pieter Abbeel, Jitendra Malik, Trevor Darrell
Finally, we train a 307M parameter vision transformer on a massive collection of 4.5M images from the Internet and egocentric videos, and clearly demonstrate the benefits of scaling visual pre-training for robot learning.
1 code implementation • 5 Oct 2022 • Wilson Yan, Danijar Hafner, Stephen James, Pieter Abbeel
In this work, we present Temporally Consistent Video Transformer (TECO), a vector-quantized latent dynamics video prediction model that learns compressed representations to efficiently condition on long videos of hundreds of frames during both training and generation.
1 code implementation • 16 Sep 2022 • Litian Liang, Yaosheng Xu, Stephen McAleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox
On a set of 26 benchmark Atari environments, MeanQ outperforms all tested baselines, including the best available baseline, SUNRISE, at 100K interaction steps in 16/26 environments, and by 68% on average.
no code implementations • 15 Sep 2022 • Kyle Hollins Wray, Stas Tiomkin, Mykel J. Kochenderfer, Pieter Abbeel
Multi-objective optimization models that encode ordered sequential constraints provide a solution to model various challenging problems including encoding preferences, modeling a curriculum, and enforcing measures of safety.
no code implementations • 15 Sep 2022 • Younggyo Seo, Kimin Lee, Fangchen Liu, Stephen James, Pieter Abbeel
Video prediction is an important yet challenging problem, burdened with the tasks of generating future frames and learning environment dynamics.
1 code implementation • 3 Aug 2022 • Qiyang Li, Ajay Jain, Pieter Abbeel
Autoregressive generative models can estimate complex continuous data distributions, like trajectory rollouts in an RL environment, image intensities, and audio.
1 code implementation • 29 Jun 2022 • Ryan Hoque, Lawrence Yunliang Chen, Satvik Sharma, Karthik Dharmarajan, Brijen Thananjeyan, Pieter Abbeel, Ken Goldberg
With continual learning, interventions from the remote pool of humans can also be used to improve the robot fleet control policy over time.
1 code implementation • 28 Jun 2022 • Philipp Wu, Alejandro Escontrela, Danijar Hafner, Ken Goldberg, Pieter Abbeel
Learning a world model to predict the outcomes of potential actions enables planning in imagination, reducing the amount of trial and error needed in the real environment.
no code implementations • 28 Jun 2022 • Younggyo Seo, Danijar Hafner, Hao Liu, Fangchen Liu, Stephen James, Kimin Lee, Pieter Abbeel
Yet the current approaches typically train a single model end-to-end for learning both visual representations and dynamics, making it difficult to accurately model the interaction between robots and small objects.
no code implementations • 8 Jun 2022 • Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel
Despite operating in latent space, the decisions are interpretable because the world model can decode goals into images for visualization.
1 code implementation • 8 Jun 2022 • Wilson Yan, Ryo Okumura, Stephen James, Pieter Abbeel
In this work, we present Patch-based Object-centric Video Transformer (POVT), a novel region-based video generation architecture that leverages object-centric information to efficiently model temporal dynamics in videos.
no code implementations • 7 Jun 2022 • Zhao Mandi, Pieter Abbeel, Stephen James
From these findings, we advocate for evaluating future meta-RL methods on more challenging tasks and including multi-task pretraining with fine-tuning as a simple, yet strong baseline.
1 code implementation • 27 May 2022 • Xinyang Geng, Hao Liu, Lisa Lee, Dale Schuurmans, Sergey Levine, Pieter Abbeel
We provide an empirical study of M3AE trained on a large-scale image-text dataset, and find that M3AE is able to learn generalizable representations that transfer well to downstream tasks.
no code implementations • ICLR 2022 • Xinran Liang, Katherine Shu, Kimin Lee, Pieter Abbeel
Our intuition is that disagreement in the learned reward model reflects uncertainty in tailored human feedback and could be useful for exploration.
1 code implementation • 22 May 2022 • Mengjiao Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum
Imitation learning aims to extract high-performance policies from logged demonstrations of expert behavior.
2 code implementations • 16 May 2022 • Xin Chen, Sam Toyer, Cody Wild, Scott Emmons, Ian Fischer, Kuang-Huei Lee, Neel Alex, Steven H Wang, Ping Luo, Stuart Russell, Pieter Abbeel, Rohin Shah
We propose a modular framework for constructing representation learning algorithms, then use our framework to evaluate the utility of representation learning for imitation across several environment suites.
1 code implementation • 26 Apr 2022 • Stephen James, Pieter Abbeel
Coarse-to-fine Q-attention enables sample-efficient robot manipulation by discretizing the translation space in a coarse-to-fine manner, where the resolution gradually increases at each layer in the hierarchy.
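A sketch of the coarse-to-fine discretization itself, assuming a `q_fn` that scores candidate 3D translations; the numbers of levels and bins are illustrative:

```python
import numpy as np

def coarse_to_fine_argmax(q_fn, center, half_extent, levels=3, bins=8):
    """At each level, discretize the current cube into bins**3 voxels, pick
    the highest-scoring voxel, then recurse into it at finer resolution."""
    center = np.asarray(center, dtype=np.float64)
    for _ in range(levels):
        step = 2.0 * half_extent / bins
        axes = [center[d] - half_extent + step * (np.arange(bins) + 0.5)
                for d in range(3)]                      # voxel centers per axis
        grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
        candidates = grid.reshape(-1, 3)                # (bins**3, 3)
        center = candidates[int(np.argmax(q_fn(candidates)))]
        half_extent = step / 2.0                        # zoom into chosen voxel
    return center
```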
no code implementations • 14 Apr 2022 • Kai Chen, Rui Cao, Stephen James, Yichuan Li, Yun-Hui Liu, Pieter Abbeel, Qi Dou
To continuously improve the quality of pseudo labels, we iterate the above steps by taking the trained student model as a new teacher and re-label real data using the refined teacher model.
no code implementations • 7 Apr 2022 • Carl Qi, Pieter Abbeel, Aditya Grover
The goal of imitation learning is to mimic expert behavior from demonstrations, without access to an explicit reward signal.
1 code implementation • 4 Apr 2022 • Stephen James, Pieter Abbeel
We propose Learned Path Ranking (LPR), a method that accepts an end-effector goal pose, and learns to rank a set of goal-reaching paths generated from an array of path generating methods, including path planning, Bezier curve sampling, and a learned policy.
1 code implementation • 29 Mar 2022 • Kourosh Hakhamaneshi, Marcel Nassar, Mariano Phielipp, Pieter Abbeel, Vladimir Stojanović
We show that pretraining GNNs on prediction of output node voltages can encourage learning representations that can be adapted to new unseen topologies or prediction of new circuit level properties with up to 10x more sample efficiency compared to a randomly initialized model.
no code implementations • 28 Mar 2022 • Alejandro Escontrela, Xue Bin Peng, Wenhao Yu, Tingnan Zhang, Atil Iscen, Ken Goldberg, Pieter Abbeel
We also demonstrate that an effective style reward can be learned from a few seconds of motion capture data gathered from a German Shepherd and leads to energy-efficient locomotion strategies with natural gait transitions.
1 code implementation • 25 Mar 2022 • Younggyo Seo, Kimin Lee, Stephen James, Pieter Abbeel
Our framework consists of two phases: we pre-train an action-free latent video prediction model, and then utilize the pre-trained representations for efficiently learning action-conditional world models on unseen environments.
1 code implementation • NeurIPS 2021 • Olivia Watkins, Trevor Darrell, Pieter Abbeel, Jacob Andreas, Abhishek Gupta
Training automated agents to complete complex tasks in interactive environments is challenging: reinforcement learning requires careful hand-engineering of reward functions, imitation learning requires specialized infrastructure and access to a human expert, and learning from intermediate forms of supervision (like binary preferences) is time-consuming and extracts little information from each human intervention.
no code implementations • ICLR 2022 • Jongjin Park, Younggyo Seo, Jinwoo Shin, Honglak Lee, Pieter Abbeel, Kimin Lee
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.
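A hedged sketch of that confidence-based pseudo-labeling, assuming a `predictor` that outputs two preference logits per segment pair; the threshold is an assumed hyperparameter:

```python
import torch

def pseudo_label(predictor, seg_a, seg_b, threshold=0.95):
    """Keep only unlabeled segment pairs on which the preference predictor
    is confident, using its argmax as the pseudo-label."""
    with torch.no_grad():
        probs = torch.softmax(predictor(seg_a, seg_b), dim=1)  # (B, 2)
    confidence, labels = probs.max(dim=1)
    mask = confidence > threshold
    return seg_a[mask], seg_b[mask], labels[mask]
```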
no code implementations • 22 Feb 2022 • Yuqing Du, Pieter Abbeel, Aditya Grover
Training such agents efficiently requires automatic generation of a goal curriculum.
1 code implementation • 8 Feb 2022 • Stephen James, Pieter Abbeel
We propose a new policy parameterization for representing 3D rotations during reinforcement learning.
1 code implementation • 1 Feb 2022 • Michael Laskin, Hao Liu, Xue Bin Peng, Denis Yarats, Aravind Rajeswaran, Pieter Abbeel
We introduce Contrastive Intrinsic Control (CIC), an algorithm for unsupervised skill discovery that maximizes the mutual information between state-transitions and latent skill vectors.
1 code implementation • 31 Jan 2022 • Denis Yarats, David Brandfonbrener, Hao Liu, Michael Laskin, Pieter Abbeel, Alessandro Lazaric, Lerrel Pinto
In this work, we propose Exploratory data for Offline RL (ExORL), a data-centric approach to offline RL.
1 code implementation • 29 Jan 2022 • Julius Frost, Olivia Watkins, Eric Weiner, Pieter Abbeel, Trevor Darrell, Bryan Plummer, Kate Saenko
In order for humans to confidently decide where to employ RL agents for real-world tasks, a human developer must validate that the agent will perform well at test-time.
1 code implementation • 18 Jan 2022 • Wenlong Huang, Pieter Abbeel, Deepak Pathak, Igor Mordatch
However, the plans produced naively by LLMs often cannot map precisely to admissible actions.
no code implementations • 6 Dec 2021 • Yaosheng Xu, Dailin Hu, Litian Liang, Stephen McAleer, Pieter Abbeel, Roy Fox
Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm in continuous action space settings.
4 code implementations • CVPR 2022 • Ajay Jain, Ben Mildenhall, Jonathan T. Barron, Pieter Abbeel, Ben Poole
Our method, Dream Fields, can generate the geometry and color of a wide range of objects without 3D supervision.
no code implementations • NeurIPS 2021 • Charles Packer, Pieter Abbeel, Joseph E. Gonzalez
Meta-reinforcement learning (meta-RL) has proven to be a successful framework for leveraging experience from prior tasks to rapidly learn new related tasks; however, current meta-RL approaches struggle to learn in sparse reward environments.
no code implementations • 28 Nov 2021 • Dailin Hu, Pieter Abbeel, Roy Fox
Maximum Entropy Reinforcement Learning (MaxEnt RL) algorithms such as Soft Q-Learning (SQL) and Soft Actor-Critic trade off reward and policy entropy, which has the potential to improve training stability and robustness.
1 code implementation • 4 Nov 2021 • Kimin Lee, Laura Smith, Anca Dragan, Pieter Abbeel
However, it is difficult to quantify the progress in preference-based RL due to the lack of a commonly adopted benchmark.
no code implementations • 4 Nov 2021 • Wenlong Huang, Igor Mordatch, Pieter Abbeel, Deepak Pathak
We show that a single generalist policy can perform in-hand manipulation of over 100 geometrically-diverse real-world objects and generalize to new objects with unseen shape or size.
2 code implementations • NeurIPS 2021 • Weirui Ye, Shaohuai Liu, Thanard Kurutach, Pieter Abbeel, Yang Gao
Recently, there has been significant progress in sample-efficient image-based RL algorithms; however, consistent human-level performance on the Atari game benchmark remains an elusive goal.
Ranked #1 on Atari Games 100k on Atari 100k
no code implementations • 28 Oct 2021 • Litian Liang, Yaosheng Xu, Stephen McAleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox
Under the belief that $\beta$ is closely related to the (state-dependent) model uncertainty, Entropy Regularized Q-Learning (EQL) further introduces a principled scheduling of $\beta$ by maintaining a collection of the model parameters that characterizes model uncertainty.
1 code implementation • 28 Oct 2021 • Michael Laskin, Denis Yarats, Hao Liu, Kimin Lee, Albert Zhan, Kevin Lu, Catherine Cang, Lerrel Pinto, Pieter Abbeel
Deep Reinforcement Learning (RL) has emerged as a powerful paradigm to solve a range of complex yet specific control tasks.
no code implementations • 26 Oct 2021 • Zhao Mandi, Fangchen Liu, Kimin Lee, Pieter Abbeel
We then study the multi-task setting, where multi-task training is followed by (i) one-shot imitation on variations within the training tasks, (ii) one-shot imitation on new tasks, and (iii) fine-tuning on new tasks.
no code implementations • 29 Sep 2021 • Aaron L Putterman, Kevin Lu, Igor Mordatch, Pieter Abbeel
We study reinforcement learning (RL) agents which can utilize language inputs.
no code implementations • 29 Sep 2021 • Catherine Cang, Kourosh Hakhamaneshi, Ryan Rudes, Igor Mordatch, Aravind Rajeswaran, Pieter Abbeel, Michael Laskin
In this paper, we investigate how we can leverage large reward-free (i.e., task-agnostic) offline datasets of prior interactions to pre-train agents that can then be fine-tuned using a small reward-annotated dataset.
no code implementations • 29 Sep 2021 • Younggyo Seo, Kimin Lee, Fangchen Liu, Stephen James, Pieter Abbeel
Video prediction is an important yet challenging problem, burdened with the tasks of generating future frames and learning environment dynamics.
1 code implementation • ICLR 2022 • Yuqing Du, Pieter Abbeel, Aditya Grover
We are interested in training general-purpose reinforcement learning agents that can solve a wide variety of goals.
no code implementations • 29 Sep 2021 • Donald Joseph Hejna III, Pieter Abbeel, Lerrel Pinto
Complex, long-horizon planning and its combinatorial nature pose steep challenges for learning-based agents.
no code implementations • 31 Aug 2021 • Hao Liu, Pieter Abbeel
We introduce a new unsupervised pretraining objective for reinforcement learning.
no code implementations • 11 Aug 2021 • Xiaofei Wang, Kimin Lee, Kourosh Hakhamaneshi, Pieter Abbeel, Michael Laskin
A promising approach to solving challenging long-horizon tasks has been to extract behavior priors (skills) by fitting generative models to large offline datasets of demonstrations.
no code implementations • 19 Jul 2021 • Sarah Young, Jyothish Pari, Pieter Abbeel, Lerrel Pinto
In this work, we propose to use playful interactions in a self-supervised manner to learn visual representations for downstream tasks.
1 code implementation • ICML Workshop URL 2021 • Kourosh Hakhamaneshi, Ruihan Zhao, Albert Zhan, Pieter Abbeel, Michael Laskin
To this end, we present Few-shot Imitation with Skill Transition Models (FIST), an algorithm that extracts skills from offline data and utilizes them to generalize to unseen tasks given a few downstream demonstrations.
no code implementations • 5 Jul 2021 • Rohin Shah, Cody Wild, Steven H. Wang, Neel Alex, Brandon Houghton, William Guss, Sharada Mohanty, Anssi Kanervisto, Stephanie Milani, Nicholay Topin, Pieter Abbeel, Stuart Russell, Anca Dragan
Rather than training AI systems using a predefined reward function or using a labeled dataset with a predefined set of categories, we instead train the AI system using a learning signal derived from some form of human feedback, which can evolve over time as the understanding of the task changes, or as the capabilities of the AI system improve.
1 code implementation • 1 Jul 2021 • SeungHyun Lee, Younggyo Seo, Kimin Lee, Pieter Abbeel, Jinwoo Shin
Recent advances in deep offline reinforcement learning (RL) have made it possible to train strong robotic agents from offline datasets.
no code implementations • 18 Jun 2021 • Abdus Salam Azad, Edward Kim, Qiancheng Wu, Kimin Lee, Ion Stoica, Pieter Abbeel, Sanjit A. Seshia
Furthermore, in complex domains such as soccer, the space of possible scenarios is infinite, which makes it impossible for one research group to provide a comprehensive set of scenarios to train, test, and benchmark RL algorithms.
no code implementations • 16 Jun 2021 • Catherine Cang, Aravind Rajeswaran, Pieter Abbeel, Michael Laskin
When combined together, they substantially improve the performance and generalization of offline RL policies.
1 code implementation • 14 Jun 2021 • Boyuan Chen, Pieter Abbeel, Deepak Pathak
Prior works show that structured latent spaces, such as visual keypoints, often outperform unstructured representations for robotic control.
no code implementations • ICML Workshop URL 2021 • Michael Laskin, Catherine Cang, Ryan Rudes, Pieter Abbeel
To alleviate the reliance on reward engineering it is important to develop RL algorithms capable of efficiently acquiring skills with no rewards extrinsic to the agent.
2 code implementations • 9 Jun 2021 • Kimin Lee, Laura Smith, Pieter Abbeel
We also show that our method is able to utilize real-time human feedback to effectively prevent reward exploitation and learn new behaviors that are difficult to specify with standard reward functions.
1 code implementation • 2 Jun 2021 • Kourosh Hakhamaneshi, Pieter Abbeel, Vladimir Stojanovic, Aditya Grover
Such a decomposition can dynamically control the reliability of information derived from the online and offline data and the use of pretrained neural networks permits scalability to large offline datasets.
11 code implementations • NeurIPS 2021 • Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch
In particular, we present Decision Transformer, an architecture that casts the problem of RL as conditional sequence modeling.
Ranked #42 on Atari Games on Atari 2600 Pong (using extra training data)
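The conditioning quantity in this sequence-modeling view is the return-to-go, the sum of future rewards from each timestep onward; a minimal sketch (the interleaved token layout is noted in a comment):

```python
import numpy as np

def returns_to_go(rewards, gamma=1.0):
    """Return-to-go at step t: sum of rewards from t onward (undiscounted
    by default, as in Decision Transformer)."""
    rtg = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

# the model is fed interleaved (return-to-go, state, action) tokens and
# trained to predict actions; at test time the desired return is specified
```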
1 code implementation • 20 Apr 2021 • Wilson Yan, Yunzhi Zhang, Pieter Abbeel, Aravind Srinivas
We present VideoGPT: a conceptually simple architecture for scaling likelihood based generative modeling to natural videos.
1 code implementation • 15 Apr 2021 • Yuqing Du, Olivia Watkins, Trevor Darrell, Pieter Abbeel, Deepak Pathak
Policies trained in simulation often fail when transferred to the real world due to the 'reality gap', where the simulator is unable to accurately capture the dynamics and visual properties of the real world.
1 code implementation • ICLR 2021 • David Lindner, Rohin Shah, Pieter Abbeel, Anca Dragan
Since reward functions are hard to specify, recent work has focused on learning policies from human feedback.
no code implementations • 7 Apr 2021 • Philippe Hansen-Estruch, Wenling Shang, Lerrel Pinto, Pieter Abbeel, Stas Tiomkin
In this work, we take advantage of these structures to build effective dynamical models that are amenable to sample-based learning.
2 code implementations • 5 Apr 2021 • Xue Bin Peng, Ze Ma, Pieter Abbeel, Sergey Levine, Angjoo Kanazawa
Our system produces high-quality motions that are comparable to those achieved by state-of-the-art tracking-based techniques, while also being able to easily accommodate large datasets of unstructured motion clips.
2 code implementations • ICCV 2021 • Ajay Jain, Matthew Tancik, Pieter Abbeel
We present DietNeRF, a 3D neural scene representation estimated from a few images.
no code implementations • 26 Mar 2021 • Zhongyu Li, Xuxin Cheng, Xue Bin Peng, Pieter Abbeel, Sergey Levine, Glen Berseth, Koushil Sreenath
Developing robust walking controllers for bipedal robots is a challenging endeavor.
2 code implementations • ICLR 2021 • Rui Zhao, Yang Gao, Pieter Abbeel, Volker Tresp, Wei Xu
Reinforcement learning has been shown to be highly successful at many challenging tasks.
3 code implementations • 9 Mar 2021 • Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch
We investigate the capability of a transformer pretrained on natural language to generalize to other modalities with minimal finetuning -- in particular, without finetuning of the self-attention and feedforward layers of the residual blocks.
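A sketch of the corresponding parameter freezing, assuming a module whose layer-norm parameters contain "ln" or "norm" in their names (the name matching is a checkpoint-dependent heuristic, not a universal rule):

```python
import torch.nn as nn

def freeze_except_layernorm(model: nn.Module):
    """Freeze self-attention and feedforward weights; leave layer-norm
    parameters trainable (new input/output layers are trained separately)."""
    for name, param in model.named_parameters():
        param.requires_grad = ("ln" in name) or ("norm" in name.lower())
```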
1 code implementation • NeurIPS 2021 • Hao Liu, Pieter Abbeel
We introduce a new unsupervised pre-training method for reinforcement learning called APT, which stands for Active Pre-Training.
1 code implementation • NeurIPS 2021 • Lili Chen, Kimin Lee, Aravind Srinivas, Pieter Abbeel
Recent advances in off-policy deep reinforcement learning (RL) have led to impressive success in complex tasks from visual observations.
Ranked #32 on Atari Games on Atari 2600 Amidar
1 code implementation • ICLR 2021 • Donald J. Hejna III, Pieter Abbeel, Lerrel Pinto
Deep reinforcement learning primarily focuses on learning behavior, usually overlooking the fact that an agent's function is largely determined by form.
2 code implementations • ICLR Workshop SSL-RL 2021 • Younggyo Seo, Lili Chen, Jinwoo Shin, Honglak Lee, Pieter Abbeel, Kimin Lee
Recent exploration methods have proven to be a recipe for improving sample-efficiency in deep reinforcement learning (RL).
1 code implementation • 13 Feb 2021 • Roshan Rao, Jason Liu, Robert Verkuil, Joshua Meier, John F. Canny, Pieter Abbeel, Tom Sercu, Alexander Rives
Unsupervised protein language models trained across millions of diverse sequences learn structure and function of proteins.
13 code implementations • CVPR 2021 • Aravind Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, Ashish Vaswani
Finally, we present a simple adaptation of the BoTNet design for image classification, resulting in models that achieve a strong performance of 84.7% top-1 accuracy on the ImageNet benchmark while being up to 1.64x faster in compute time than the popular EfficientNet models on TPU-v3 hardware.
Ranked #37 on Instance Segmentation on COCO minival
2 code implementations • NeurIPS 2021 • Wenling Shang, Xiaofei Wang, Aravind Srinivas, Aravind Rajeswaran, Yang Gao, Pieter Abbeel, Michael Laskin
Temporal information is essential to learning effective policies with Reinforcement Learning (RL).
no code implementations • 1 Jan 2021 • Rohin Shah, Pedro Freire, Neel Alex, Rachel Freedman, Dmitrii Krasheninnikov, Lawrence Chan, Michael D Dennis, Pieter Abbeel, Anca Dragan, Stuart Russell
By merging reward learning and control, assistive agents can reason about the impact of control actions on reward learning, leading to several advantages over agents based on reward learning.
no code implementations • 1 Jan 2021 • Lili Chen, Kimin Lee, Aravind Srinivas, Pieter Abbeel
In this paper, we present Latent Vector Experience Replay (LeVER), a simple modification of existing off-policy RL methods, to address these computational and memory requirements without sacrificing the performance of RL agents.
no code implementations • 1 Jan 2021 • SeungHyun Lee, Younggyo Seo, Kimin Lee, Pieter Abbeel, Jinwoo Shin
As it turns out, fine-tuning offline RL agents is a non-trivial challenge, due to distribution shift – the agent encounters out-of-distribution samples during online interaction, which may cause bootstrapping error in Q-learning and instability during fine-tuning.
no code implementations • 1 Jan 2021 • Hao Liu, Pieter Abbeel
On DMControl suite, APT beats all baselines in terms of asymptotic performance and data efficiency and dramatically improves performance on tasks that are extremely difficult for training from scratch.
no code implementations • 1 Jan 2021 • Carl Qi, Pieter Abbeel, Aditya Grover
The goal of imitation learning is to mimic expert behavior from demonstrations, without access to an explicit reward signal.
no code implementations • 1 Jan 2021 • Mandi Zhao, Qiyang Li, Aravind Srinivas, Ignasi Clavera, Kimin Lee, Pieter Abbeel
Attention mechanisms are generic inductive biases that have played a critical role in improving the state-of-the-art in supervised learning, unsupervised pre-training and generative modeling for multiple domains including vision, language and speech.
no code implementations • 1 Jan 2021 • Kimin Lee, Michael Laskin, Aravind Srinivas, Pieter Abbeel
Furthermore, since our weighted Bellman backups rely on maintaining an ensemble, we investigate how weighted Bellman backups interact with other benefits previously derived from ensembles: (a) Bootstrap; (b) UCB Exploration.
no code implementations • 1 Jan 2021 • Thanard Kurutach, Julia Peng, Yang Gao, Stuart Russell, Pieter Abbeel
Discrete representations have been key in enabling robots to plan at more abstract levels and solve temporally-extended tasks more efficiently for decades.
no code implementations • 1 Jan 2021 • Yunzhi Zhang, Wilson Yan, Pieter Abbeel, Aravind Srinivas
We present VideoGen: a conceptually simple architecture for scaling likelihood based generative modeling to natural videos.
no code implementations • 14 Dec 2020 • Albert Zhan, Ruihan Zhao, Lerrel Pinto, Pieter Abbeel, Michael Laskin
We present Contrastive Pre-training and Data Augmentation for Efficient Robotic Learning (CoDER), a method that utilizes data augmentation and unsupervised learning to achieve sample-efficient training of real-robot arm policies from sparse rewards.
no code implementations • 7 Dec 2020 • Michael Laskin, Luke Metz, Seth Nabarro, Mark Saroufim, Badreddine Noune, Carlo Luschi, Jascha Sohl-Dickstein, Pieter Abbeel
Deep learning models trained on large data sets have been widely successful in both vision and language domains.
1 code implementation • ICLR 2021 • Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch
We propose Lifelong Skill Planning (LiSP), an algorithmic framework for non-episodic lifelong RL based on planning in an abstract space of higher-order skills.
1 code implementation • NeurIPS 2020 • Younggyo Seo, Kimin Lee, Ignasi Clavera, Thanard Kurutach, Jinwoo Shin, Pieter Abbeel
Model-based reinforcement learning (RL) has shown great potential in various control tasks in terms of both sample-efficiency and final performance.
1 code implementation • 9 Oct 2020 • Gregory Kahn, Pieter Abbeel, Sergey Levine
However, we believe that these disengagements not only show where the system fails, which is useful for troubleshooting, but also provide a direct learning signal by which the robot can learn to navigate.
3 code implementations • 14 Sep 2020 • Adam Stooke, Kimin Lee, Pieter Abbeel, Michael Laskin
In an effort to overcome limitations of reward-driven feature learning in deep reinforcement learning (RL) from images, we propose decoupling representation learning from policy learning.
no code implementations • 11 Aug 2020 • Sarah Young, Dhiraj Gandhi, Shubham Tulsiani, Abhinav Gupta, Pieter Abbeel, Lerrel Pinto
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
1 code implementation • 4 Aug 2020 • Eugene Vinitsky, Yuqing Du, Kanaad Parvate, Kathy Jang, Pieter Abbeel, Alexandre Bayen
Reinforcement Learning (RL) is an effective tool for controller design but can struggle with issues of robustness, failing catastrophically when the underlying system dynamics are perturbed.
no code implementations • 3 Aug 2020 • Xingyu Lu, Kimin Lee, Pieter Abbeel, Stas Tiomkin
Despite the significant progress of deep reinforcement learning (RL) in solving sequential decision making problems, RL agents often overfit to training environments and struggle to adapt to new, unseen environments.
1 code implementation • 17 Jul 2020 • Hao Liu, Pieter Abbeel
In this paper we show that through the perspective of hybrid discriminative-generative training of energy-based models we can make a direct connection between contrastive learning and supervised learning.
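The connection rests on reading a classifier's logits as an energy function: softmax over logits gives p(y|x), while logsumexp over logits gives an unnormalized log p(x). A minimal sketch of the two terms, with the training of the generative term left open (the paper's estimator is not reproduced here):

```python
import torch
import torch.nn.functional as F

def hybrid_terms(classifier, x, y):
    """Read a classifier's logits as an energy model: softmax gives the
    discriminative p(y|x); logsumexp gives an unnormalized log p(x)."""
    logits = classifier(x)
    discriminative = F.cross_entropy(logits, y)   # supervised term
    log_px = torch.logsumexp(logits, dim=1)       # generative term (unnormalized)
    return discriminative, log_px
```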
no code implementations • ICLR 2021 • Ruihan Zhao, Kevin Lu, Pieter Abbeel, Stas Tiomkin
We demonstrate our solution for sample-based unsupervised stabilization on different dynamical control systems and show the advantages of our method by comparing it to the existing VLB approaches.
1 code implementation • ICML 2020 • Eric Liang, Zongheng Yang, Ion Stoica, Pieter Abbeel, Yan Duan, Xi Chen
In this paper, we explore a technique, variable skipping, for accelerating range density estimation over deep autoregressive models.
1 code implementation • 9 Jul 2020 • Kimin Lee, Michael Laskin, Aravind Srinivas, Pieter Abbeel
Off-policy deep reinforcement learning (RL) has been successful in a range of challenging domains.
1 code implementation • EMNLP 2021 • Paras Jain, Ajay Jain, Tianjun Zhang, Pieter Abbeel, Joseph E. Gonzalez, Ion Stoica
Recent work learns contextual representations of source code by reconstructing tokens from their context.
Ranked #1 on Method name prediction on CodeSearchNet
2 code implementations • ICLR 2021 • Nicklas Hansen, Rishabh Jangir, Yu Sun, Guillem Alenyà, Pieter Abbeel, Alexei A. Efros, Lerrel Pinto, Xiaolong Wang
A natural solution would be to keep training after deployment in the new environment, but this cannot be done if the new environment offers no reward signal.
no code implementations • 8 Jul 2020 • Adam Stooke, Joshua Achiam, Pieter Abbeel
Lagrangian methods are widely used algorithms for constrained optimization problems, but their learning dynamics exhibit oscillations and overshoot which, when applied to safe reinforcement learning, lead to constraint-violating behavior during agent training.
1 code implementation • NeurIPS 2020 • Yuqing Du, Stas Tiomkin, Emre Kiciman, Daniel Polani, Pieter Abbeel, Anca Dragan
One difficulty in using artificial agents for human-assistive applications lies in the challenge of accurately assisting with a person's goal(s).
1 code implementation • 22 Jun 2020 • Ajay Jain, Pieter Abbeel, Deepak Pathak
For tasks such as image completion, these models are unable to use much of the observed context.
Ranked #1 on Image Generation on MNIST
46 code implementations • NeurIPS 2020 • Jonathan Ho, Ajay Jain, Pieter Abbeel
We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics.
Ranked #2 on Image Generation on LSUN Bedroom
1 code implementation • NeurIPS 2020 • Yunzhi Zhang, Pieter Abbeel, Lerrel Pinto
Our key insight is that if we can sample goals at the frontier of the set of goals that an agent is able to reach, it will provide a significantly stronger learning signal compared to randomly sampled goals.
no code implementations • 16 May 2020 • Yiming Ding, Ignasi Clavera, Pieter Abbeel
The latter, while exhibiting low sample complexity, learn latent spaces that must reconstruct every single detail of the scene.
no code implementations • ICLR 2020 • Ignasi Clavera, Violet Fu, Pieter Abbeel
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator to augment the data for policy optimization or value function learning.
3 code implementations • 12 May 2020 • Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak
Reinforcement learning allows solving complex tasks; however, the learning tends to be task-specific and the sample efficiency remains a challenge.
1 code implementation • 7 May 2020 • Ge Yang, Amy Zhang, Ari S. Morcos, Joelle Pineau, Pieter Abbeel, Roberto Calandra
In this paper we introduce plan2vec, an unsupervised representation learning approach that is inspired by reinforcement learning.
2 code implementations • NeurIPS 2020 • Michael Laskin, Kimin Lee, Adam Stooke, Lerrel Pinto, Pieter Abbeel, Aravind Srinivas
To this end, we present Reinforcement Learning with Augmented Data (RAD), a simple plug-and-play module that can enhance most RL algorithms.
6 code implementations • 8 Apr 2020 • Aravind Srinivas, Michael Laskin, Pieter Abbeel
On the DeepMind Control Suite, CURL is the first image-based algorithm to nearly match the sample-efficiency of methods that use state-based features.
Ranked #1 on Continuous Control on Finger, spin (DMControl500k)
1 code implementation • NeurIPS 2020 • Scott Emmons, Ajay Jain, Michael Laskin, Thanard Kurutach, Pieter Abbeel, Deepak Pathak
To operate effectively in the real world, agents should be able to act from high-dimensional raw sensory input such as images and achieve diverse goals across long time-horizons.
1 code implementation • 11 Mar 2020 • Wilson Yan, Ashwin Vangipuram, Pieter Abbeel, Lerrel Pinto
Using visual model-based learning for deformable object manipulation is challenging due to difficulties in learning plannable visual representations along with complex dynamic models.
1 code implementation • 3 Mar 2020 • Donald J. Hejna III, Pieter Abbeel, Lerrel Pinto
Learning long-range behaviors on complex high-dimensional agents is a fundamental problem in robot learning.
1 code implementation • ICML 2020 • Kara Liu, Thanard Kurutach, Christine Tung, Pieter Abbeel, Aviv Tamar
In visual planning (VP), an agent learns to plan goal-directed behavior from observations of a dynamical system obtained offline, e.g., images obtained from self-supervised robot interaction.
no code implementations • NeurIPS 2020 • Alexander C. Li, Lerrel Pinto, Pieter Abbeel
Compared to standard relabeling techniques, Generalized Hindsight provides a substantially more efficient reuse of samples, which we empirically demonstrate on a suite of multi-task navigation and manipulation tasks.
no code implementations • 17 Feb 2020 • Kourosh Hakhamaneshi, Keertana Settaluri, Pieter Abbeel, Vladimir Stojanovic
In this work we present a new method of black-box optimization and constraint satisfaction.
1 code implementation • 13 Feb 2020 • Gregory Kahn, Pieter Abbeel, Sergey Levine
Mobile robot navigation is typically regarded as a geometric problem, in which the robot's objective is to perceive the geometry of the environment in order to plan collision-free paths towards a desired goal.
no code implementations • 5 Feb 2020 • Rui Zhao, Yang Gao, Pieter Abbeel, Volker Tresp, Wei Xu
In reinforcement learning, an agent learns to reach a set of goals by means of an external reward signal.
no code implementations • 31 Jan 2020 • Albert Zhan, Stas Tiomkin, Pieter Abbeel
To our knowledge, this is the first work regarding the protection of policies in Reinforcement Learning.
1 code implementation • 29 Dec 2019 • Roy Fox, Richard Shin, William Paul, Yitian Zou, Dawn Song, Ken Goldberg, Pieter Abbeel, Ion Stoica
Autonomous agents can learn by imitating teacher demonstrations of the intended behavior.
no code implementations • 21 Dec 2019 • Xingyu Lu, Stas Tiomkin, Pieter Abbeel
While recent progress in deep reinforcement learning has enabled robots to learn complex behaviors, tasks with long horizons and sparse rewards remain an ongoing challenge.
no code implementations • 10 Dec 2019 • Laura Smith, Nikita Dhawan, Marvin Zhang, Pieter Abbeel, Sergey Levine
In this paper, we study how these challenges can be alleviated with an automated robotic learning framework, in which multi-stage tasks are defined simply by providing videos of a human demonstrator and then learned autonomously by the robot from raw image observations.
no code implementations • 4 Dec 2019 • Ruihan Zhao, Stas Tiomkin, Pieter Abbeel
The core idea is to represent the relation between action sequences and future states using a stochastic dynamic model in latent space with a specific form.
1 code implementation • 3 Dec 2019 • Kevin Lu, Igor Mordatch, Pieter Abbeel
We study learning control in an online reset-free lifelong learning scenario, where mistakes can compound catastrophically into the future and the underlying dynamics of the environment may change.
1 code implementation • NeurIPS 2019 • Coline Devin, Daniel Geng, Pieter Abbeel, Trevor Darrell, Sergey Levine
We show that CPVs can be learned within a one-shot imitation learning framework without any additional supervision or information about task hierarchy, and enable a demonstration-conditioned policy to generalize to tasks that sequence twice as many skills as the tasks seen during training.
no code implementations • 25 Nov 2019 • Wilson Yan, Jonathan Ho, Pieter Abbeel
Deep autoregressive models are among the most powerful models available today, achieving state-of-the-art bits per dimension.
no code implementations • 30 Oct 2019 • Coline Devin, Daniel Geng, Pieter Abbeel, Trevor Darrell, Sergey Levine
We show that CPVs can be learned within a one-shot imitation learning framework without any additional supervision or information about task hierarchy, and enable a demonstration-conditioned policy to generalize to tasks that sequence twice as many skills as the tasks seen during training.
2 code implementations • 29 Oct 2019 • Yilin Wu, Wilson Yan, Thanard Kurutach, Lerrel Pinto, Pieter Abbeel
Second, instead of jointly learning both the pick and the place locations, we only explicitly learn the placing policy conditioned on random pick points.
1 code implementation • NeurIPS 2019 • Josh Tobin, OpenAI Robotics, Pieter Abbeel
Understanding the 3-dimensional structure of the world is a core challenge in computer vision and robotics.
1 code implementation • 28 Oct 2019 • Yunzhi Zhang, Ignasi Clavera, Boren Tsai, Pieter Abbeel
In this work, we propose an asynchronous framework for model-based reinforcement learning methods that brings down the run time of these algorithms to be just the data collection time.
2 code implementations • NeurIPS 2019 • Micah Carroll, Rohin Shah, Mark K. Ho, Thomas L. Griffiths, Sanjit A. Seshia, Pieter Abbeel, Anca Dragan
While we would like agents that can coordinate with humans, current algorithms such as self-play and population-based training create agents that can coordinate with themselves.
2 code implementations • 7 Oct 2019 • Paras Jain, Ajay Jain, Aniruddha Nrusimha, Amir Gholami, Pieter Abbeel, Kurt Keutzer, Ion Stoica, Joseph E. Gonzalez
We formalize the problem of trading-off DNN training time and memory requirements as the tensor rematerialization optimization problem, a generalization of prior checkpointing strategies.
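The prior strategies being generalized are activation checkpointing schemes; for intuition, a usage sketch of PyTorch's built-in sequential checkpointing, which stores activations only at segment boundaries and recomputes the rest during the backward pass (the paper's optimizer chooses such trade-offs optimally rather than by fixed segments):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

model = nn.Sequential(*[nn.Sequential(nn.Linear(512, 512), nn.ReLU())
                        for _ in range(16)])
x = torch.randn(32, 512, requires_grad=True)
# keep activations at 4 segment boundaries; recompute within segments
# on the backward pass, trading extra compute for lower peak memory
y = checkpoint_sequential(model, 4, x)
y.sum().backward()
```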
no code implementations • 25 Sep 2019 • Ruihan Zhao, Stas Tiomkin, Pieter Abbeel
In this work, we develop a novel approach for the estimation of empowerment in unknown arbitrary dynamics from visual stimulus only, without sampling for the estimation of MIAS.
no code implementations • 25 Sep 2019 • Aravind Srinivas, Pieter Abbeel
In this paper, we propose a neural architecture for self-supervised representation learning on raw images called the PatchFormer which learns to model spatial dependencies across patches in a raw image.
9 code implementations • 3 Sep 2019 • Adam Stooke, Pieter Abbeel
rlpyt is designed as a high-throughput code base for small- to medium-scale research in deep RL.
no code implementations • 5 Aug 2019 • Hari Prasanna Das, Pieter Abbeel, Costas J. Spanos
Deep generative modeling using flows has gained popularity owing to tractable exact log-likelihood estimation and an efficient training and synthesis process.
1 code implementation • 5 Aug 2019 • Yusuke Urakami, Alec Hodgkinson, Casey Carlin, Randall Leu, Luca Rigazio, Pieter Abbeel
We introduce DoorGym, an open-source door opening simulation framework designed to utilize domain randomization to train a stable policy.
no code implementations • 23 Jul 2019 • Kourosh Hakhamaneshi, Nick Werblun, Pieter Abbeel, Vladimir Stojanovic
The discrepancy between post-layout and schematic simulation results continues to widen in analog design due in part to the domination of layout parasitics.
2 code implementations • 3 Jul 2019 • Tingwu Wang, Xuchan Bao, Ignasi Clavera, Jerrick Hoang, Yeming Wen, Eric Langlois, Shunshi Zhang, Guodong Zhang, Pieter Abbeel, Jimmy Ba
Model-based reinforcement learning (MBRL) is widely seen as having the potential to be significantly more sample efficient than model-free RL.
7 code implementations • NeurIPS 2020 • Alex X. Lee, Anusha Nagabandi, Pieter Abbeel, Sergey Levine
Deep reinforcement learning (RL) algorithms can use high-capacity deep networks to learn directly from image observations.