Search Results for author: Pieter Abbeel

Found 305 papers, 166 papers with code

CURL: Contrastive Unsupervised Representations for Reinforcement Learning

1 code implementation ICML 2020 Michael Laskin, Pieter Abbeel, Aravind Srinivas

CURL extracts high level features from raw pixels using a contrastive learning objective and performs off-policy control on top of the extracted features.

Contrastive Learning reinforcement-learning +2
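
The contrastive objective mentioned above is InfoNCE-style; here is a minimal sketch of the idea (function name, shapes, and the bilinear similarity are illustrative assumptions, not the paper's exact code):

```python
import torch
import torch.nn.functional as F

def curl_infonce_loss(q, k, W):
    # q, k: (B, D) embeddings of two augmented crops of the same batch of
    # observations; W: (D, D) bilinear similarity weight (assumption).
    logits = q @ W @ k.t()                                    # (B, B) similarities
    logits = logits - logits.max(dim=1, keepdim=True).values  # numerical stability
    labels = torch.arange(q.size(0))                          # positives on the diagonal
    return F.cross_entropy(logits, labels)

# toy usage
B, D = 32, 50
q, k = torch.randn(B, D), torch.randn(B, D)
W = torch.randn(D, D, requires_grad=True)
loss = curl_infonce_loss(q, k, W)
```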

Responsive Safety in Reinforcement Learning by PID Lagrangian Methods

no code implementations ICML 2020 Adam Stooke, Joshua Achiam, Pieter Abbeel

This intuition leads to our introduction of PID control for the Lagrange multiplier in constrained RL, which we cast as a dynamical system.

reinforcement-learning Reinforcement Learning +1
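
A minimal sketch of the mechanism described above, treating the Lagrange multiplier as the output of a PID controller on constraint violation (gains, cost limit, and the per-episode cost signal are illustrative assumptions, not values from the paper):

```python
class PIDLagrangian:
    """Sketch: PID control of the Lagrange multiplier for constrained RL."""

    def __init__(self, kp=0.1, ki=0.01, kd=0.05, cost_limit=25.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.cost_limit = cost_limit
        self.integral = 0.0
        self.prev_error = 0.0

    def multiplier(self, episode_cost):
        # error = how far the measured cost exceeds the allowed limit
        error = episode_cost - self.cost_limit
        self.integral = max(self.integral + error, 0.0)
        derivative = max(error - self.prev_error, 0.0)
        self.prev_error = error
        # the multiplier scaling the cost penalty must stay non-negative
        return max(self.kp * error + self.ki * self.integral
                   + self.kd * derivative, 0.0)

controller = PIDLagrangian()
lam = controller.multiplier(episode_cost=30.0)  # constraint violated -> lam > 0
```

Pure gradient ascent on the multiplier corresponds to integral-only control; the proportional and derivative terms are what damp the oscillation and overshoot.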

Hierarchically Decoupled Morphological Transfer

no code implementations ICML 2020 Donald Hejna, Lerrel Pinto, Pieter Abbeel

Learning long-range behaviors on complex high-dimensional agents is a fundamental problem in robot learning.

Foundation Models for Decision Making: Problems, Methods, and Opportunities

no code implementations 7 Mar 2023 Sherry Yang, Ofir Nachum, Yilun Du, Jason Wei, Pieter Abbeel, Dale Schuurmans

In response to these developments, new paradigms are emerging for training foundation models to interact with other agents and perform long-term reasoning.

Autonomous Driving Decision Making +1

Preference Transformer: Modeling Human Preferences using Transformers for RL

1 code implementation 2 Mar 2023 Changyeon Kim, Jongjin Park, Jinwoo Shin, Honglak Lee, Pieter Abbeel, Kimin Lee

In this paper, we present Preference Transformer, a neural architecture that models human preferences using transformers.

Decision Making

Aligning Text-to-Image Models using Human Feedback

no code implementations 23 Feb 2023 Kimin Lee, Hao Liu, MoonKyung Ryu, Olivia Watkins, Yuqing Du, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Shixiang Shane Gu

Our results demonstrate the potential for learning from human feedback to significantly improve text-to-image models.

Image Generation

Robust and Versatile Bipedal Jumping Control through Multi-Task Reinforcement Learning

no code implementations 19 Feb 2023 Zhongyu Li, Xue Bin Peng, Pieter Abbeel, Sergey Levine, Glen Berseth, Koushil Sreenath

Such robustness in the proposed multi-task policy enables Cassie to succeed in completing a variety of challenging jump tasks in the real world, such as standing long jumps, jumping onto elevated platforms, and multi-axis jumps.

reinforcement-learning Reinforcement Learning

Controllability-Aware Unsupervised Skill Discovery

1 code implementation 10 Feb 2023 Seohong Park, Kimin Lee, Youngwoon Lee, Pieter Abbeel

One of the key capabilities of intelligent agents is the ability to discover useful skills without external supervision.

The Wisdom of Hindsight Makes Language Models Better Instruction Followers

1 code implementation 10 Feb 2023 Tianjun Zhang, Fangchen Liu, Justin Wong, Pieter Abbeel, Joseph E. Gonzalez

In this paper, we consider an alternative approach: converting feedback to instruction by relabeling the original one and training the model for better alignment in a supervised manner.

Decision Making Language Modelling +2

Chain of Hindsight Aligns Language Models with Feedback

2 code implementations 6 Feb 2023 Hao Liu, Carmelo Sferrazza, Pieter Abbeel

Applying our method to large language models, we observed that Chain of Hindsight significantly surpasses previous methods in aligning language models with human preferences.

Multi-View Masked World Models for Visual Robotic Manipulation

1 code implementation 5 Feb 2023 Younggyo Seo, Junsu Kim, Stephen James, Kimin Lee, Jinwoo Shin, Pieter Abbeel

In this paper, we investigate how to learn good representations with multi-view data and utilize them for visual robotic manipulation.

Camera Calibration Representation Learning

Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment

1 code implementation 2 Feb 2023 Hao Liu, Wilson Yan, Pieter Abbeel

Recent progress in scaling up large language models has shown impressive capabilities in performing few-shot learning across a wide range of text-based tasks.

Few-Shot Image Classification Few-Shot Learning +2

Learning Universal Policies via Text-Guided Video Generation

no code implementations 31 Jan 2023 Yilun Du, Mengjiao Yang, Bo Dai, Hanjun Dai, Ofir Nachum, Joshua B. Tenenbaum, Dale Schuurmans, Pieter Abbeel

The proposed policy-as-video formulation can further represent environments with different state and action spaces in a unified space of images, which, for example, enables learning and generalization across a variety of robot manipulation tasks.

Decision Making Image Generation +3

Masked Autoencoding for Scalable and Generalizable Decision Making

1 code implementation 23 Nov 2022 Fangchen Liu, Hao Liu, Aditya Grover, Pieter Abbeel

We are interested in learning scalable agents for reinforcement learning that can learn from large-scale, diverse sequential data similar to current large vision and language models.

Decision Making Offline RL +2

Multi-Environment Pretraining Enables Transfer to Action Limited Datasets

no code implementations 23 Nov 2022 David Venuto, Sherry Yang, Pieter Abbeel, Doina Precup, Igor Mordatch, Ofir Nachum

Using massive datasets to train large-scale models has emerged as a dominant approach for broad generalization in natural language and vision applications.

Decision Making

VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models

no code implementations 21 Nov 2022 Ajay Jain, Amber Xie, Pieter Abbeel

We show that a text-conditioned diffusion model trained on pixel representations of images can be used to generate SVG-exportable vector graphics.

Image Generation Text to 3D +1

StereoPose: Category-Level 6D Transparent Object Pose Estimation from Stereo Images via Back-View NOCS

no code implementations 3 Nov 2022 Kai Chen, Stephen James, Congying Sui, Yun-Hui Liu, Pieter Abbeel, Qi Dou

To further improve the performance of the stereo framework, StereoPose is equipped with a parallax attention module for stereo feature fusion and an epipolar loss for improving the stereo-view consistency of network predictions.

Pose Estimation Transparent objects

Sim-to-Real via Sim-to-Seg: End-to-end Off-road Autonomous Driving Without Real Data

1 code implementation 25 Oct 2022 John So, Amber Xie, Sunggoo Jung, Jeffrey Edlund, Rohan Thakker, Ali Agha-mohammadi, Pieter Abbeel, Stephen James

In this paper, we address this challenge by presenting Sim2Seg, a re-imagining of RCAN that crosses the visual reality gap for off-road autonomous driving, without using any real-world data.

Autonomous Driving Robot Manipulation +1

InstructRL: Simple yet Effective Instruction-Following Agents with Multimodal Transformer

1 code implementation 24 Oct 2022 Hao Liu, Lisa Lee, Kimin Lee, Pieter Abbeel

Our InstructRL method consists of a multimodal transformer that encodes visual observations and language instructions, and a transformer-based policy that predicts actions based on encoded representations.

Instruction Following Visual Grounding

Dichotomy of Control: Separating What You Can Control from What You Cannot

1 code implementation 24 Oct 2022 Mengjiao Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum

While return-conditioning is at the heart of popular algorithms such as decision transformer (DT), these methods tend to perform poorly in highly stochastic environments, where an occasional high return can arise from randomness in the environment rather than the actions themselves.

Towards Better Few-Shot and Finetuning Performance with Forgetful Causal Language Models

no code implementations 24 Oct 2022 Hao Liu, Xinyang Geng, Lisa Lee, Igor Mordatch, Sergey Levine, Sharan Narang, Pieter Abbeel

Large language models (LLM) trained using the next-token-prediction objective, such as GPT3 and PaLM, have revolutionized natural language processing in recent years by showing impressive zero-shot and few-shot capabilities across a wide range of tasks.

Language Modelling Natural Language Inference +1

Spending Thinking Time Wisely: Accelerating MCTS with Virtual Expansions

no code implementations 23 Oct 2022 Weirui Ye, Pieter Abbeel, Yang Gao

This paper proposes the Virtual MCTS (V-MCTS), a variant of MCTS that spends more search time on harder states and less search time on simpler states adaptively.

Atari Games Board Games

CLUTR: Curriculum Learning via Unsupervised Task Representation Learning

1 code implementation 19 Oct 2022 Abdus Salam Azad, Izzeddin Gur, Jasper Emhoff, Nathaniel Alexis, Aleksandra Faust, Pieter Abbeel, Ion Stoica

Recently, Unsupervised Environment Design (UED) emerged as a new paradigm for zero-shot generalization by simultaneously learning a task distribution and agent policies on the generated tasks.

Representation Learning

Skill-Based Reinforcement Learning with Intrinsic Reward Matching

1 code implementation 14 Oct 2022 Ademi Adeniji, Amber Xie, Pieter Abbeel

However, often the most concise yet complete description of a task is the reward function itself, and skill learning methods learn an $\textit{intrinsic}$ reward function via the discriminator that corresponds to the skill policy.

reinforcement-learning Reinforcement Learning +1

Autoregressive Uncertainty Modeling for 3D Bounding Box Prediction

no code implementations 13 Oct 2022 Yuxuan Liu, Nikhil Mishra, Maximilian Sieb, Yide Shentu, Pieter Abbeel, Xi Chen

3D bounding boxes are a widespread intermediate representation in many computer vision applications.

Real-World Robot Learning with Masked Visual Pre-training

1 code implementation 6 Oct 2022 Ilija Radosavovic, Tete Xiao, Stephen James, Pieter Abbeel, Jitendra Malik, Trevor Darrell

Finally, we train a 307M parameter vision transformer on a massive collection of 4.5M images from the Internet and egocentric videos, and clearly demonstrate the benefits of scaling visual pre-training for robot learning.

Temporally Consistent Video Transformer for Long-Term Video Prediction

1 code implementation 5 Oct 2022 Wilson Yan, Danijar Hafner, Stephen James, Pieter Abbeel

In this work, we present Temporally Consistent Video Transformer (TECO), a vector-quantized latent dynamics video prediction model that learns compressed representations to efficiently condition on long videos of hundreds of frames during both training and generation.

Video Generation Video Prediction

Reducing Variance in Temporal-Difference Value Estimation via Ensemble of Deep Networks

1 code implementation 16 Sep 2022 Litian Liang, Yaosheng Xu, Stephen McAleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox

On a set of 26 benchmark Atari environments, MeanQ outperforms all tested baselines, including the best available baseline, SUNRISE, at 100K interaction steps in 16/26 environments, and by 68% on average.
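
A minimal sketch of ensemble-mean temporal-difference targets, the variance-reduction idea the title refers to (shapes and the greedy bootstrap are assumptions, not the paper's exact code):

```python
import torch

def meanq_target(target_ensemble, next_obs, rewards, dones, gamma=0.99):
    # target_ensemble: list of K target Q-networks mapping obs -> (B, A).
    # Averaging the ensemble before bootstrapping reduces target variance
    # relative to bootstrapping from a single network.
    with torch.no_grad():
        q_next = torch.stack([q(next_obs) for q in target_ensemble])  # (K, B, A)
        q_mean = q_next.mean(dim=0)                                   # (B, A)
        best = q_mean.max(dim=1).values                               # (B,)
        return rewards + gamma * (1.0 - dones) * best
```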

Multi-Objective Policy Gradients with Topological Constraints

no code implementations 15 Sep 2022 Kyle Hollins Wray, Stas Tiomkin, Mykel J. Kochenderfer, Pieter Abbeel

Multi-objective optimization models that encode ordered sequential constraints provide a solution to model various challenging problems including encoding preferences, modeling a curriculum, and enforcing measures of safety.

HARP: Autoregressive Latent Video Prediction with High-Fidelity Image Generator

no code implementations 15 Sep 2022 Younggyo Seo, Kimin Lee, Fangchen Liu, Stephen James, Pieter Abbeel

Video prediction is an important yet challenging problem, burdened with the tasks of generating future frames and learning environment dynamics.

Data Augmentation Video Prediction

AdaCat: Adaptive Categorical Discretization for Autoregressive Models

1 code implementation 3 Aug 2022 Qiyang Li, Ajay Jain, Pieter Abbeel

Autoregressive generative models can estimate complex continuous data distributions, like trajectory rollouts in an RL environment, image intensities, and audio.

Density Estimation Offline RL
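
A minimal sketch of adaptive categorical discretization for a 1D density on [0, 1], assuming a softmax parameterization of both bin widths and bin masses (the paper's exact parameterization may differ):

```python
import torch
import torch.nn.functional as F

def adacat_log_density(x, width_logits, mass_logits, eps=1e-8):
    # Adaptive discretization: softmax of width_logits gives K bin widths,
    # softmax of mass_logits gives K bin masses; the density inside a bin
    # is mass / width, so narrow bins can model sharp modes.
    widths = F.softmax(width_logits, dim=-1)   # (K,), sums to 1
    masses = F.softmax(mass_logits, dim=-1)    # (K,), sums to 1
    edges = torch.cumsum(widths, dim=-1)       # right edges of the bins
    idx = torch.searchsorted(edges, x.clamp(0.0, 1.0))
    idx = idx.clamp(max=widths.numel() - 1)
    return torch.log(masses[idx] + eps) - torch.log(widths[idx] + eps)

x = torch.rand(16)                             # toy continuous targets
logp = adacat_log_density(x, torch.zeros(10), torch.zeros(10))  # uniform -> 0
```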

Fleet-DAgger: Interactive Robot Fleet Learning with Scalable Human Supervision

1 code implementation 29 Jun 2022 Ryan Hoque, Lawrence Yunliang Chen, Satvik Sharma, Karthik Dharmarajan, Brijen Thananjeyan, Pieter Abbeel, Ken Goldberg

With continual learning, interventions from the remote pool of humans can also be used to improve the robot fleet control policy over time.

Continual Learning

DayDreamer: World Models for Physical Robot Learning

1 code implementation 28 Jun 2022 Philipp Wu, Alejandro Escontrela, Danijar Hafner, Ken Goldberg, Pieter Abbeel

Learning a world model to predict the outcomes of potential actions enables planning in imagination, reducing the amount of trial and error needed in the real environment.

Navigate reinforcement-learning +1

Masked World Models for Visual Control

no code implementations 28 Jun 2022 Younggyo Seo, Danijar Hafner, Hao Liu, Fangchen Liu, Stephen James, Kimin Lee, Pieter Abbeel

Yet the current approaches typically train a single model end-to-end for learning both visual representations and dynamics, making it difficult to accurately model the interaction between robots and small objects.

Model-based Reinforcement Learning Representation Learning

Deep Hierarchical Planning from Pixels

no code implementations 8 Jun 2022 Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel

Despite operating in latent space, the decisions are interpretable because the world model can decode goals into images for visualization.

Atari Games Hierarchical Reinforcement Learning

Patch-based Object-centric Transformers for Efficient Video Generation

1 code implementation 8 Jun 2022 Wilson Yan, Ryo Okumura, Stephen James, Pieter Abbeel

In this work, we present Patch-based Object-centric Video Transformer (POVT), a novel region-based video generation architecture that leverages object-centric information to efficiently model temporal dynamics in videos.

Video Editing Video Generation +1

On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning

no code implementations 7 Jun 2022 Zhao Mandi, Pieter Abbeel, Stephen James

From these findings, we advocate for evaluating future meta-RL methods on more challenging tasks and including multi-task pretraining with fine-tuning as a simple, yet strong baseline.

Meta-Learning Meta Reinforcement Learning +3

Multimodal Masked Autoencoders Learn Transferable Representations

1 code implementation 27 May 2022 Xinyang Geng, Hao Liu, Lisa Lee, Dale Schuurmans, Sergey Levine, Pieter Abbeel

We provide an empirical study of M3AE trained on a large-scale image-text dataset, and find that M3AE is able to learn generalizable representations that transfer well to downstream tasks.

Contrastive Learning

Reward Uncertainty for Exploration in Preference-based Reinforcement Learning

no code implementations ICLR 2022 Xinran Liang, Katherine Shu, Kimin Lee, Pieter Abbeel

Our intuition is that disagreement in learned reward model reflects uncertainty in tailored human feedback and could be useful for exploration.

reinforcement-learning Reinforcement Learning +1
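
A minimal sketch of the intuition above, assuming an ensemble of learned reward models whose standard deviation serves as the exploration bonus (the interface is illustrative, not the paper's code):

```python
import torch

def disagreement_bonus(reward_ensemble, obs, act):
    # reward_ensemble: list of K learned reward models r(obs, act) -> (B,).
    # High disagreement marks state-actions where human feedback has been
    # uninformative so far, so they are worth exploring.
    with torch.no_grad():
        preds = torch.stack([r(obs, act) for r in reward_ensemble])  # (K, B)
    return preds.std(dim=0)                                          # (B,)
```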

Chain of Thought Imitation with Procedure Cloning

1 code implementation 22 May 2022 Mengjiao Yang, Dale Schuurmans, Pieter Abbeel, Ofir Nachum

Imitation learning aims to extract high-performance policies from logged demonstrations of expert behavior.

Imitation Learning Robot Manipulation

An Empirical Investigation of Representation Learning for Imitation

2 code implementations 16 May 2022 Xin Chen, Sam Toyer, Cody Wild, Scott Emmons, Ian Fischer, Kuang-Huei Lee, Neel Alex, Steven H. Wang, Ping Luo, Stuart Russell, Pieter Abbeel, Rohin Shah

We propose a modular framework for constructing representation learning algorithms, then use our framework to evaluate the utility of representation learning for imitation across several environment suites.

Image Classification Imitation Learning +1

Coarse-to-fine Q-attention with Tree Expansion

1 code implementation 26 Apr 2022 Stephen James, Pieter Abbeel

Coarse-to-fine Q-attention enables sample-efficient robot manipulation by discretizing the translation space in a coarse-to-fine manner, where the resolution gradually increases at each layer in the hierarchy.

Robot Manipulation

Sim-to-Real 6D Object Pose Estimation via Iterative Self-training for Robotic Bin Picking

no code implementations 14 Apr 2022 Kai Chen, Rui Cao, Stephen James, Yichuan Li, Yun-Hui Liu, Pieter Abbeel, Qi Dou

To continuously improve the quality of pseudo labels, we iterate the above steps by taking the trained student model as a new teacher and re-label real data using the refined teacher model.

6D Pose Estimation using RGB Robotic Grasping

Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning

no code implementations 7 Apr 2022 Carl Qi, Pieter Abbeel, Aditya Grover

The goal of imitation learning is to mimic expert behavior from demonstrations, without access to an explicit reward signal.

Imitation Learning reinforcement-learning +1

Coarse-to-Fine Q-attention with Learned Path Ranking

1 code implementation 4 Apr 2022 Stephen James, Pieter Abbeel

We propose Learned Path Ranking (LPR), a method that accepts an end-effector goal pose, and learns to rank a set of goal-reaching paths generated from an array of path generating methods, including: path planning, Bezier curve sampling, and a learned policy.

Benchmarking

Pretraining Graph Neural Networks for few-shot Analog Circuit Modeling and Design

1 code implementation 29 Mar 2022 Kourosh Hakhamaneshi, Marcel Nassar, Mariano Phielipp, Pieter Abbeel, Vladimir Stojanović

We show that pretraining GNNs on prediction of output node voltages can encourage learning representations that can be adapted to new unseen topologies or prediction of new circuit level properties with up to 10x more sample efficiency compared to a randomly initialized model.

Adversarial Motion Priors Make Good Substitutes for Complex Reward Functions

no code implementations 28 Mar 2022 Alejandro Escontrela, Xue Bin Peng, Wenhao Yu, Tingnan Zhang, Atil Iscen, Ken Goldberg, Pieter Abbeel

We also demonstrate that an effective style reward can be learned from a few seconds of motion capture data gathered from a German Shepherd and leads to energy-efficient locomotion strategies with natural gait transitions.

Reinforcement Learning with Action-Free Pre-Training from Videos

1 code implementation 25 Mar 2022 Younggyo Seo, Kimin Lee, Stephen James, Pieter Abbeel

Our framework consists of two phases: we pre-train an action-free latent video prediction model, and then utilize the pre-trained representations for efficiently learning action-conditional world models on unseen environments.

reinforcement-learning Reinforcement Learning +2

Teachable Reinforcement Learning via Advice Distillation

1 code implementation NeurIPS 2021 Olivia Watkins, Trevor Darrell, Pieter Abbeel, Jacob Andreas, Abhishek Gupta

Training automated agents to complete complex tasks in interactive environments is challenging: reinforcement learning requires careful hand-engineering of reward functions, imitation learning requires specialized infrastructure and access to a human expert, and learning from intermediate forms of supervision (like binary preferences) is time-consuming and extracts little information from each human intervention.

Imitation Learning reinforcement-learning +1

SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning

no code implementations ICLR 2022 Jongjin Park, Younggyo Seo, Jinwoo Shin, Honglak Lee, Pieter Abbeel, Kimin Lee

In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.

Data Augmentation
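
A minimal sketch of confidence-based pseudo-labeling for preference pairs as described above (the threshold and interface are assumptions):

```python
import torch

def pseudo_label_pairs(predictor, seg_a, seg_b, threshold=0.95):
    # predictor(seg_a, seg_b) -> (B,) probability that segment A is preferred.
    # Keep only pairs where the current predictor is confident, and treat
    # its hard label as ground truth for semi-supervised reward learning.
    with torch.no_grad():
        p_a = predictor(seg_a, seg_b)
    confident = (p_a > threshold) | (p_a < 1.0 - threshold)
    labels = (p_a > 0.5).float()
    return seg_a[confident], seg_b[confident], labels[confident]
```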

It Takes Four to Tango: Multiagent Selfplay for Automatic Curriculum Generation

no code implementations 22 Feb 2022 Yuqing Du, Pieter Abbeel, Aditya Grover

Training such agents efficiently requires automatic generation of a goal curriculum.

Bingham Policy Parameterization for 3D Rotations in Reinforcement Learning

1 code implementation 8 Feb 2022 Stephen James, Pieter Abbeel

We propose a new policy parameterization for representing 3D rotations during reinforcement learning.

Continuous Control reinforcement-learning +2

CIC: Contrastive Intrinsic Control for Unsupervised Skill Discovery

1 code implementation 1 Feb 2022 Michael Laskin, Hao Liu, Xue Bin Peng, Denis Yarats, Aravind Rajeswaran, Pieter Abbeel

We introduce Contrastive Intrinsic Control (CIC), an algorithm for unsupervised skill discovery that maximizes the mutual information between state-transitions and latent skill vectors.

Contrastive Learning reinforcement-learning +2

Explaining Reinforcement Learning Policies through Counterfactual Trajectories

1 code implementation 29 Jan 2022 Julius Frost, Olivia Watkins, Eric Weiner, Pieter Abbeel, Trevor Darrell, Bryan Plummer, Kate Saenko

In order for humans to confidently decide where to employ RL agents for real-world tasks, a human developer must validate that the agent will perform well at test-time.

Decision Making reinforcement-learning +1

Target Entropy Annealing for Discrete Soft Actor-Critic

no code implementations 6 Dec 2021 Yaosheng Xu, Dailin Hu, Litian Liang, Stephen McAleer, Pieter Abbeel, Roy Fox

Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm in continuous action space settings.

Atari Games Scheduling
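
For context, a sketch of the standard SAC temperature update that the annealing acts on (the annealing schedule itself, which is this paper's contribution, is not reproduced here):

```python
import torch

# alpha is trained so that measured policy entropy tracks a target value;
# this work anneals that target over training for discrete action spaces.
log_alpha = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.Adam([log_alpha], lr=3e-4)

def alpha_loss(log_probs, target_entropy):
    # log_probs: (B,) log-probabilities of actions sampled from the policy
    return -(log_alpha * (log_probs + target_entropy).detach()).mean()
```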

Zero-Shot Text-Guided Object Generation with Dream Fields

4 code implementations CVPR 2022 Ajay Jain, Ben Mildenhall, Jonathan T. Barron, Pieter Abbeel, Ben Poole

Our method, Dream Fields, can generate the geometry and color of a wide range of objects without 3D supervision.

Neural Rendering

Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL

no code implementations NeurIPS 2021 Charles Packer, Pieter Abbeel, Joseph E. Gonzalez

Meta-reinforcement learning (meta-RL) has proven to be a successful framework for leveraging experience from prior tasks to rapidly learn new related tasks, however, current meta-RL approaches struggle to learn in sparse reward environments.

Meta Reinforcement Learning

Count-Based Temperature Scheduling for Maximum Entropy Reinforcement Learning

no code implementations 28 Nov 2021 Dailin Hu, Pieter Abbeel, Roy Fox

Maximum Entropy Reinforcement Learning (MaxEnt RL) algorithms such as Soft Q-Learning (SQL) and Soft Actor-Critic trade off reward and policy entropy, which has the potential to improve training stability and robustness.

Q-Learning reinforcement-learning +2

B-Pref: Benchmarking Preference-Based Reinforcement Learning

1 code implementation 4 Nov 2021 Kimin Lee, Laura Smith, Anca Dragan, Pieter Abbeel

However, it is difficult to quantify the progress in preference-based RL due to the lack of a commonly adopted benchmark.

Benchmarking reinforcement-learning +1

Generalization in Dexterous Manipulation via Geometry-Aware Multi-Task Learning

no code implementations 4 Nov 2021 Wenlong Huang, Igor Mordatch, Pieter Abbeel, Deepak Pathak

We show that a single generalist policy can perform in-hand manipulation of over 100 geometrically-diverse real-world objects and generalize to new objects with unseen shape or size.

Multi-Task Learning reinforcement-learning +1

Mastering Atari Games with Limited Data

2 code implementations NeurIPS 2021 Weirui Ye, Shaohuai Liu, Thanard Kurutach, Pieter Abbeel, Yang Gao

Recently, there has been significant progress in sample efficient image-based RL algorithms; however, consistent human-level performance on the Atari game benchmark remains an elusive goal.

Atari Games Atari Games 100k

Temporal-Difference Value Estimation via Uncertainty-Guided Soft Updates

no code implementations 28 Oct 2021 Litian Liang, Yaosheng Xu, Stephen McAleer, Dailin Hu, Alexander Ihler, Pieter Abbeel, Roy Fox

Under the belief that $\beta$ is closely related to the (state dependent) model uncertainty, Entropy Regularized Q-Learning (EQL) further introduces a principled scheduling of $\beta$ by maintaining a collection of the model parameters that characterizes model uncertainty.

Q-Learning Scheduling

URLB: Unsupervised Reinforcement Learning Benchmark

1 code implementation 28 Oct 2021 Michael Laskin, Denis Yarats, Hao Liu, Kimin Lee, Albert Zhan, Kevin Lu, Catherine Cang, Lerrel Pinto, Pieter Abbeel

Deep Reinforcement Learning (RL) has emerged as a powerful paradigm to solve a range of complex yet specific control tasks.

Continuous Control reinforcement-learning +2

Towards More Generalizable One-shot Visual Imitation Learning

no code implementations 26 Oct 2021 Zhao Mandi, Fangchen Liu, Kimin Lee, Pieter Abbeel

We then study the multi-task setting, where multi-task training is followed by (i) one-shot imitation on variations within the training tasks, (ii) one-shot imitation on new tasks, and (iii) fine-tuning on new tasks.

Contrastive Learning Imitation Learning +2

Semi-supervised Offline Reinforcement Learning with Pre-trained Decision Transformers

no code implementations 29 Sep 2021 Catherine Cang, Kourosh Hakhamaneshi, Ryan Rudes, Igor Mordatch, Aravind Rajeswaran, Pieter Abbeel, Michael Laskin

In this paper, we investigate how we can leverage large reward-free (i.e., task-agnostic) offline datasets of prior interactions to pre-train agents that can then be fine-tuned using a small reward-annotated dataset.

D4RL Offline RL +2

Autoregressive Latent Video Prediction with High-Fidelity Image Generator

no code implementations 29 Sep 2021 Younggyo Seo, Kimin Lee, Fangchen Liu, Stephen James, Pieter Abbeel

Video prediction is an important yet challenging problem, burdened with the tasks of generating future frames and learning environment dynamics.

Data Augmentation Video Prediction

It Takes Four to Tango: Multiagent Self Play for Automatic Curriculum Generation

1 code implementation ICLR 2022 Yuqing Du, Pieter Abbeel, Aditya Grover

We are interested in training general-purpose reinforcement learning agents that can solve a wide variety of goals.

Improving Long-Horizon Imitation Through Language Prediction

no code implementations 29 Sep 2021 Donald Joseph Hejna III, Pieter Abbeel, Lerrel Pinto

Complex, long-horizon planning and its combinatorial nature pose steep challenges for learning-based agents.

Skill Preferences: Learning to Extract and Execute Robotic Skills from Human Feedback

no code implementations 11 Aug 2021 Xiaofei Wang, Kimin Lee, Kourosh Hakhamaneshi, Pieter Abbeel, Michael Laskin

A promising approach to solving challenging long-horizon tasks has been to extract behavior priors (skills) by fitting generative models to large offline datasets of demonstrations.

Playful Interactions for Representation Learning

no code implementations 19 Jul 2021 Sarah Young, Jyothish Pari, Pieter Abbeel, Lerrel Pinto

In this work, we propose to use playful interactions in a self-supervised manner to learn visual representations for downstream tasks.

Imitation Learning Representation Learning

Hierarchical Few-Shot Imitation with Skill Transition Models

1 code implementation ICML Workshop URL 2021 Kourosh Hakhamaneshi, Ruihan Zhao, Albert Zhan, Pieter Abbeel, Michael Laskin

To this end, we present Few-shot Imitation with Skill Transition Models (FIST), an algorithm that extracts skills from offline data and utilizes them to generalize to unseen tasks given a few downstream demonstrations.

The MineRL BASALT Competition on Learning from Human Feedback

no code implementations 5 Jul 2021 Rohin Shah, Cody Wild, Steven H. Wang, Neel Alex, Brandon Houghton, William Guss, Sharada Mohanty, Anssi Kanervisto, Stephanie Milani, Nicholay Topin, Pieter Abbeel, Stuart Russell, Anca Dragan

Rather than training AI systems using a predefined reward function or using a labeled dataset with a predefined set of categories, we instead train the AI system using a learning signal derived from some form of human feedback, which can evolve over time as the understanding of the task changes, or as the capabilities of the AI system improve.

Imitation Learning

Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble

1 code implementation 1 Jul 2021 SeungHyun Lee, Younggyo Seo, Kimin Lee, Pieter Abbeel, Jinwoo Shin

Recent advance in deep offline reinforcement learning (RL) has made it possible to train strong robotic agents from offline datasets.

Offline RL reinforcement-learning +1

Scenic4RL: Programmatic Modeling and Generation of Reinforcement Learning Environments

no code implementations 18 Jun 2021 Abdus Salam Azad, Edward Kim, Qiancheng Wu, Kimin Lee, Ion Stoica, Pieter Abbeel, Sanjit A. Seshia

Furthermore, in complex domains such as soccer, the space of possible scenarios is infinite, which makes it impossible for one research group to provide a comprehensive set of scenarios to train, test, and benchmark RL algorithms.

reinforcement-learning Reinforcement Learning

Unsupervised Learning of Visual 3D Keypoints for Control

1 code implementation 14 Jun 2021 Boyuan Chen, Pieter Abbeel, Deepak Pathak

Prior works show that structured latent space such as visual keypoints often outperforms unstructured representations for robotic control.

Data-Efficient Exploration with Self Play for Atari

no code implementations ICML Workshop URL 2021 Michael Laskin, Catherine Cang, Ryan Rudes, Pieter Abbeel

To alleviate the reliance on reward engineering it is important to develop RL algorithms capable of efficiently acquiring skills with no rewards extrinsic to the agent.

Efficient Exploration

PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training

2 code implementations 9 Jun 2021 Kimin Lee, Laura Smith, Pieter Abbeel

We also show that our method is able to utilize real-time human feedback to effectively prevent reward exploitation and learn new behaviors that are difficult to specify with standard reward functions.

reinforcement-learning Reinforcement Learning +1
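
A minimal sketch of the Bradley-Terry-style preference loss commonly used to fit the reward model in this line of work (tensor shapes are assumptions, not the paper's exact code):

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_model, seg_a, seg_b, prefs):
    # reward_model(segment) -> (B, T) per-step rewards; prefs: (B,) with
    # 1.0 where segment A was preferred by the human, 0.0 otherwise.
    r_a = reward_model(seg_a).sum(dim=1)   # segment returns
    r_b = reward_model(seg_b).sum(dim=1)
    logits = r_a - r_b                     # Bradley-Terry preference logit
    return F.binary_cross_entropy_with_logits(logits, prefs)
```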

JUMBO: Scalable Multi-task Bayesian Optimization using Offline Data

1 code implementation 2 Jun 2021 Kourosh Hakhamaneshi, Pieter Abbeel, Vladimir Stojanovic, Aditya Grover

Such a decomposition can dynamically control the reliability of information derived from the online and offline data and the use of pretrained neural networks permits scalability to large offline datasets.

Gaussian Processes

VideoGPT: Video Generation using VQ-VAE and Transformers

1 code implementation 20 Apr 2021 Wilson Yan, Yunzhi Zhang, Pieter Abbeel, Aravind Srinivas

We present VideoGPT: a conceptually simple architecture for scaling likelihood based generative modeling to natural videos.

Video Generation

Auto-Tuned Sim-to-Real Transfer

1 code implementation 15 Apr 2021 Yuqing Du, Olivia Watkins, Trevor Darrell, Pieter Abbeel, Deepak Pathak

Policies trained in simulation often fail when transferred to the real world due to the 'reality gap', where the simulator is unable to accurately capture the dynamics and visual properties of the real world.

Learning What To Do by Simulating the Past

1 code implementation ICLR 2021 David Lindner, Rohin Shah, Pieter Abbeel, Anca Dragan

Since reward functions are hard to specify, recent work has focused on learning policies from human feedback.

GEM: Group Enhanced Model for Learning Dynamical Control Systems

no code implementations 7 Apr 2021 Philippe Hansen-Estruch, Wenling Shang, Lerrel Pinto, Pieter Abbeel, Stas Tiomkin

In this work, we take advantage of these structures to build effective dynamical models that are amenable to sample-based learning.

Continuous Control Model-based Reinforcement Learning

AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control

2 code implementations 5 Apr 2021 Xue Bin Peng, Ze Ma, Pieter Abbeel, Sergey Levine, Angjoo Kanazawa

Our system produces high-quality motions that are comparable to those achieved by state-of-the-art tracking-based techniques, while also being able to easily accommodate large datasets of unstructured motion clips.

Imitation Learning

Mutual Information State Intrinsic Control

2 code implementations ICLR 2021 Rui Zhao, Yang Gao, Pieter Abbeel, Volker Tresp, Wei Xu

Reinforcement learning has been shown to be highly successful at many challenging tasks.

Pretrained Transformers as Universal Computation Engines

3 code implementations 9 Mar 2021 Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch

We investigate the capability of a transformer pretrained on natural language to generalize to other modalities with minimal finetuning -- in particular, without finetuning of the self-attention and feedforward layers of the residual blocks.
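
A minimal sketch of this frozen-transformer recipe, assuming a Hugging Face GPT-2 backbone and its module naming (both are tooling assumptions, not the paper's code): attention and feedforward weights stay frozen, while layer norms, embeddings, and small new input/output layers are trained.

```python
import torch.nn as nn
from transformers import GPT2Model  # assumed backbone

model = GPT2Model.from_pretrained("gpt2")
for name, param in model.named_parameters():
    # freeze self-attention and feedforward weights; leave layer norms and
    # (position) embeddings trainable ("ln"/"wte"/"wpe" match GPT-2 names)
    param.requires_grad = ("ln" in name) or ("wte" in name) or ("wpe" in name)

in_proj = nn.Linear(64, model.config.n_embd)    # new input layer per modality
out_proj = nn.Linear(model.config.n_embd, 10)   # new task-specific head
```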

Task-Agnostic Morphology Evolution

1 code implementation ICLR 2021 Donald J. Hejna III, Pieter Abbeel, Lerrel Pinto

Deep reinforcement learning primarily focuses on learning behavior, usually overlooking the fact that an agent's function is largely determined by form.

MSA Transformer

1 code implementation 13 Feb 2021 Roshan Rao, Jason Liu, Robert Verkuil, Joshua Meier, John F. Canny, Pieter Abbeel, Tom Sercu, Alexander Rives

Unsupervised protein language models trained across millions of diverse sequences learn structure and function of proteins.

Language Modelling Masked Language Modeling +1

Bottleneck Transformers for Visual Recognition

13 code implementations CVPR 2021 Aravind Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, Ashish Vaswani

Finally, we present a simple adaptation of the BoTNet design for image classification, resulting in models that achieve a strong performance of 84.7% top-1 accuracy on the ImageNet benchmark while being up to 1.64x faster in compute time than the popular EfficientNet models on TPU-v3 hardware.

Image Classification Instance Segmentation +2

Benefits of Assistance over Reward Learning

no code implementations 1 Jan 2021 Rohin Shah, Pedro Freire, Neel Alex, Rachel Freedman, Dmitrii Krasheninnikov, Lawrence Chan, Michael D Dennis, Pieter Abbeel, Anca Dragan, Stuart Russell

By merging reward learning and control, assistive agents can reason about the impact of control actions on reward learning, leading to several advantages over agents based on reward learning.

Compute- and Memory-Efficient Reinforcement Learning with Latent Experience Replay

no code implementations 1 Jan 2021 Lili Chen, Kimin Lee, Aravind Srinivas, Pieter Abbeel

In this paper, we present Latent Vector Experience Replay (LeVER), a simple modification of existing off-policy RL methods, to address these computational and memory requirements without sacrificing the performance of RL agents.

Atari Games reinforcement-learning +2

Addressing Distribution Shift in Online Reinforcement Learning with Offline Datasets

no code implementations 1 Jan 2021 SeungHyun Lee, Younggyo Seo, Kimin Lee, Pieter Abbeel, Jinwoo Shin

As it turns out, fine-tuning offline RL agents is a non-trivial challenge, due to distribution shift – the agent encounters out-of-distribution samples during online interaction, which may cause bootstrapping error in Q-learning and instability during fine-tuning.

D4RL Offline RL +3

Unsupervised Active Pre-Training for Reinforcement Learning

no code implementations 1 Jan 2021 Hao Liu, Pieter Abbeel

On DMControl suite, APT beats all baselines in terms of asymptotic performance and data efficiency and dramatically improves performance on tasks that are extremely difficult for training from scratch.

Atari Games Contrastive Learning +3

Robust Imitation via Decision-Time Planning

no code implementations 1 Jan 2021 Carl Qi, Pieter Abbeel, Aditya Grover

The goal of imitation learning is to mimic expert behavior from demonstrations, without access to an explicit reward signal.

Imitation Learning reinforcement-learning +1

R-LAtte: Attention Module for Visual Control via Reinforcement Learning

no code implementations 1 Jan 2021 Mandi Zhao, Qiyang Li, Aravind Srinivas, Ignasi Clavera, Kimin Lee, Pieter Abbeel

Attention mechanisms are generic inductive biases that have played a critical role in improving the state-of-the-art in supervised learning, unsupervised pre-training and generative modeling for multiple domains including vision, language and speech.

reinforcement-learning Reinforcement Learning +1

Weighted Bellman Backups for Improved Signal-to-Noise in Q-Updates

no code implementations 1 Jan 2021 Kimin Lee, Michael Laskin, Aravind Srinivas, Pieter Abbeel

Furthermore, since our weighted Bellman backups rely on maintaining an ensemble, we investigate how weighted Bellman backups interact with other benefits previously derived from ensembles: (a) Bootstrap; (b) UCB Exploration.

Q-Learning

Discrete Predictive Representation for Long-horizon Planning

no code implementations 1 Jan 2021 Thanard Kurutach, Julia Peng, Yang Gao, Stuart Russell, Pieter Abbeel

Discrete representations have been key in enabling robots to plan at more abstract levels and solve temporally-extended tasks more efficiently for decades.

Reinforcement Learning

VideoGen: Generative Modeling of Videos using VQ-VAE and Transformers

no code implementations 1 Jan 2021 Yunzhi Zhang, Wilson Yan, Pieter Abbeel, Aravind Srinivas

We present VideoGen: a conceptually simple architecture for scaling likelihood based generative modeling to natural videos.

Video Generation

Learning Visual Robotic Control Efficiently with Contrastive Pre-training and Data Augmentation

no code implementations 14 Dec 2020 Albert Zhan, Ruihan Zhao, Lerrel Pinto, Pieter Abbeel, Michael Laskin

We present Contrastive Pre-training and Data Augmentation for Efficient Robotic Learning (CoDER), a method that utilizes data augmentation and unsupervised learning to achieve sample-efficient training of real-robot arm policies from sparse rewards.

Data Augmentation reinforcement-learning +2

Parallel Training of Deep Networks with Local Updates

no code implementations 7 Dec 2020 Michael Laskin, Luke Metz, Seth Nabarro, Mark Saroufim, Badreddine Noune, Carlo Luschi, Jascha Sohl-Dickstein, Pieter Abbeel

Deep learning models trained on large data sets have been widely successful in both vision and language domains.

Reset-Free Lifelong Learning with Skill-Space Planning

1 code implementation ICLR 2021 Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch

We propose Lifelong Skill Planning (LiSP), an algorithmic framework for non-episodic lifelong RL based on planning in an abstract space of higher-order skills.

LaND: Learning to Navigate from Disengagements

1 code implementation 9 Oct 2020 Gregory Kahn, Pieter Abbeel, Sergey Levine

However, we believe that these disengagements not only show where the system fails, which is useful for troubleshooting, but also provide a direct learning signal by which the robot can learn to navigate.

Autonomous Navigation Imitation Learning +3

Decoupling Representation Learning from Reinforcement Learning

3 code implementations 14 Sep 2020 Adam Stooke, Kimin Lee, Pieter Abbeel, Michael Laskin

In an effort to overcome limitations of reward-driven feature learning in deep reinforcement learning (RL) from images, we propose decoupling representation learning from policy learning.

Data Augmentation reinforcement-learning +2

Visual Imitation Made Easy

no code implementations 11 Aug 2020 Sarah Young, Dhiraj Gandhi, Shubham Tulsiani, Abhinav Gupta, Pieter Abbeel, Lerrel Pinto

We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.

Imitation Learning

Robust Reinforcement Learning using Adversarial Populations

1 code implementation 4 Aug 2020 Eugene Vinitsky, Yuqing Du, Kanaad Parvate, Kathy Jang, Pieter Abbeel, Alexandre Bayen

Reinforcement Learning (RL) is an effective tool for controller design but can struggle with issues of robustness, failing catastrophically when the underlying system dynamics are perturbed.

Out-of-Distribution Generalization reinforcement-learning +1

Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning

no code implementations 3 Aug 2020 Xingyu Lu, Kimin Lee, Pieter Abbeel, Stas Tiomkin

Despite the significant progress of deep reinforcement learning (RL) in solving sequential decision making problems, RL agents often overfit to training environments and struggle to adapt to new, unseen environments.

Decision Making reinforcement-learning +1

Hybrid Discriminative-Generative Training via Contrastive Learning

1 code implementation 17 Jul 2020 Hao Liu, Pieter Abbeel

In this paper we show that through the perspective of hybrid discriminative-generative training of energy-based models we can make a direct connection between contrastive learning and supervised learning.

Contrastive Learning Out-of-Distribution Detection

Efficient Empowerment Estimation for Unsupervised Stabilization

no code implementations ICLR 2021 Ruihan Zhao, Kevin Lu, Pieter Abbeel, Stas Tiomkin

We demonstrate our solution for sample-based unsupervised stabilization on different dynamical control systems and show the advantages of our method by comparing it to the existing VLB approaches.

Variable Skipping for Autoregressive Range Density Estimation

1 code implementation ICML 2020 Eric Liang, Zongheng Yang, Ion Stoica, Pieter Abbeel, Yan Duan, Xi Chen

In this paper, we explore a technique, variable skipping, for accelerating range density estimation over deep autoregressive models.

Data Augmentation Density Estimation

Self-Supervised Policy Adaptation during Deployment

2 code implementations ICLR 2021 Nicklas Hansen, Rishabh Jangir, Yu Sun, Guillem Alenyà, Pieter Abbeel, Alexei A. Efros, Lerrel Pinto, Xiaolong Wang

A natural solution would be to keep training after deployment in the new environment, but this cannot be done if the new environment offers no reward signal.

Responsive Safety in Reinforcement Learning by PID Lagrangian Methods

no code implementations 8 Jul 2020 Adam Stooke, Joshua Achiam, Pieter Abbeel

Lagrangian methods are widely used algorithms for constrained optimization problems, but their learning dynamics exhibit oscillations and overshoot which, when applied to safe reinforcement learning, lead to constraint-violating behavior during agent training.

reinforcement-learning Reinforcement Learning +1

AvE: Assistance via Empowerment

1 code implementation NeurIPS 2020 Yuqing Du, Stas Tiomkin, Emre Kiciman, Daniel Polani, Pieter Abbeel, Anca Dragan

One difficulty in using artificial agents for human-assistive applications lies in the challenge of accurately assisting with a person's goal(s).

Locally Masked Convolution for Autoregressive Models

1 code implementation 22 Jun 2020 Ajay Jain, Pieter Abbeel, Deepak Pathak

For tasks such as image completion, these models are unable to use much of the observed context.

Anomaly Detection Density Estimation +2

Denoising Diffusion Probabilistic Models

46 code implementations NeurIPS 2020 Jonathan Ho, Ajay Jain, Pieter Abbeel

We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics.

Denoising Image Generation
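
A minimal sketch of the simplified training objective behind diffusion probabilistic models, assuming the common linear beta schedule (shapes and the model interface are illustrative): noise an image in closed form at a random timestep, then regress the injected noise.

```python
import torch
import torch.nn.functional as F

def ddpm_loss(eps_model, x0, T=1000):
    # eps_model(x_t, t) predicts the noise that produced x_t from x0.
    betas = torch.linspace(1e-4, 0.02, T)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)       # \bar{alpha}_t
    t = torch.randint(0, T, (x0.size(0),))
    a = alpha_bar[t].view(-1, *([1] * (x0.dim() - 1)))  # broadcast over dims
    noise = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * noise      # sample q(x_t | x_0)
    return F.mse_loss(eps_model(x_t, t), noise)
```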

Automatic Curriculum Learning through Value Disagreement

1 code implementation NeurIPS 2020 Yunzhi Zhang, Pieter Abbeel, Lerrel Pinto

Our key insight is that if we can sample goals at the frontier of the set of goals that an agent is able to reach, it will provide a significantly stronger learning signal compared to randomly sampled goals.

Mutual Information Maximization for Robust Plannable Representations

no code implementations 16 May 2020 Yiming Ding, Ignasi Clavera, Pieter Abbeel

The latter, while presenting low sample complexity, learn latent spaces that must reconstruct every single detail of the scene.

Model-based Reinforcement Learning

Model-Augmented Actor-Critic: Backpropagating through Paths

no code implementations ICLR 2020 Ignasi Clavera, Violet Fu, Pieter Abbeel

Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator to augment the data for policy optimization or value function learning.

Model-based Reinforcement Learning

Planning to Explore via Self-Supervised World Models

3 code implementations 12 May 2020 Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak

Reinforcement learning allows solving complex tasks, however, the learning tends to be task-specific and the sample efficiency remains a challenge.

reinforcement-learning Reinforcement Learning

Plan2Vec: Unsupervised Representation Learning by Latent Plans

1 code implementation 7 May 2020 Ge Yang, Amy Zhang, Ari S. Morcos, Joelle Pineau, Pieter Abbeel, Roberto Calandra

In this paper we introduce plan2vec, an unsupervised representation learning approach that is inspired by reinforcement learning.

Motion Planning reinforcement-learning +2

Reinforcement Learning with Augmented Data

2 code implementations NeurIPS 2020 Michael Laskin, Kimin Lee, Adam Stooke, Lerrel Pinto, Pieter Abbeel, Aravind Srinivas

To this end, we present Reinforcement Learning with Augmented Data (RAD), a simple plug-and-play module that can enhance most RL algorithms.

Data Augmentation OpenAI Gym +2
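
A minimal sketch of one such plug-and-play augmentation, per-image random crop (the output size is an assumption); the underlying RL algorithm is left untouched, and the augmentation is applied only to its image inputs:

```python
import torch

def random_crop(imgs, out=84):
    # Independent random crop position for each image in the batch.
    b, c, h, w = imgs.shape
    xs = torch.randint(0, w - out + 1, (b,))
    ys = torch.randint(0, h - out + 1, (b,))
    return torch.stack([img[:, y:y + out, x:x + out]
                        for img, x, y in zip(imgs, xs, ys)])

batch = torch.randn(8, 3, 100, 100)
cropped = random_crop(batch)           # (8, 3, 84, 84)
```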

CURL: Contrastive Unsupervised Representations for Reinforcement Learning

6 code implementations 8 Apr 2020 Aravind Srinivas, Michael Laskin, Pieter Abbeel

On the DeepMind Control Suite, CURL is the first image-based algorithm to nearly match the sample-efficiency of methods that use state-based features.

Atari Games Atari Games 100k +4

Sparse Graphical Memory for Robust Planning

1 code implementation NeurIPS 2020 Scott Emmons, Ajay Jain, Michael Laskin, Thanard Kurutach, Pieter Abbeel, Deepak Pathak

To operate effectively in the real world, agents should be able to act from high-dimensional raw sensory input such as images and achieve diverse goals across long time-horizons.

Imitation Learning Visual Navigation

Learning Predictive Representations for Deformable Objects Using Contrastive Estimation

1 code implementation 11 Mar 2020 Wilson Yan, Ashwin Vangipuram, Pieter Abbeel, Lerrel Pinto

Using visual model-based learning for deformable object manipulation is challenging due to difficulties in learning plannable visual representations along with complex dynamic models.

Deformable Object Manipulation

Hierarchically Decoupled Imitation for Morphological Transfer

1 code implementation 3 Mar 2020 Donald J. Hejna III, Pieter Abbeel, Lerrel Pinto

Learning long-range behaviors on complex high-dimensional agents is a fundamental problem in robot learning.

Hallucinative Topological Memory for Zero-Shot Visual Planning

1 code implementation ICML 2020 Kara Liu, Thanard Kurutach, Christine Tung, Pieter Abbeel, Aviv Tamar

In visual planning (VP), an agent learns to plan goal-directed behavior from observations of a dynamical system obtained offline, e.g., images obtained from self-supervised robot interaction.

Generalized Hindsight for Reinforcement Learning

no code implementations NeurIPS 2020 Alexander C. Li, Lerrel Pinto, Pieter Abbeel

Compared to standard relabeling techniques, Generalized Hindsight provides a substantially more efficient reuse of samples, which we empirically demonstrate on a suite of multi-task navigation and manipulation tasks.

reinforcement-learning Reinforcement Learning

BADGR: An Autonomous Self-Supervised Learning-Based Navigation System

1 code implementation 13 Feb 2020 Gregory Kahn, Pieter Abbeel, Sergey Levine

Mobile robot navigation is typically regarded as a geometric problem, in which the robot's objective is to perceive the geometry of the environment in order to plan collision-free paths towards a desired goal.

Navigate Robot Navigation +1

Preventing Imitation Learning with Adversarial Policy Ensembles

no code implementations 31 Jan 2020 Albert Zhan, Stas Tiomkin, Pieter Abbeel

To our knowledge, this is the first work regarding the protection of policies in Reinforcement Learning.

Imitation Learning reinforcement-learning +1

Predictive Coding for Boosting Deep Reinforcement Learning with Sparse Rewards

no code implementations 21 Dec 2019 Xingyu Lu, Stas Tiomkin, Pieter Abbeel

While recent progress in deep reinforcement learning has enabled robots to learn complex behaviors, tasks with long horizons and sparse rewards remain an ongoing challenge.

reinforcement-learning Reinforcement Learning

AVID: Learning Multi-Stage Tasks via Pixel-Level Translation of Human Videos

no code implementations 10 Dec 2019 Laura Smith, Nikita Dhawan, Marvin Zhang, Pieter Abbeel, Sergey Levine

In this paper, we study how these challenges can be alleviated with an automated robotic learning framework, in which multi-stage tasks are defined simply by providing videos of a human demonstrator and then learned autonomously by the robot from raw image observations.

Translation

Learning Efficient Representation for Intrinsic Motivation

no code implementations 4 Dec 2019 Ruihan Zhao, Stas Tiomkin, Pieter Abbeel

The core idea is to represent the relation between action sequences and future states using a stochastic dynamic model in latent space with a specific form.

Adaptive Online Planning for Continual Lifelong Learning

1 code implementation 3 Dec 2019 Kevin Lu, Igor Mordatch, Pieter Abbeel

We study learning control in an online reset-free lifelong learning scenario, where mistakes can compound catastrophically into the future and the underlying dynamics of the environment may change.

Compositional Plan Vectors

1 code implementation NeurIPS 2019 Coline Devin, Daniel Geng, Pieter Abbeel, Trevor Darrell, Sergey Levine

We show that CPVs can be learned within a one-shot imitation learning framework without any additional supervision or information about task hierarchy, and enable a demonstration-conditioned policy to generalize to tasks that sequence twice as many skills as the tasks seen during training.

Imitation Learning

Natural Image Manipulation for Autoregressive Models Using Fisher Scores

no code implementations 25 Nov 2019 Wilson Yan, Jonathan Ho, Pieter Abbeel

Deep autoregressive models are among the most powerful generative models available today, achieving state-of-the-art bits per dim.

Image Manipulation

Plan Arithmetic: Compositional Plan Vectors for Multi-Task Control

no code implementations 30 Oct 2019 Coline Devin, Daniel Geng, Pieter Abbeel, Trevor Darrell, Sergey Levine

We show that CPVs can be learned within a one-shot imitation learning framework without any additional supervision or information about task hierarchy, and enable a demonstration-conditioned policy to generalize to tasks that sequence twice as many skills as the tasks seen during training.

Imitation Learning

Learning to Manipulate Deformable Objects without Demonstrations

2 code implementations 29 Oct 2019 Yilin Wu, Wilson Yan, Thanard Kurutach, Lerrel Pinto, Pieter Abbeel

Second, instead of jointly learning both the pick and the place locations, we only explicitly learn the placing policy conditioned on random pick points.

Deformable Object Manipulation

Geometry-Aware Neural Rendering

1 code implementation NeurIPS 2019 Josh Tobin, OpenAI Robotics, Pieter Abbeel

Understanding the 3-dimensional structure of the world is a core challenge in computer vision and robotics.

Neural Rendering

Asynchronous Methods for Model-Based Reinforcement Learning

1 code implementation 28 Oct 2019 Yunzhi Zhang, Ignasi Clavera, Boren Tsai, Pieter Abbeel

In this work, we propose an asynchronous framework for model-based reinforcement learning methods that brings down the run time of these algorithms to be just the data collection time.

Model-based Reinforcement Learning reinforcement-learning +1

On the Utility of Learning about Humans for Human-AI Coordination

2 code implementations NeurIPS 2019 Micah Carroll, Rohin Shah, Mark K. Ho, Thomas L. Griffiths, Sanjit A. Seshia, Pieter Abbeel, Anca Dragan

While we would like agents that can coordinate with humans, current algorithms such as self-play and population-based training create agents that can coordinate with themselves.

Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization

2 code implementations 7 Oct 2019 Paras Jain, Ajay Jain, Aniruddha Nrusimha, Amir Gholami, Pieter Abbeel, Kurt Keutzer, Ion Stoica, Joseph E. Gonzalez

We formalize the problem of trading-off DNN training time and memory requirements as the tensor rematerialization optimization problem, a generalization of prior checkpointing strategies.
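
Checkmate solves for an optimal rematerialization schedule; as a point of comparison (a related but simpler technique, not the paper's solver), PyTorch's built-in checkpointing applies a fixed uniform strategy:

```python
import torch
from torch.utils.checkpoint import checkpoint_sequential

# Activations inside each segment are discarded during the forward pass
# and recomputed during backward, trading extra compute for less memory.
model = torch.nn.Sequential(*[torch.nn.Linear(512, 512) for _ in range(8)])
x = torch.randn(4, 512, requires_grad=True)
y = checkpoint_sequential(model, 4, x)   # 4 uniform segments
y.sum().backward()
```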

Dynamical System Embedding for Efficient Intrinsically Motivated Artificial Agents

no code implementations 25 Sep 2019 Ruihan Zhao, Stas Tiomkin, Pieter Abbeel

In this work, we develop a novel approach for the estimation of empowerment in unknown arbitrary dynamics from visual stimulus only, without sampling for the estimation of MIAS.

PatchFormer: A neural architecture for self-supervised representation learning on images

no code implementations 25 Sep 2019 Aravind Srinivas, Pieter Abbeel

In this paper, we propose a neural architecture for self-supervised representation learning on raw images called the PatchFormer which learns to model spatial dependencies across patches in a raw image.

Representation Learning Self-Supervised Learning

rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch

9 code implementations 3 Sep 2019 Adam Stooke, Pieter Abbeel

rlpyt is designed as a high-throughput code base for small- to medium-scale research in deep RL.

Q-Learning reinforcement-learning +1

Likelihood Contribution based Multi-scale Architecture for Generative Flows

no code implementations 5 Aug 2019 Hari Prasanna Das, Pieter Abbeel, Costas J. Spanos

Deep generative modeling using flows has gained popularity owing to the tractable exact log-likelihood estimation with efficient training and synthesis process.

Dimensionality Reduction

DoorGym: A Scalable Door Opening Environment And Baseline Agent

1 code implementation 5 Aug 2019 Yusuke Urakami, Alec Hodgkinson, Casey Carlin, Randall Leu, Luca Rigazio, Pieter Abbeel

We introduce DoorGym, an open-source door opening simulation framework designed to utilize domain randomization to train a stable policy.

BagNet: Berkeley Analog Generator with Layout Optimizer Boosted with Deep Neural Networks

no code implementations 23 Jul 2019 Kourosh Hakhamaneshi, Nick Werblun, Pieter Abbeel, Vladimir Stojanovic

The discrepancy between post-layout and schematic simulation results continues to widen in analog design due in part to the domination of layout parasitics.

Benchmarking Model-Based Reinforcement Learning

2 code implementations 3 Jul 2019 Tingwu Wang, Xuchan Bao, Ignasi Clavera, Jerrick Hoang, Yeming Wen, Eric Langlois, Shunshi Zhang, Guodong Zhang, Pieter Abbeel, Jimmy Ba

Model-based reinforcement learning (MBRL) is widely seen as having the potential to be significantly more sample efficient than model-free RL.

Benchmarking Model-based Reinforcement Learning +2