Search Results for author: Pieter Abbeel

Found 348 papers, 188 papers with code

On a Connection between Importance Sampling and the Likelihood Ratio Policy Gradient

no code implementations • NeurIPS 2010 • Tang Jie, Pieter Abbeel

Likelihood ratio policy gradient methods have been some of the most successful reinforcement learning algorithms, especially for learning on physical systems.

Policy Gradient Methods

Paper
Add Code

Safe Exploration in Markov Decision Processes

no code implementations • 22 May 2012 • Teodor Mihai Moldovan, Pieter Abbeel

We show that imposing safety by restricting attention to the resulting set of guaranteed safe policies is NP-hard.

Safe Exploration

Paper
Add Code

Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics

no code implementations • NeurIPS 2014 • Sergey Levine, Pieter Abbeel

We present a policy search method that uses iteratively refitted local linear models to optimize trajectory distributions for large, continuous problems.

Paper
Add Code

Learning Contact-Rich Manipulation Skills with Guided Policy Search

no code implementations • 22 Jan 2015 • Sergey Levine, Nolan Wagener, Pieter Abbeel

Autonomous learning of object manipulation skills can enable robots to acquire rich behavioral repertoires that scale to the variety of objects found in the real world.

Robotics

Paper
Add Code

Trust Region Policy Optimization

21 code implementations • 19 Feb 2015 • John Schulman, Sergey Levine, Philipp Moritz, Michael. I. Jordan, Pieter Abbeel

We describe an iterative procedure for optimizing policies, with guaranteed monotonic improvement.

Atari Games Policy Gradient Methods

7,909

Paper
Code

End-to-End Training of Deep Visuomotor Policies

no code implementations • 2 Apr 2015 • Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel

Policy search methods can allow robots to learn control policies for a wide range of tasks, but practical applications of policy search often require hand-engineered components for perception, state estimation, and low-level control.

Paper
Add Code

High-Dimensional Continuous Control Using Generalized Advantage Estimation

17 code implementations • 8 Jun 2015 • John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, Pieter Abbeel

Policy gradient methods are an appealing approach in reinforcement learning because they directly optimize the cumulative reward and can straightforwardly be used with nonlinear function approximators such as neural networks.

Continuous Control Policy Gradient Methods +1

47,992

Paper
Code

Combinatorial Energy Learning for Image Segmentation

no code implementations • NeurIPS 2016 • Jeremy Maitin-Shepard, Viren Jain, Michal Januszewski, Peter Li, Pieter Abbeel

We introduce a new machine learning approach for image segmentation that uses a neural network to model the conditional energy of a segmentation given an image.

Image Segmentation Segmentation +1

Paper
Add Code

Gradient Estimation Using Stochastic Computation Graphs

1 code implementation • NeurIPS 2015 • John Schulman, Nicolas Heess, Theophane Weber, Pieter Abbeel

In a variety of problems originating in supervised, unsupervised, and reinforcement learning, the loss function is defined by an expectation over a collection of random variables, which might be part of a probabilistic model or the external world.

Paper
Code

Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models

1 code implementation • 3 Jul 2015 • Bradly C. Stadie, Sergey Levine, Pieter Abbeel

By parameterizing our learned model with a neural network, we are able to develop a scalable and efficient approach to exploration bonuses that can be applied to tasks with complex, high-dimensional state spaces.

Ranked #24 on Atari Games on Atari 2600 Q*Bert

Atari Games reinforcement-learning +2

Paper
Code

Learning Deep Neural Network Policies with Continuous Memory States

no code implementations • 5 Jul 2015 • Marvin Zhang, Zoe McCarthy, Chelsea Finn, Sergey Levine, Pieter Abbeel

We evaluate our method on tasks involving continuous control in manipulation and navigation settings, and show that our method can learn complex policies that successfully complete a range of tasks that require memory.

Continuous Control Memorization

Paper
Add Code

Deep Spatial Autoencoders for Visuomotor Learning

1 code implementation • 21 Sep 2015 • Chelsea Finn, Xin Yu Tan, Yan Duan, Trevor Darrell, Sergey Levine, Pieter Abbeel

Our method uses a deep spatial autoencoder to acquire a set of feature points that describe the environment for the current task, such as the positions of objects, and then learns a motion skill with these feature points using an efficient reinforcement learning method based on local linear models.

reinforcement-learning Reinforcement Learning (RL)

Paper
Code

Learning Deep Control Policies for Autonomous Aerial Vehicles with MPC-Guided Policy Search

no code implementations • 22 Sep 2015 • Tianhao Zhang, Gregory Kahn, Sergey Levine, Pieter Abbeel

We propose to combine MPC with reinforcement learning in the framework of guided policy search, where MPC is used to generate data at training time, under full state observations provided by an instrumented training environment.

Model Predictive Control reinforcement-learning +1

Paper
Add Code

Model-based Reinforcement Learning with Parametrized Physical Models and Optimism-Driven Exploration

no code implementations • 23 Sep 2015 • Christopher Xie, Sachin Patil, Teodor Moldovan, Sergey Levine, Pieter Abbeel

In this paper, we present a robotic model-based reinforcement learning method that combines ideas from model identification and model predictive control.

Model-based Reinforcement Learning Model Predictive Control +2

Paper
Add Code

One-Shot Learning of Manipulation Skills with Online Dynamics Adaptation and Neural Network Priors

no code implementations • 23 Sep 2015 • Justin Fu, Sergey Levine, Pieter Abbeel

One of the key challenges in applying reinforcement learning to complex robotic control tasks is the need to gather large amounts of experience in order to find an effective policy for the task at hand.

Model-based Reinforcement Learning Model Predictive Control +3

Paper
Add Code

Adapting Deep Visuomotor Representations with Weak Pairwise Constraints

no code implementations • 23 Nov 2015 • Eric Tzeng, Coline Devin, Judy Hoffman, Chelsea Finn, Pieter Abbeel, Sergey Levine, Kate Saenko, Trevor Darrell

We propose a novel, more powerful combination of both distribution and pairwise image alignment, and remove the requirement for expensive annotation by using weakly aligned pairs of images in the source and target domains.

Domain Adaptation

Paper
Add Code

Inverse Reinforcement Learning via Deep Gaussian Process

no code implementations • 26 Dec 2015 • Ming Jin, Andreas Damianou, Pieter Abbeel, Costas Spanos

We propose a new approach to inverse reinforcement learning (IRL) based on the deep Gaussian process (deep GP) model, which is capable of learning complicated reward structures with few demonstrations.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Value Iteration Networks

8 code implementations • NeurIPS 2016 • Aviv Tamar, Yi Wu, Garrett Thomas, Sergey Levine, Pieter Abbeel

We introduce the value iteration network (VIN): a fully differentiable neural network with a `planning module' embedded within.

reinforcement-learning Reinforcement Learning (RL)

554

Paper
Code

Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization

4 code implementations • 1 Mar 2016 • Chelsea Finn, Sergey Levine, Pieter Abbeel

We explore how inverse optimal control (IOC) can be used to learn behaviors from demonstrations, with applications to torque control of high-dimensional robotic systems.

Feature Engineering

2,539

Paper
Code

PLATO: Policy Learning using Adaptive Trajectory Optimization

no code implementations • 2 Mar 2016 • Gregory Kahn, Tianhao Zhang, Sergey Levine, Pieter Abbeel

PLATO also maintains the MPC cost as an objective to avoid highly undesirable actions that would result from strictly following the learned policy before it has been fully trained.

Model Predictive Control

Paper
Add Code

Learning Dexterous Manipulation for a Soft Robotic Hand from Human Demonstration

no code implementations • 21 Mar 2016 • Abhishek Gupta, Clemens Eppner, Sergey Levine, Pieter Abbeel

In this paper, we describe an approach to learning from demonstration that can be used to train soft robotic hands to perform dexterous manipulation tasks.

Paper
Add Code

Benchmarking Deep Reinforcement Learning for Continuous Control

15 code implementations • 22 Apr 2016 • Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel

Recently, researchers have made significant progress combining the advances in deep learning for learning feature representations with reinforcement learning.

Ranked #1 on Continuous Control on Inverted Pendulum

Action Triplet Recognition Atari Games +4

2,858

Paper
Code

Backprop KF: Learning Discriminative Deterministic State Estimators

1 code implementation • NeurIPS 2016 • Tuomas Haarnoja, Anurag Ajay, Sergey Levine, Pieter Abbeel

We show that this procedure can be used to train state estimators that use complex input, such as raw camera images, which must be processed using expressive nonlinear function approximators such as convolutional neural networks.

Autonomous Vehicles Visual Odometry

Paper
Code

VIME: Variational Information Maximizing Exploration

2 code implementations • NeurIPS 2016 • Rein Houthooft, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel

While there are methods with optimality guarantees in the setting of discrete state and action spaces, these methods cannot be applied in high-dimensional deep RL scenarios.

Continuous Control Reinforcement Learning (RL) +1

337

Paper
Code

Cooperative Inverse Reinforcement Learning

2 code implementations • NeurIPS 2016 • Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell

For an autonomous system to be helpful to humans and to pose no unwarranted risks, it needs to align its values with those of the humans in its environment in such a way that its actions contribute to the maximization of value for the humans.

Active Learning reinforcement-learning +1

Paper
Code

InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

37 code implementations • NeurIPS 2016 • Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel

This paper describes InfoGAN, an information-theoretic extension to the Generative Adversarial Network that is able to learn disentangled representations in a completely unsupervised manner.

Ranked #3 on Image Generation on Stanford Cars

Generative Adversarial Network Image Generation +3

15,701

Paper
Code

Learning to Poke by Poking: Experiential Learning of Intuitive Physics

1 code implementation • NeurIPS 2016 • Pulkit Agrawal, Ashvin Nair, Pieter Abbeel, Jitendra Malik, Sergey Levine

We investigate an experiential learning paradigm for acquiring an internal model of intuitive physics.

Decision Making

Paper
Code

Learning Modular Neural Network Policies for Multi-Task and Multi-Robot Transfer

no code implementations • 22 Sep 2016 • Coline Devin, Abhishek Gupta, Trevor Darrell, Pieter Abbeel, Sergey Levine

Using deep reinforcement learning to train general purpose neural network policies alleviates some of the burden of manual representation engineering by using expressive policy classes, but exacerbates the challenge of data collection, since such methods tend to be less efficient than RL with low-dimensional, hand-designed representations.

reinforcement-learning Reinforcement Learning (RL) +2

Paper
Add Code

Deep Reinforcement Learning for Tensegrity Robot Locomotion

no code implementations • 28 Sep 2016 • Marvin Zhang, Xinyang Geng, Jonathan Bruce, Ken Caluwaerts, Massimo Vespignani, Vytas SunSpiral, Pieter Abbeel, Sergey Levine

We evaluate our method with real-world and simulated experiments on the SUPERball tensegrity robot, showing that the learned policies generalize to changes in system parameters, unreliable sensor measurements, and variation in environmental conditions, including varied terrains and a range of different gravities.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Learning from the Hindsight Plan -- Episodic MPC Improvement

1 code implementation • 28 Sep 2016 • Aviv Tamar, Garrett Thomas, Tianhao Zhang, Sergey Levine, Pieter Abbeel

To bring the next real-world execution closer to the hindsight plan, our approach learns to re-shape the original cost function with the goal of satisfying the following property: short horizon planning (as realistic during real executions) with respect to the shaped cost should result in mimicking the hindsight plan.

Model Predictive Control

221

Paper
Code

Reset-Free Guided Policy Search: Efficient Deep Reinforcement Learning with Stochastic Initial States

no code implementations • 4 Oct 2016 • William Montgomery, Anurag Ajay, Chelsea Finn, Pieter Abbeel, Sergey Levine

Autonomous learning of robotic skills can allow general-purpose robots to learn wide behavioral repertoires without requiring extensive manual engineering.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model

no code implementations • 11 Oct 2016 • Paul Christiano, Zain Shah, Igor Mordatch, Jonas Schneider, Trevor Blackwell, Joshua Tobin, Pieter Abbeel, Wojciech Zaremba

Nevertheless, often the overall gist of what the policy does in simulation remains valid in the real world.

Friction

Paper
Add Code

Variational Lossy Autoencoder

no code implementations • 8 Nov 2016 • Xi Chen, Diederik P. Kingma, Tim Salimans, Yan Duan, Prafulla Dhariwal, John Schulman, Ilya Sutskever, Pieter Abbeel

Representation learning seeks to expose certain aspects of observed data in a learned representation that's amenable to downstream tasks like classification.

Density Estimation Image Generation +1

Paper
Add Code

RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning

18 code implementations • 9 Nov 2016 • Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, Pieter Abbeel

The activations of the RNN store the state of the "fast" RL algorithm on the current (previously unseen) MDP.

reinforcement-learning Reinforcement Learning (RL)

796

Paper
Code

A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models

3 code implementations • 11 Nov 2016 • Chelsea Finn, Paul Christiano, Pieter Abbeel, Sergey Levine

In particular, we demonstrate an equivalence between a sample-based algorithm for maximum entropy IRL and a GAN in which the generator's density can be evaluated and is provided as an additional input to the discriminator.

Imitation Learning reinforcement-learning +1

249

Paper
Code

#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning

3 code implementations • NeurIPS 2017 • Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel

In this work, we describe a surprising finding: a simple generalization of the classic count-based approach can reach near state-of-the-art performance on various high-dimensional and/or continuous deep RL benchmarks.

Ranked #1 on Atari Games on Atari 2600 Freeway

Atari Games Continuous Control +2

Paper
Code

The Off-Switch Game

no code implementations • 24 Nov 2016 • Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell

We analyze a simple game between a human H and a robot R, where H can press R's off switch but R can disable the off switch.

Paper
Add Code

Generalizing Skills with Semi-Supervised Reinforcement Learning

no code implementations • 1 Dec 2016 • Chelsea Finn, Tianhe Yu, Justin Fu, Pieter Abbeel, Sergey Levine

We evaluate our method on challenging tasks that require control directly from images, and show that our approach can improve the generalization of a learned deep neural network policy by using experience for which no reward function is available.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

A K-fold Method for Baseline Estimation in Policy Gradient Algorithms

no code implementations • 3 Jan 2017 • Nithyanand Kota, Abhishek Mishra, Sunil Srinivasa, Xi, Chen, Pieter Abbeel

The high variance issue in unbiased policy-gradient methods such as VPG and REINFORCE is typically mitigated by adding a baseline.

Policy Gradient Methods

Paper
Add Code

Uncertainty-Aware Reinforcement Learning for Collision Avoidance

no code implementations • 3 Feb 2017 • Gregory Kahn, Adam Villaflor, Vitchyr Pong, Pieter Abbeel, Sergey Levine

However, practical deployment of reinforcement learning methods must contend with the fact that the training process itself can be unsafe for the robot.

Collision Avoidance Navigate +2

Paper
Add Code

Adversarial Attacks on Neural Network Policies

1 code implementation • 8 Feb 2017 • Sandy Huang, Nicolas Papernot, Ian Goodfellow, Yan Duan, Pieter Abbeel

Machine learning classifiers are known to be vulnerable to inputs maliciously constructed by adversaries to force misclassification.

Paper
Code

Enabling Robots to Communicate their Objectives

no code implementations • 11 Feb 2017 • Sandy H. Huang, David Held, Pieter Abbeel, Anca D. Dragan

We show that certain approximate-inference models lead to the robot generating example behaviors that better enable users to anticipate what it will do in novel situations.

Autonomous Driving

Paper
Add Code

Reinforcement Learning with Deep Energy-Based Policies

3 code implementations • ICML 2017 • Tuomas Haarnoja, Haoran Tang, Pieter Abbeel, Sergey Levine

We propose a method for learning expressive energy-based policies for continuous states and actions, which has been feasible only in tabular domains before.

Q-Learning reinforcement-learning +1

2,539

Paper
Code

Combining Self-Supervised Learning and Imitation for Vision-Based Rope Manipulation

no code implementations • 6 Mar 2017 • Ashvin Nair, Dian Chen, Pulkit Agrawal, Phillip Isola, Pieter Abbeel, Jitendra Malik, Sergey Levine

Manipulation of deformable objects, such as ropes and cloth, is an important but challenging problem in robotics.

Self-Supervised Learning

Paper
Add Code

Third-Person Imitation Learning

1 code implementation • 6 Mar 2017 • Bradly C. Stadie, Pieter Abbeel, Ilya Sutskever

A key difficulty in reinforcement learning is specifying a reward function for the agent to optimize.

Imitation Learning reinforcement-learning +1

Paper
Code

Learning Invariant Feature Spaces to Transfer Skills with Reinforcement Learning

no code implementations • 8 Mar 2017 • Abhishek Gupta, Coline Devin, Yuxuan Liu, Pieter Abbeel, Sergey Levine

People can learn a wide range of tasks from their own experience, but can also learn from observing other creatures.

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Add Code

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

82 code implementations • ICML 2017 • Chelsea Finn, Pieter Abbeel, Sergey Levine

We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning.

Ranked #2 on Few-Shot Image Classification on OMNIGLOT - 5-Shot, 5-way

Few-Shot Image Classification General Classification +4

31,072

Paper
Code

Prediction and Control with Temporal Segment Models

no code implementations • ICML 2017 • Nikhil Mishra, Pieter Abbeel, Igor Mordatch

We introduce a method for learning the dynamics of complex nonlinear systems based on deep generative models over temporal segments of states and actions.

Paper
Add Code

Emergence of Grounded Compositional Language in Multi-Agent Populations

1 code implementation • 15 Mar 2017 • Igor Mordatch, Pieter Abbeel

By capturing statistical patterns in large corpora, machine learning has enabled significant advances in natural language processing, including in machine translation, question answering, and sentiment analysis.

Machine Translation Question Answering +2

Paper
Code

Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World

6 code implementations • 20 Mar 2017 • Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, Pieter Abbeel

Bridging the 'reality gap' that separates simulated robotics from experiments on hardware could accelerate robotic research through improved data availability.

Object Localization

458

Paper
Code

One-Shot Imitation Learning

no code implementations • NeurIPS 2017 • Yan Duan, Marcin Andrychowicz, Bradly C. Stadie, Jonathan Ho, Jonas Schneider, Ilya Sutskever, Pieter Abbeel, Wojciech Zaremba

A neural net is trained that takes as input one demonstration and the current state (which initially is the initial state of the other demonstration of the pair), and outputs an action with the goal that the resulting sequence of states and actions matches as closely as possible with the second demonstration.

Feature Engineering Imitation Learning +1

Paper
Add Code

Learning Visual Servoing with Deep Features and Fitted Q-Iteration

2 code implementations • 31 Mar 2017 • Alex X. Lee, Sergey Levine, Pieter Abbeel

Our approach is based on servoing the camera in the space of learned visual features, rather than image pixels or manually-designed keypoints.

reinforcement-learning Reinforcement Learning (RL)

Paper
Code

Stochastic Neural Networks for Hierarchical Reinforcement Learning

2 code implementations • 10 Apr 2017 • Carlos Florensa, Yan Duan, Pieter Abbeel

Then a high-level policy is trained on top of these skills, providing a significant improvement of the exploration and allowing to tackle sparse rewards in the downstream tasks.

Hierarchical Reinforcement Learning reinforcement-learning +1

Paper
Code

Equivalence Between Policy Gradients and Soft Q-Learning

no code implementations • 21 Apr 2017 • John Schulman, Xi Chen, Pieter Abbeel

A partial explanation may be that $Q$-learning methods are secretly implementing policy gradient updates: we show that there is a precise equivalence between $Q$-learning and policy gradient methods in the setting of entropy-regularized reinforcement learning, that "soft" (entropy-regularized) $Q$-learning is exactly equivalent to a policy gradient method.

Policy Gradient Methods Q-Learning +2

Paper
Add Code

Probabilistically Safe Policy Transfer

no code implementations • 15 May 2017 • David Held, Zoe McCarthy, Michael Zhang, Fred Shentu, Pieter Abbeel

Although learning-based methods have great potential for robotics, one concern is that a robot that updates its parameters might cause large amounts of damage before it learns the optimal policy.

Paper
Add Code

Automatic Goal Generation for Reinforcement Learning Agents

1 code implementation • ICML 2018 • Carlos Florensa, David Held, Xinyang Geng, Pieter Abbeel

Instead, we propose a method that allows an agent to automatically discover the range of tasks that it is capable of performing.

reinforcement-learning Reinforcement Learning (RL)

Paper
Code

Constrained Policy Optimization

9 code implementations • ICML 2017 • Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel

For many applications of reinforcement learning it can be more convenient to specify both a reward function and constraints, rather than trying to design behavior through the reward function.

Reinforcement Learning (RL) Safe Reinforcement Learning

287

Paper
Code

UCB Exploration via Q-Ensembles

no code implementations • ICLR 2018 • Richard Y. Chen, Szymon Sidor, Pieter Abbeel, John Schulman

We show how an ensemble of $Q^*$-functions can be leveraged for more effective exploration in deep reinforcement learning.

Q-Learning reinforcement-learning +1

Paper
Add Code

Parameter Space Noise for Exploration

10 code implementations • ICLR 2018 • Matthias Plappert, Rein Houthooft, Prafulla Dhariwal, Szymon Sidor, Richard Y. Chen, Xi Chen, Tamim Asfour, Pieter Abbeel, Marcin Andrychowicz

Combining parameter noise with traditional RL methods allows to combine the best of both worlds.

Continuous Control reinforcement-learning +1

76,589

Paper
Code

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

84 code implementations • NeurIPS 2017 • Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, Igor Mordatch

We explore deep reinforcement learning methods for multi-agent domains.

Ranked #1 on SMAC+ on Def_Infantry_sequential

Multi-agent Reinforcement Learning Q-Learning +3

31,072

Paper
Code

Hindsight Experience Replay

26 code implementations • NeurIPS 2017 • Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba

Dealing with sparse rewards is one of the biggest challenges in Reinforcement Learning (RL).

Reinforcement Learning (RL)

7,909

Paper
Code

Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation

1 code implementation • 11 Jul 2017 • YuXuan Liu, Abhishek Gupta, Pieter Abbeel, Sergey Levine

Imitation learning is an effective approach for autonomous systems to acquire control policies when an explicit reward function is unavailable, using supervision provided as demonstrations from an expert, typically a human operator.

Imitation Learning Translation +1

Paper
Code

A Simple Neural Attentive Meta-Learner

4 code implementations • ICLR 2018 • Nikhil Mishra, Mostafa Rohaninejad, Xi Chen, Pieter Abbeel

Deep neural networks excel in regimes with large amounts of data, but tend to struggle when data is scarce or when they need to adapt quickly to changes in the task.

Few-Shot Image Classification Meta-Learning

143

Paper
Code

Reverse Curriculum Generation for Reinforcement Learning

no code implementations • 17 Jul 2017 • Carlos Florensa, David Held, Markus Wulfmeier, Michael Zhang, Pieter Abbeel

The robot is trained in reverse, gradually learning to reach the goal from a set of start states increasingly far from the goal.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Mutual Alignment Transfer Learning

no code implementations • 25 Jul 2017 • Markus Wulfmeier, Ingmar Posner, Pieter Abbeel

Training robots for operation in the real world is a complex, time consuming and potentially expensive task.

Transfer Learning

Paper
Add Code

Deep Object-Centric Representations for Generalizable Robot Learning

1 code implementation • 14 Aug 2017 • Coline Devin, Pieter Abbeel, Trevor Darrell, Sergey Levine

We devise an object-level attentional mechanism that can be used to determine relevant objects from a few trajectories or demonstrations, and then immediately incorporate those objects into a learned policy.

Object Reinforcement Learning (RL)

Paper
Code

Learning Generalized Reactive Policies using Deep Neural Networks

no code implementations • 24 Aug 2017 • Edward Groshev, Maxwell Goldstein, Aviv Tamar, Siddharth Srivastava, Pieter Abbeel

We show that a deep neural network can be used to learn and represent a \emph{generalized reactive policy} (GRP) that maps a problem instance and a state to an action, and that the learned GRPs efficiently solve large classes of challenging problem instances.

Decision Making feature selection

Paper
Add Code

Learning with Opponent-Learning Awareness

6 code implementations • 13 Sep 2017 • Jakob N. Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, Igor Mordatch

We also show that the LOLA update rule can be efficiently calculated using an extension of the policy gradient estimator, making the method suitable for model-free RL.

Multi-agent Reinforcement Learning

137

Paper
Code

One-Shot Visual Imitation Learning via Meta-Learning

3 code implementations • 14 Sep 2017 • Chelsea Finn, Tianhe Yu, Tianhao Zhang, Pieter Abbeel, Sergey Levine

In this work, we present a meta-imitation learning method that enables a robot to learn how to learn more efficiently, allowing it to acquire new skills from just a single demonstration.

Imitation Learning Meta-Learning

282

Paper
Code

Overcoming Exploration in Reinforcement Learning with Demonstrations

3 code implementations • 28 Sep 2017 • Ashvin Nair, Bob McGrew, Marcin Andrychowicz, Wojciech Zaremba, Pieter Abbeel

Exploration in environments with sparse rewards has been a persistent problem in reinforcement learning (RL).

Continuous Control Reinforcement Learning (RL)

815

Paper
Code

Self-supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation

2 code implementations • 29 Sep 2017 • Gregory Kahn, Adam Villaflor, Bosen Ding, Pieter Abbeel, Sergey Levine

To address the need to learn complex policies with few samples, we propose a generalized computation graph that subsumes value-based model-free methods and model-based methods, with specific instantiations interpolating between model-free and model-based.

Navigate Q-Learning +3

Paper
Code

Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments

1 code implementation • ICLR 2018 • Maruan Al-Shedivat, Trapit Bansal, Yuri Burda, Ilya Sutskever, Igor Mordatch, Pieter Abbeel

Ability to continuously learn and adapt from limited experience in nonstationary environments is an important milestone on the path towards general intelligence.

Meta-Learning

295

Paper
Code

Synkhronos: a Multi-GPU Theano Extension for Data Parallelism

1 code implementation • 11 Oct 2017 • Adam Stooke, Pieter Abbeel

We present Synkhronos, an extension to Theano for multi-GPU computations leveraging data parallelism.

Paper
Code

Deep Imitation Learning for Complex Manipulation Tasks from Virtual Reality Teleoperation

3 code implementations • 12 Oct 2017 • Tianhao Zhang, Zoe McCarthy, Owen Jow, Dennis Lee, Xi Chen, Ken Goldberg, Pieter Abbeel

Imitation learning is a powerful paradigm for robot skill acquisition.

Imitation Learning

Paper
Code

Domain Randomization and Generative Models for Robotic Grasping

no code implementations • 17 Oct 2017 • Joshua Tobin, Lukas Biewald, Rocky Duan, Marcin Andrychowicz, Ankur Handa, Vikash Kumar, Bob McGrew, Jonas Schneider, Peter Welinder, Wojciech Zaremba, Pieter Abbeel

In this work, we explore a novel data generation pipeline for training a deep neural network to perform grasp planning that applies the idea of domain randomization to object synthesis.

Object Robotic Grasping

Paper
Add Code

Sim-to-Real Transfer of Robotic Control with Dynamics Randomization

no code implementations • 18 Oct 2017 • Xue Bin Peng, Marcin Andrychowicz, Wojciech Zaremba, Pieter Abbeel

By randomizing the dynamics of the simulator during training, we are able to develop policies that are capable of adapting to very different dynamics, including ones that differ significantly from the dynamics on which the policies were trained.

Robotics Systems and Control

Paper
Add Code

Asymmetric Actor Critic for Image-Based Robot Learning

no code implementations • 18 Oct 2017 • Lerrel Pinto, Marcin Andrychowicz, Peter Welinder, Wojciech Zaremba, Pieter Abbeel

While several recent works have shown promising results in transferring policies trained in simulation to the real world, they often do not fully utilize the advantage of working with a simulator.

Decision Making Reinforcement Learning (RL)

Paper
Add Code

Meta Learning Shared Hierarchies

3 code implementations • ICLR 2018 • Kevin Frans, Jonathan Ho, Xi Chen, Pieter Abbeel, John Schulman

We develop a metalearning approach for learning hierarchically structured policies, improving sample efficiency on unseen tasks through the use of shared primitives---policies that are executed for large numbers of timesteps.

Meta-Learning

606

Paper
Code

Interpretable and Pedagogical Examples

no code implementations • ICLR 2018 • Smitha Milli, Pieter Abbeel, Igor Mordatch

Teachers intentionally pick the most informative examples to show their students.

Paper
Add Code

Inverse Reward Design

1 code implementation • NeurIPS 2017 • Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart Russell, Anca Dragan

When designing the reward, we might think of some specific training scenarios, and make sure that the reward will lead to the right behavior in those scenarios.

Paper
Code

Safer Classification by Synthesis

no code implementations • 22 Nov 2017 • William Wang, Angelina Wang, Aviv Tamar, Xi Chen, Pieter Abbeel

We posit that a generative approach is the natural remedy for this problem, and propose a method for classification using generative models.

Classification General Classification

Paper
Add Code

A Berkeley View of Systems Challenges for AI

no code implementations • 15 Dec 2017 • Ion Stoica, Dawn Song, Raluca Ada Popa, David Patterson, Michael W. Mahoney, Randy Katz, Anthony D. Joseph, Michael Jordan, Joseph M. Hellerstein, Joseph E. Gonzalez, Ken Goldberg, Ali Ghodsi, David Culler, Pieter Abbeel

With the increasing commoditization of computer vision, speech recognition and machine translation systems and the widespread deployment of learning-based back-end technologies such as digital advertising and intelligent infrastructures, AI (Artificial Intelligence) has moved from research labs to production.

Machine Translation speech-recognition +1

Paper
Add Code

PixelSNAIL: An Improved Autoregressive Generative Model

6 code implementations • ICML 2018 • Xi Chen, Nikhil Mishra, Mostafa Rohaninejad, Pieter Abbeel

Autoregressive generative models consistently achieve the best results in density estimation tasks involving high dimensional data, such as images or audio.

Density Estimation Image Generation +1

403

Paper
Code

Self-Supervised Learning of Object Motion Through Adversarial Video Prediction

no code implementations • ICLR 2018 • Alex X. Lee, Frederik Ebert, Richard Zhang, Chelsea Finn, Pieter Abbeel, Sergey Levine

In this paper, we study the problem of multi-step video prediction, where the goal is to predict a sequence of future frames conditioned on a short context.

Object Self-Supervised Learning +1

Paper
Add Code

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

76 code implementations • ICML 2018 • Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine

A platform for Applied Reinforcement Learning (Applied RL)

Ranked #1 on Continuous Control on Lunar Lander (OpenAI Gym)

Continuous Control Decision Making +3

31,072

Paper
Code

One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning

2 code implementations • 5 Feb 2018 • Tianhe Yu, Chelsea Finn, Annie Xie, Sudeep Dasari, Tianhao Zhang, Pieter Abbeel, Sergey Levine

Humans and animals are capable of learning a new behavior by observing others perform the skill just once.

Meta-Learning One-Shot Learning

282

Paper
Code

Evolved Policy Gradients

3 code implementations • NeurIPS 2018 • Rein Houthooft, Richard Y. Chen, Phillip Isola, Bradly C. Stadie, Filip Wolski, Jonathan Ho, Pieter Abbeel

We propose a metalearning approach for learning gradient-based reinforcement learning (RL) algorithms.

Reinforcement Learning (RL)

244

Paper
Code

Meta-Reinforcement Learning of Structured Exploration Strategies

2 code implementations • NeurIPS 2018 • Abhishek Gupta, Russell Mendonca, Yuxuan Liu, Pieter Abbeel, Sergey Levine

Exploration is a fundamental challenge in reinforcement learning (RL).

Meta Reinforcement Learning reinforcement-learning +1

Paper
Code

Model-Ensemble Trust-Region Policy Optimization

2 code implementations • ICLR 2018 • Thanard Kurutach, Ignasi Clavera, Yan Duan, Aviv Tamar, Pieter Abbeel

In this paper, we analyze the behavior of vanilla model-based reinforcement learning methods when deep neural networks are used to learn both the model and the policy, and show that the learned policy tends to exploit regions where insufficient data is available for the model to be learned, causing instability in training.

Continuous Control Model-based Reinforcement Learning +2

Paper
Code

Some Considerations on Learning to Explore via Meta-Reinforcement Learning

7 code implementations • ICLR 2018 • Bradly C. Stadie, Ge Yang, Rein Houthooft, Xi Chen, Yan Duan, Yuhuai Wu, Pieter Abbeel, Ilya Sutskever

We consider the problem of exploration in meta reinforcement learning.

Meta Reinforcement Learning reinforcement-learning +1

224

Paper
Code

Accelerated Methods for Deep Reinforcement Learning

8 code implementations • 7 Mar 2018 • Adam Stooke, Pieter Abbeel

Deep reinforcement learning (RL) has achieved many recent successes, yet experiment turn-around time remains a key bottleneck in research and in practice.

Atari Games reinforcement-learning +1

2,198

Paper
Code

Composable Deep Reinforcement Learning for Robotic Manipulation

1 code implementation • 19 Mar 2018 • Tuomas Haarnoja, Vitchyr Pong, Aurick Zhou, Murtaza Dalal, Pieter Abbeel, Sergey Levine

Second, we show that policies learned with soft Q-learning can be composed to create new policies, and that the optimality of the resulting policy can be bounded in terms of the divergence between the composed policies.

Q-Learning reinforcement-learning +1

409

Paper
Code

Learning Robotic Assembly from CAD

no code implementations • 20 Mar 2018 • Garrett Thomas, Melissa Chien, Aviv Tamar, Juan Aparicio Ojea, Pieter Abbeel

We propose to leverage this prior knowledge by guiding RL along a geometric motion plan, calculated using the CAD data.

Motion Planning Reinforcement Learning (RL)

Paper
Add Code

Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines

no code implementations • ICLR 2018 • Cathy Wu, Aravind Rajeswaran, Yan Duan, Vikash Kumar, Alexandre M. Bayen, Sham Kakade, Igor Mordatch, Pieter Abbeel

To mitigate this issue, we derive a bias-free action-dependent baseline for variance reduction which fully exploits the structural form of the stochastic policy itself and does not make any additional assumptions about the MDP.

Policy Gradient Methods reinforcement-learning +1

Paper
Add Code

Learning to Adapt in Dynamic, Real-World Environments Through Meta-Reinforcement Learning

2 code implementations • ICLR 2019 • Anusha Nagabandi, Ignasi Clavera, Simin Liu, Ronald S. Fearing, Pieter Abbeel, Sergey Levine, Chelsea Finn

Although reinforcement learning methods can achieve impressive results in simulation, the real world presents two major challenges: generating samples is exceedingly expensive, and unexpected perturbations or unseen situations cause proficient but specialized policies to fail at test time.

Continuous Control Meta-Learning +5

205

Paper
Code

Universal Planning Networks

1 code implementation • 2 Apr 2018 • Aravind Srinivas, Allan Jabri, Pieter Abbeel, Sergey Levine, Chelsea Finn

We find that the representations learned are not only effective for goal-directed visual imitation via gradient-based trajectory optimization, but can also provide a metric for specifying goals using images.

Imitation Learning Representation Learning +1

Paper
Code

Stochastic Adversarial Video Prediction

4 code implementations • ICLR 2019 • Alex X. Lee, Richard Zhang, Frederik Ebert, Pieter Abbeel, Chelsea Finn, Sergey Levine

However, learning to predict raw future observations, such as frames in a video, is exceedingly challenging -- the ambiguous nature of the problem can cause a naively designed model to average together possible futures into a single, blurry prediction.

Ranked #1 on Video Prediction on KTH (Cond metric)

Representation Learning Video Generation +1

300

Paper
Code

DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills

6 code implementations • 8 Apr 2018 • Xue Bin Peng, Pieter Abbeel, Sergey Levine, Michiel Van de Panne

We further explore a number of methods for integrating multiple clips into the learning process to develop multi-skilled agents capable of performing a rich repertoire of diverse skills.

Motion Synthesis reinforcement-learning +1

2,183

Paper
Code

Latent Space Policies for Hierarchical Reinforcement Learning

no code implementations • ICML 2018 • Tuomas Haarnoja, Kristian Hartikainen, Pieter Abbeel, Sergey Levine

In contrast to methods that explicitly restrict or cripple lower layers of a hierarchy to force them to use higher-level modulating signals, each layer in our framework is trained to directly solve the task, but acquires a range of diverse strategies via a maximum entropy reinforcement learning objective.

Hierarchical Reinforcement Learning reinforcement-learning +1

Paper
Add Code

The Limits and Potentials of Deep Learning for Robotics

no code implementations • 18 Apr 2018 • Niko Sünderhauf, Oliver Brock, Walter Scheirer, Raia Hadsell, Dieter Fox, Jürgen Leitner, Ben Upcroft, Pieter Abbeel, Wolfram Burgard, Michael Milford, Peter Corke

In this paper we discuss a number of robotics-specific learning, reasoning, and embodiment challenges for deep learning.

Robotics

Paper
Add Code

Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings

no code implementations • ICML 2018 • John D. Co-Reyes, Yuxuan Liu, Abhishek Gupta, Benjamin Eysenbach, Pieter Abbeel, Sergey Levine

We show that we can learn continuous latent representations of trajectories, which are effective in solving temporally extended and multi-stage problems.

Hierarchical Reinforcement Learning reinforcement-learning +2

Paper
Add Code

Universal Planning Networks: Learning Generalizable Representations for Visuomotor Control

1 code implementation • ICML 2018 • Aravind Srinivas, Allan Jabri, Pieter Abbeel, Sergey Levine, Chelsea Finn

A key challenge in complex visuomotor control is learning abstract representations that are effective for specifying goals, planning, and generalization.

Imitation Learning

Paper
Code

Learning Plannable Representations with Causal InfoGAN

1 code implementation • NeurIPS 2018 • Thanard Kurutach, Aviv Tamar, Ge Yang, Stuart Russell, Pieter Abbeel

Finally, to generate a visual plan, we project the current and goal observations onto their respective states in the planning model, plan a trajectory, and then use the generative model to transform the trajectory to a sequence of observations.

Representation Learning

Paper
Code

Variational Option Discovery Algorithms

no code implementations • 26 Jul 2018 • Joshua Achiam, Harrison Edwards, Dario Amodei, Pieter Abbeel

We explore methods for option discovery based on variational inference and make two algorithmic contributions.

Variational Inference

Paper
Add Code

Transfer Learning for Estimating Causal Effects using Neural Networks

no code implementations • 23 Aug 2018 • Sören R. Künzel, Bradly C. Stadie, Nikita Vemuri, Varsha Ramakrishnan, Jasjeet S. Sekhon, Pieter Abbeel

We develop new algorithms for estimating heterogeneous treatment effects, combining recent developments in transfer learning for neural networks with insights from the causal inference literature.

Causal Inference Transfer Learning

Paper
Add Code

SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning

1 code implementation • ICLR 2019 • Marvin Zhang, Sharad Vikram, Laura Smith, Pieter Abbeel, Matthew J. Johnson, Sergey Levine

Model-based reinforcement learning (RL) has proven to be a data efficient approach for learning control tasks but is difficult to utilize in domains with complex observations such as images.

Model-based Reinforcement Learning reinforcement-learning +1

Paper
Code

Model-Based Reinforcement Learning via Meta-Policy Optimization

1 code implementation • 14 Sep 2018 • Ignasi Clavera, Jonas Rothfuss, John Schulman, Yasuhiro Fujita, Tamim Asfour, Pieter Abbeel

Finally, we demonstrate that our approach is able to match the asymptotic performance of model-free methods while requiring significantly less experience.

Model-based Reinforcement Learning reinforcement-learning +1

31,072

Paper
Code

Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow

5 code implementations • ICLR 2019 • Xue Bin Peng, Angjoo Kanazawa, Sam Toyer, Pieter Abbeel, Sergey Levine

By enforcing a constraint on the mutual information between the observations and the discriminator's internal representation, we can effectively modulate the discriminator's accuracy and maintain useful and informative gradients.

Continuous Control Image Generation +1

627

Paper
Code

SFV: Reinforcement Learning of Physical Skills from Videos

1 code implementation • 8 Oct 2018 • Xue Bin Peng, Angjoo Kanazawa, Jitendra Malik, Pieter Abbeel, Sergey Levine

In this paper, we propose a method that enables physically simulated characters to learn skills from videos (SFV).

Pose Estimation reinforcement-learning +1

305

Paper
Code

ProMP: Proximal Meta-Policy Search

6 code implementations • ICLR 2019 • Jonas Rothfuss, Dennis Lee, Ignasi Clavera, Tamim Asfour, Pieter Abbeel

Credit assignment in Meta-reinforcement learning (Meta-RL) is still poorly understood.

Meta-Learning Meta Reinforcement Learning

2,536

Paper
Code

Composable Action-Conditioned Predictors: Flexible Off-Policy Learning for Robot Navigation

1 code implementation • 16 Oct 2018 • Gregory Kahn, Adam Villaflor, Pieter Abbeel, Sergey Levine

We show that a simulated robotic car and a real-world RC car can gather data and train fully autonomously without any human-provided labels beyond those needed to train the detectors, and then at test-time be able to accomplish a variety of different tasks.

Robot Navigation

Paper
Code

Establishing Appropriate Trust via Critical States

no code implementations • 18 Oct 2018 • Sandy H. Huang, Kush Bhatia, Pieter Abbeel, Anca D. Dragan

In order to effectively interact with or supervise a robot, humans need to have an accurate mental model of its capabilities and how it acts.

Robotics

Paper
Add Code

One-Shot Hierarchical Imitation Learning of Compound Visuomotor Tasks

no code implementations • 25 Oct 2018 • Tianhe Yu, Pieter Abbeel, Sergey Levine, Chelsea Finn

We consider the problem of learning multi-stage vision-based tasks on a real robot from a single video of a human performing the task, while leveraging demonstration data of subtasks with other objects.

Imitation Learning

Paper
Add Code

Modular Architecture for StarCraft II with Deep Reinforcement Learning

no code implementations • 8 Nov 2018 • Dennis Lee, Haoran Tang, Jeffrey O. Zhang, Huazhe Xu, Trevor Darrell, Pieter Abbeel

We present a novel modular architecture for StarCraft II AI.

reinforcement-learning Reinforcement Learning (RL) +2

Paper
Add Code

An Algorithmic Perspective on Imitation Learning

no code implementations • 16 Nov 2018 • Takayuki Osa, Joni Pajarinen, Gerhard Neumann, J. Andrew Bagnell, Pieter Abbeel, Jan Peters

This process of learning from demonstrations, and the study of algorithms to do so, is called imitation learning.

Imitation Learning Learning Theory

Paper
Add Code

Guiding Policies with Language via Meta-Learning

1 code implementation • ICLR 2019 • John D. Co-Reyes, Abhishek Gupta, Suvansh Sanjeev, Nick Altieri, Jacob Andreas, John DeNero, Pieter Abbeel, Sergey Levine

However, a single instruction may be insufficient to fully communicate our intent or, even if it is, may be insufficient for an autonomous agent to actually understand how to perform the desired task.

Imitation Learning Instruction Following +1

Paper
Code

The Importance of Sampling inMeta-Reinforcement Learning

no code implementations • NeurIPS 2018 • Bradly Stadie, Ge Yang, Rein Houthooft, Peter Chen, Yan Duan, Yuhuai Wu, Pieter Abbeel, Ilya Sutskever

Results are presented on a new environment we call `Krazy World': a difficult high-dimensional gridworld which is designed to highlight the importance of correctly differentiating through sampling distributions in meta-reinforcement learning.

Meta Reinforcement Learning reinforcement-learning +1

Paper
Add Code

Soft Actor-Critic Algorithms and Applications

50 code implementations • 13 Dec 2018 • Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, Sergey Levine

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms

Decision Making reinforcement-learning +1

10,369

Paper
Code

Addressing Sample Complexity in Visual Tasks Using HER and Hallucinatory GANs

2 code implementations • NeurIPS 2019 • Himanshu Sahni, Toby Buckley, Pieter Abbeel, Ilya Kuzovkin

In this work, we show how visual trajectories can be hallucinated to appear successful by altering agent observations using a generative model trained on relatively few snapshots of the goal.

Reinforcement Learning (RL)

671

Paper
Code

Flow++: Improving Flow-Based Generative Models with Variational Dequantization and Architecture Design

4 code implementations • ICLR 2019 • Jonathan Ho, Xi Chen, Aravind Srinivas, Yan Duan, Pieter Abbeel

Flow-based generative models are powerful exact likelihood models with efficient sampling and inference.

Ranked #14 on Image Generation on ImageNet 32x32 (bpd metric)

Computational Efficiency Density Estimation +1

182

Paper
Code

Generalization through Simulation: Integrating Simulated and Real Data into Deep Reinforcement Learning for Vision-Based Autonomous Flight

1 code implementation • 11 Feb 2019 • Katie Kang, Suneel Belkhale, Gregory Kahn, Pieter Abbeel, Sergey Levine

Deep reinforcement learning provides a promising approach for vision-based control of real-world robots.

Collision Avoidance reinforcement-learning +1

Paper
Code

Preferences Implicit in the State of the World

1 code implementation • ICLR 2019 • Rohin Shah, Dmitrii Krasheninnikov, Jordan Alexander, Pieter Abbeel, Anca Dragan

We find that information from the initial state can be used to infer both side effects that should be avoided as well as preferences for how the environment should be organized.

Reinforcement Learning (RL)

Paper
Code

Domain Randomization for Active Pose Estimation

no code implementations • 10 Mar 2019 • Xinyi Ren, Jianlan Luo, Eugen Solowjow, Juan Aparicio Ojea, Abhishek Gupta, Aviv Tamar, Pieter Abbeel

In this work, we investigate how to improve the accuracy of domain randomization based pose estimation.

Pose Estimation

Paper
Add Code

Towards Characterizing Divergence in Deep Q-Learning

no code implementations • 21 Mar 2019 • Joshua Achiam, Ethan Knight, Pieter Abbeel

Deep Q-Learning (DQL), a family of temporal difference algorithms for control, employs three techniques collectively known as the `deadly triad' in reinforcement learning: bootstrapping, off-policy learning, and function approximation.

Continuous Control OpenAI Gym +1

Paper
Add Code

Guided Meta-Policy Search

no code implementations • NeurIPS 2019 • Russell Mendonca, Abhishek Gupta, Rosen Kralev, Pieter Abbeel, Sergey Levine, Chelsea Finn

Reinforcement learning (RL) algorithms have demonstrated promising results on complex tasks, yet often require impractical numbers of samples since they learn from scratch.

Continuous Control Imitation Learning +4

Paper
Add Code

Learning to Reinforcement Learn by Imitation

no code implementations • ICLR 2019 • Rosen Kralev, Russell Mendonca, Alvin Zhang, Tianhe Yu, Abhishek Gupta, Pieter Abbeel, Sergey Levine, Chelsea Finn

Meta-reinforcement learning aims to learn fast reinforcement learning (RL) procedures that can be applied to new tasks or environments.

Meta-Learning Meta Reinforcement Learning +2

Paper
Add Code

Deep Unsupervised Cardinality Estimation

1 code implementation • 10 May 2019 • Zongheng Yang, Eric Liang, Amog Kamsetty, Chenggang Wu, Yan Duan, Xi Chen, Pieter Abbeel, Joseph M. Hellerstein, Sanjay Krishnan, Ion Stoica

To produce a truly usable estimator, we develop a Monte Carlo integration scheme on top of autoregressive models that can efficiently handle range queries with dozens of dimensions or more.

Density Estimation

Paper
Code

Learning Robotic Manipulation through Visual Planning and Acting

no code implementations • 11 May 2019 • Angelina Wang, Thanard Kurutach, Kara Liu, Pieter Abbeel, Aviv Tamar

We further demonstrate our approach on learning to imagine and execute in 3 environments, the final of which is deformable rope manipulation on a PR2 robot.

Visual Tracking

Paper
Add Code

Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules

3 code implementations • 14 May 2019 • Daniel Ho, Eric Liang, Ion Stoica, Pieter Abbeel, Xi Chen

A key challenge in leveraging data augmentation for neural network training is choosing an effective augmentation policy from a large search space of candidate operations.

Ranked #5 on Image Classification on SVHN

Image Augmentation

503

Paper
Code

Bit-Swap: Recursive Bits-Back Coding for Lossless Compression with Hierarchical Latent Variables

1 code implementation • 16 May 2019 • Friso H. Kingma, Pieter Abbeel, Jonathan Ho

The bits-back argument suggests that latent variable models can be turned into lossless compression schemes.

258

Paper
Code

Compression with Flows via Local Bits-Back Coding

1 code implementation • NeurIPS 2019 • Jonathan Ho, Evan Lohn, Pieter Abbeel

Likelihood-based generative models are the backbones of lossless compression due to the guaranteed existence of codes with lengths close to negative log likelihood.

Computational Efficiency

Paper
Code

MCP: Learning Composable Hierarchical Control with Multiplicative Compositional Policies

1 code implementation • NeurIPS 2019 • Xue Bin Peng, Michael Chang, Grace Zhang, Pieter Abbeel, Sergey Levine

In this work, we propose multiplicative compositional policies (MCP), a method for learning reusable motor skills that can be composed to produce a range of complex behaviors.

Continuous Control

Paper
Code

Learning latent state representation for speeding up exploration

no code implementations • 27 May 2019 • Giulia Vezzani, Abhishek Gupta, Lorenzo Natale, Pieter Abbeel

In this work, we take a representation learning viewpoint on exploration, utilizing prior experience to learn effective latent representations, which can subsequently indicate which regions to explore.

Representation Learning

Paper
Add Code

Sub-policy Adaptation for Hierarchical Reinforcement Learning

no code implementations • ICLR 2020 • Alexander C. Li, Carlos Florensa, Ignasi Clavera, Pieter Abbeel

Hierarchical reinforcement learning is a promising approach to tackle long-horizon decision-making problems with sparse rewards.

Decision Making Hierarchical Reinforcement Learning +2

Paper
Add Code

Goal-conditioned Imitation Learning

1 code implementation • NeurIPS 2019 • Yiming Ding, Carlos Florensa, Mariano Phielipp, Pieter Abbeel

Designing rewards for Reinforcement Learning (RL) is challenging because it needs to convey the desired task, be efficient to optimize, and be easy to compute.

Imitation Learning Reinforcement Learning (RL)

Paper
Code

Evaluating Protein Transfer Learning with TAPE

5 code implementations • NeurIPS 2019 • Roshan Rao, Nicholas Bhattacharya, Neil Thomas, Yan Duan, Xi Chen, John Canny, Pieter Abbeel, Yun S. Song

Semi-supervised learning has emerged as an important paradigm in protein modeling due to the high cost of acquiring supervised protein labels, but the current literature is fragmented when it comes to datasets and standardized evaluation techniques.

BIG-bench Machine Learning Representation Learning +1

628

Paper
Code

On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference

no code implementations • 23 Jun 2019 • Rohin Shah, Noah Gundotra, Pieter Abbeel, Anca D. Dragan

But in the era of deep learning, a natural suggestion researchers make is to avoid mathematical models of human behavior that are fraught with specific assumptions, and instead use a purely data-driven approach.

Paper
Add Code

Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model

8 code implementations • NeurIPS 2020 • Alex X. Lee, Anusha Nagabandi, Pieter Abbeel, Sergey Levine

Deep reinforcement learning (RL) algorithms can use high-capacity deep networks to learn directly from image observations.

Continuous Control reinforcement-learning +2

147

Paper
Code

Benchmarking Model-Based Reinforcement Learning

2 code implementations • 3 Jul 2019 • Tingwu Wang, Xuchan Bao, Ignasi Clavera, Jerrick Hoang, Yeming Wen, Eric Langlois, Shunshi Zhang, Guodong Zhang, Pieter Abbeel, Jimmy Ba

Model-based reinforcement learning (MBRL) is widely seen as having the potential to be significantly more sample efficient than model-free RL.

Benchmarking Model-based Reinforcement Learning +3

Paper
Code

BagNet: Berkeley Analog Generator with Layout Optimizer Boosted with Deep Neural Networks

no code implementations • 23 Jul 2019 • Kourosh Hakhamaneshi, Nick Werblun, Pieter Abbeel, Vladimir Stojanovic

The discrepancy between post-layout and schematic simulation results continues to widen in analog design due in part to the domination of layout parasitics.

Paper
Add Code

Likelihood Contribution based Multi-scale Architecture for Generative Flows

no code implementations • 5 Aug 2019 • Hari Prasanna Das, Pieter Abbeel, Costas J. Spanos

Deep generative modeling using flows has gained popularity owing to the tractable exact log-likelihood estimation with efficient training and synthesis process.

Dimensionality Reduction

Paper
Add Code

DoorGym: A Scalable Door Opening Environment And Baseline Agent

1 code implementation • 5 Aug 2019 • Yusuke Urakami, Alec Hodgkinson, Casey Carlin, Randall Leu, Luca Rigazio, Pieter Abbeel

We introduce DoorGym, an open-source door opening simulation framework designed to utilize domain randomization to train a stable policy.

Reinforcement Learning (RL)

102

Paper
Code

rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch

9 code implementations • 3 Sep 2019 • Adam Stooke, Pieter Abbeel

rlpyt is designed as a high-throughput code base for small- to medium-scale research in deep RL.

Q-Learning reinforcement-learning +1

2,198

Paper
Code

Dynamical System Embedding for Efficient Intrinsically Motivated Artificial Agents

no code implementations • 25 Sep 2019 • Ruihan Zhao, Stas Tiomkin, Pieter Abbeel

In this work, we develop a novel approach for the estimation of empowerment in unknown arbitrary dynamics from visual stimulus only, without sampling for the estimation of MIAS.

Paper
Add Code

PatchFormer: A neural architecture for self-supervised representation learning on images

no code implementations • 25 Sep 2019 • Aravind Srinivas, Pieter Abbeel

In this paper, we propose a neural architecture for self-supervised representation learning on raw images called the PatchFormer which learns to model spatial dependencies across patches in a raw image.

Representation Learning Self-Supervised Learning

Paper
Add Code

Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization

2 code implementations • 7 Oct 2019 • Paras Jain, Ajay Jain, Aniruddha Nrusimha, Amir Gholami, Pieter Abbeel, Kurt Keutzer, Ion Stoica, Joseph E. Gonzalez

We formalize the problem of trading-off DNN training time and memory requirements as the tensor rematerialization optimization problem, a generalization of prior checkpointing strategies.

125

Paper
Code

On the Utility of Learning about Humans for Human-AI Coordination

2 code implementations • NeurIPS 2019 • Micah Carroll, Rohin Shah, Mark K. Ho, Thomas L. Griffiths, Sanjit A. Seshia, Pieter Abbeel, Anca Dragan

While we would like agents that can coordinate with humans, current algorithms such as self-play and population-based training create agents that can coordinate with themselves.

635

Paper
Code

Geometry-Aware Neural Rendering

1 code implementation • NeurIPS 2019 • Josh Tobin, OpenAI Robotics, Pieter Abbeel

Understanding the 3-dimensional structure of the world is a core challenge in computer vision and robotics.

Neural Rendering

Paper
Code

Asynchronous Methods for Model-Based Reinforcement Learning

1 code implementation • 28 Oct 2019 • Yunzhi Zhang, Ignasi Clavera, Boren Tsai, Pieter Abbeel

In this work, we propose an asynchronous framework for model-based reinforcement learning methods that brings down the run time of these algorithms to be just the data collection time.

Model-based Reinforcement Learning reinforcement-learning +1

Paper
Code

Learning to Manipulate Deformable Objects without Demonstrations

2 code implementations • 29 Oct 2019 • Yilin Wu, Wilson Yan, Thanard Kurutach, Lerrel Pinto, Pieter Abbeel

Second, instead of jointly learning both the pick and the place locations, we only explicitly learn the placing policy conditioned on random pick points.

Deformable Object Manipulation Object +1

Paper
Code

Plan Arithmetic: Compositional Plan Vectors for Multi-Task Control

no code implementations • 30 Oct 2019 • Coline Devin, Daniel Geng, Pieter Abbeel, Trevor Darrell, Sergey Levine

We show that CPVs can be learned within a one-shot imitation learning framework without any additional supervision or information about task hierarchy, and enable a demonstration-conditioned policy to generalize to tasks that sequence twice as many skills as the tasks seen during training.

Imitation Learning

Paper
Add Code

Natural Image Manipulation for Autoregressive Models Using Fisher Scores

no code implementations • 25 Nov 2019 • Wilson Yan, Jonathan Ho, Pieter Abbeel

Deep autoregressive models are one of the most powerful models that exist today which achieve state-of-the-art bits per dim.

Image Manipulation

Paper
Add Code

Compositional Plan Vectors

1 code implementation • NeurIPS 2019 • Coline Devin, Daniel Geng, Pieter Abbeel, Trevor Darrell, Sergey Levine

Imitation Learning

Paper
Code

Adaptive Online Planning for Continual Lifelong Learning

1 code implementation • 3 Dec 2019 • Kevin Lu, Igor Mordatch, Pieter Abbeel

We study learning control in an online reset-free lifelong learning scenario, where mistakes can compound catastrophically into the future and the underlying dynamics of the environment may change.

Paper
Code

Learning Efficient Representation for Intrinsic Motivation

no code implementations • 4 Dec 2019 • Ruihan Zhao, Stas Tiomkin, Pieter Abbeel

The core idea is to represent the relation between action sequences and future states using a stochastic dynamic model in latent space with a specific form.

Paper
Add Code

AVID: Learning Multi-Stage Tasks via Pixel-Level Translation of Human Videos

no code implementations • 10 Dec 2019 • Laura Smith, Nikita Dhawan, Marvin Zhang, Pieter Abbeel, Sergey Levine

In this paper, we study how these challenges can be alleviated with an automated robotic learning framework, in which multi-stage tasks are defined simply by providing videos of a human demonstrator and then learned autonomously by the robot from raw image observations.

Reinforcement Learning (RL) Translation

Paper
Add Code

Predictive Coding for Boosting Deep Reinforcement Learning with Sparse Rewards

no code implementations • 21 Dec 2019 • Xingyu Lu, Stas Tiomkin, Pieter Abbeel

While recent progress in deep reinforcement learning has enabled robots to learn complex behaviors, tasks with long horizons and sparse rewards remain an ongoing challenge.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Hierarchical Variational Imitation Learning of Control Programs

1 code implementation • 29 Dec 2019 • Roy Fox, Richard Shin, William Paul, Yitian Zou, Dawn Song, Ken Goldberg, Pieter Abbeel, Ion Stoica

Autonomous agents can learn by imitating teacher demonstrations of the intended behavior.

Imitation Learning Variational Inference

Paper
Code

Preventing Imitation Learning with Adversarial Policy Ensembles

no code implementations • 31 Jan 2020 • Albert Zhan, Stas Tiomkin, Pieter Abbeel

To our knowledge, this is the first work regarding the protection of policies in Reinforcement Learning.

Imitation Learning reinforcement-learning +1

Paper
Add Code

Mutual Information-based State-Control for Intrinsically Motivated Reinforcement Learning

no code implementations • 5 Feb 2020 • Rui Zhao, Yang Gao, Pieter Abbeel, Volker Tresp, Wei Xu

In reinforcement learning, an agent learns to reach a set of goals by means of an external reward signal.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

BADGR: An Autonomous Self-Supervised Learning-Based Navigation System

1 code implementation • 13 Feb 2020 • Gregory Kahn, Pieter Abbeel, Sergey Levine

Mobile robot navigation is typically regarded as a geometric problem, in which the robot's objective is to perceive the geometry of the environment in order to plan collision-free paths towards a desired goal.

Navigate Robot Navigation +1

140

Paper
Code

GACEM: Generalized Autoregressive Cross Entropy Method for Multi-Modal Black Box Constraint Satisfaction

no code implementations • 17 Feb 2020 • Kourosh Hakhamaneshi, Keertana Settaluri, Pieter Abbeel, Vladimir Stojanovic

In this work we present a new method of black-box optimization and constraint satisfaction.

Policy Gradient Methods

Paper
Add Code

Generalized Hindsight for Reinforcement Learning

no code implementations • NeurIPS 2020 • Alexander C. Li, Lerrel Pinto, Pieter Abbeel

Compared to standard relabeling techniques, Generalized Hindsight provides a substantially more efficient reuse of samples, which we empirically demonstrate on a suite of multi-task navigation and manipulation tasks.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Hallucinative Topological Memory for Zero-Shot Visual Planning

1 code implementation • ICML 2020 • Kara Liu, Thanard Kurutach, Christine Tung, Pieter Abbeel, Aviv Tamar

In visual planning (VP), an agent learns to plan goal-directed behavior from observations of a dynamical system obtained offline, e. g., images obtained from self-supervised robot interaction.

Paper
Code

Hierarchically Decoupled Imitation for Morphological Transfer

1 code implementation • 3 Mar 2020 • Donald J. Hejna III, Pieter Abbeel, Lerrel Pinto

Learning long-range behaviors on complex high-dimensional agents is a fundamental problem in robot learning.

Paper
Code

Learning Predictive Representations for Deformable Objects Using Contrastive Estimation

1 code implementation • 11 Mar 2020 • Wilson Yan, Ashwin Vangipuram, Pieter Abbeel, Lerrel Pinto

Using visual model-based learning for deformable object manipulation is challenging due to difficulties in learning plannable visual representations along with complex dynamic models.

Deformable Object Manipulation

Paper
Code

Sparse Graphical Memory for Robust Planning

1 code implementation • NeurIPS 2020 • Scott Emmons, Ajay Jain, Michael Laskin, Thanard Kurutach, Pieter Abbeel, Deepak Pathak

To operate effectively in the real world, agents should be able to act from high-dimensional raw sensory input such as images and achieve diverse goals across long time-horizons.

Imitation Learning Visual Navigation

Paper
Code

CURL: Contrastive Unsupervised Representations for Reinforcement Learning

7 code implementations • 8 Apr 2020 • Aravind Srinivas, Michael Laskin, Pieter Abbeel

On the DeepMind Control Suite, CURL is the first image-based algorithm to nearly match the sample-efficiency of methods that use state-based features.

Ranked #1 on Continuous Control on Finger, spin (DMControl500k)

Atari Games Atari Games 100k +4

2,539

Paper
Code

Reinforcement Learning with Augmented Data

2 code implementations • NeurIPS 2020 • Michael Laskin, Kimin Lee, Adam Stooke, Lerrel Pinto, Pieter Abbeel, Aravind Srinivas

To this end, we present Reinforcement Learning with Augmented Data (RAD), a simple plug-and-play module that can enhance most RL algorithms.

Data Augmentation OpenAI Gym +2

397

Paper
Code

Plan2Vec: Unsupervised Representation Learning by Latent Plans

1 code implementation • 7 May 2020 • Ge Yang, Amy Zhang, Ari S. Morcos, Joelle Pineau, Pieter Abbeel, Roberto Calandra

In this paper we introduce plan2vec, an unsupervised representation learning approach that is inspired by reinforcement learning.

Motion Planning reinforcement-learning +2

Paper
Code

Planning to Explore via Self-Supervised World Models

4 code implementations • 12 May 2020 • Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak

Reinforcement learning allows solving complex tasks, however, the learning tends to be task-specific and the sample efficiency remains a challenge.

reinforcement-learning Reinforcement Learning (RL)

208

Paper
Code

Model-Augmented Actor-Critic: Backpropagating through Paths

no code implementations • ICLR 2020 • Ignasi Clavera, Violet Fu, Pieter Abbeel

Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator to augment the data for policy optimization or value function learning.

Model-based Reinforcement Learning

Paper
Add Code

Mutual Information Maximization for Robust Plannable Representations

no code implementations • 16 May 2020 • Yiming Ding, Ignasi Clavera, Pieter Abbeel

The later, while they present low sample complexity, they learn latent spaces that need to reconstruct every single detail of the scene.

Model-based Reinforcement Learning

Paper
Add Code

Automatic Curriculum Learning through Value Disagreement

1 code implementation • NeurIPS 2020 • Yunzhi Zhang, Pieter Abbeel, Lerrel Pinto

Our key insight is that if we can sample goals at the frontier of the set of goals that an agent is able to reach, it will provide a significantly stronger learning signal compared to randomly sampled goals.

Reinforcement Learning (RL)

Paper
Code

Denoising Diffusion Probabilistic Models

61 code implementations • NeurIPS 2020 • Jonathan Ho, Ajay Jain, Pieter Abbeel

We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics.

Ranked #2 on Image Generation on LSUN Bedroom

Denoising Density Estimation +1

47,992

Paper
Code

Locally Masked Convolution for Autoregressive Models

1 code implementation • 22 Jun 2020 • Ajay Jain, Pieter Abbeel, Deepak Pathak

For tasks such as image completion, these models are unable to use much of the observed context.

Ranked #1 on Image Generation on MNIST

Anomaly Detection Density Estimation +2

Paper
Code

AvE: Assistance via Empowerment

1 code implementation • NeurIPS 2020 • Yuqing Du, Stas Tiomkin, Emre Kiciman, Daniel Polani, Pieter Abbeel, Anca Dragan

One difficulty in using artificial agents for human-assistive applications lies in the challenge of accurately assisting with a person's goal(s).

Paper
Code

Responsive Safety in Reinforcement Learning by PID Lagrangian Methods

no code implementations • 8 Jul 2020 • Adam Stooke, Joshua Achiam, Pieter Abbeel

Lagrangian methods are widely used algorithms for constrained optimization problems, but their learning dynamics exhibit oscillations and overshoot which, when applied to safe reinforcement learning, leads to constraint-violating behavior during agent training.

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Add Code

Self-Supervised Policy Adaptation during Deployment

2 code implementations • ICLR 2021 • Nicklas Hansen, Rishabh Jangir, Yu Sun, Guillem Alenyà, Pieter Abbeel, Alexei A. Efros, Lerrel Pinto, Xiaolong Wang

A natural solution would be to keep training after deployment in the new environment, but this cannot be done if the new environment offers no reward signal.

109

Paper
Code

SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning

1 code implementation • 9 Jul 2020 • Kimin Lee, Michael Laskin, Aravind Srinivas, Pieter Abbeel

Off-policy deep reinforcement learning (RL) has been successful in a range of challenging domains.

Efficient Exploration Ensemble Learning +3

116

Paper
Code

Contrastive Code Representation Learning

1 code implementation • EMNLP 2021 • Paras Jain, Ajay Jain, Tianjun Zhang, Pieter Abbeel, Joseph E. Gonzalez, Ion Stoica

Recent work learns contextual representations of source code by reconstructing tokens from their context.

Ranked #1 on Method name prediction on CodeSearchNet

Clone Detection Contrastive Learning +4

165

Paper
Code

Variable Skipping for Autoregressive Range Density Estimation

1 code implementation • ICML 2020 • Eric Liang, Zongheng Yang, Ion Stoica, Pieter Abbeel, Yan Duan, Xi Chen

In this paper, we explore a technique, variable skipping, for accelerating range density estimation over deep autoregressive models.

Data Augmentation Density Estimation

Paper
Code

Efficient Empowerment Estimation for Unsupervised Stabilization

no code implementations • ICLR 2021 • Ruihan Zhao, Kevin Lu, Pieter Abbeel, Stas Tiomkin

We demonstrate our solution for sample-based unsupervised stabilization on different dynamical control systems and show the advantages of our method by comparing it to the existing VLB approaches.

Paper
Add Code

Hybrid Discriminative-Generative Training via Contrastive Learning

1 code implementation • 17 Jul 2020 • Hao Liu, Pieter Abbeel

In this paper we show that through the perspective of hybrid discriminative-generative training of energy-based models we can make a direct connection between contrastive learning and supervised learning.

Contrastive Learning Out-of-Distribution Detection

Paper
Code

Dynamics Generalization via Information Bottleneck in Deep Reinforcement Learning

no code implementations • 3 Aug 2020 • Xingyu Lu, Kimin Lee, Pieter Abbeel, Stas Tiomkin

Despite the significant progress of deep reinforcement learning (RL) in solving sequential decision making problems, RL agents often overfit to training environments and struggle to adapt to new, unseen environments.

Decision Making reinforcement-learning +1

Paper
Add Code

Robust Reinforcement Learning using Adversarial Populations

1 code implementation • 4 Aug 2020 • Eugene Vinitsky, Yuqing Du, Kanaad Parvate, Kathy Jang, Pieter Abbeel, Alexandre Bayen

Reinforcement Learning (RL) is an effective tool for controller design but can struggle with issues of robustness, failing catastrophically when the underlying system dynamics are perturbed.

Out-of-Distribution Generalization reinforcement-learning +1

Paper
Code

Visual Imitation Made Easy

no code implementations • 11 Aug 2020 • Sarah Young, Dhiraj Gandhi, Shubham Tulsiani, Abhinav Gupta, Pieter Abbeel, Lerrel Pinto

We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.

Imitation Learning

Paper
Add Code

Decoupling Representation Learning from Reinforcement Learning

3 code implementations • 14 Sep 2020 • Adam Stooke, Kimin Lee, Pieter Abbeel, Michael Laskin

In an effort to overcome limitations of reward-driven feature learning in deep reinforcement learning (RL) from images, we propose decoupling representation learning from policy learning.

Data Augmentation reinforcement-learning +2

2,198

Paper
Code

LaND: Learning to Navigate from Disengagements

1 code implementation • 9 Oct 2020 • Gregory Kahn, Pieter Abbeel, Sergey Levine

However, we believe that these disengagements not only show where the system fails, which is useful for troubleshooting, but also provide a direct learning signal by which the robot can learn to navigate.

Autonomous Navigation Imitation Learning +3

Paper
Code

Trajectory-wise Multiple Choice Learning for Dynamics Generalization in Reinforcement Learning

1 code implementation • NeurIPS 2020 • Younggyo Seo, Kimin Lee, Ignasi Clavera, Thanard Kurutach, Jinwoo Shin, Pieter Abbeel

Model-based reinforcement learning (RL) has shown great potential in various control tasks in terms of both sample-efficiency and final performance.

Clustering Model-based Reinforcement Learning +4

Paper
Code

Parallel Training of Deep Networks with Local Updates

1 code implementation • 7 Dec 2020 • Michael Laskin, Luke Metz, Seth Nabarro, Mark Saroufim, Badreddine Noune, Carlo Luschi, Jascha Sohl-Dickstein, Pieter Abbeel

Deep learning models trained on large data sets have been widely successful in both vision and language domains.

Paper
Code

Reset-Free Lifelong Learning with Skill-Space Planning

1 code implementation • ICLR 2021 • Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch

We propose Lifelong Skill Planning (LiSP), an algorithmic framework for non-episodic lifelong RL based on planning in an abstract space of higher-order skills.

Reinforcement Learning (RL)

Paper
Code

Learning Visual Robotic Control Efficiently with Contrastive Pre-training and Data Augmentation

no code implementations • 14 Dec 2020 • Albert Zhan, Ruihan Zhao, Lerrel Pinto, Pieter Abbeel, Michael Laskin

We present Contrastive Pre-training and Data Augmentation for Efficient Robotic Learning (CoDER), a method that utilizes data augmentation and unsupervised learning to achieve sample-efficient training of real-robot arm policies from sparse rewards.

Data Augmentation reinforcement-learning +2

Paper
Add Code

Compute- and Memory-Efficient Reinforcement Learning with Latent Experience Replay

no code implementations • 1 Jan 2021 • Lili Chen, Kimin Lee, Aravind Srinivas, Pieter Abbeel

In this paper, we present Latent Vector Experience Replay (LeVER), a simple modification of existing off-policy RL methods, to address these computational and memory requirements without sacrificing the performance of RL agents.

Atari Games reinforcement-learning +2

Paper
Add Code

VideoGen: Generative Modeling of Videos using VQ-VAE and Transformers

no code implementations • 1 Jan 2021 • Yunzhi Zhang, Wilson Yan, Pieter Abbeel, Aravind Srinivas

We present VideoGen: a conceptually simple architecture for scaling likelihood based generative modeling to natural videos.

Position Video Generation

Paper
Add Code

Addressing Distribution Shift in Online Reinforcement Learning with Offline Datasets

no code implementations • 1 Jan 2021 • SeungHyun Lee, Younggyo Seo, Kimin Lee, Pieter Abbeel, Jinwoo Shin

As it turns out, fine-tuning offline RL agents is a non-trivial challenge, due to distribution shift – the agent encounters out-of-distribution samples during online interaction, which may cause bootstrapping error in Q-learning and instability during fine-tuning.

D4RL Offline RL +3

Paper
Add Code

Unsupervised Active Pre-Training for Reinforcement Learning

no code implementations • 1 Jan 2021 • Hao liu, Pieter Abbeel

On DMControl suite, APT beats all baselines in terms of asymptotic performance and data efficiency and dramatically improves performance on tasks that are extremely difficult for training from scratch.

Atari Games Contrastive Learning +3

Paper
Add Code

Discrete Predictive Representation for Long-horizon Planning

no code implementations • 1 Jan 2021 • Thanard Kurutach, Julia Peng, Yang Gao, Stuart Russell, Pieter Abbeel

Discrete representations have been key in enabling robots to plan at more abstract levels and solve temporally-extended tasks more efficiently for decades.

Object Reinforcement Learning (RL)

Paper
Add Code

Weighted Bellman Backups for Improved Signal-to-Noise in Q-Updates

no code implementations • 1 Jan 2021 • Kimin Lee, Michael Laskin, Aravind Srinivas, Pieter Abbeel

Furthermore, since our weighted Bellman backups rely on maintaining an ensemble, we investigate how weighted Bellman backups interact with other benefits previously derived from ensembles: (a) Bootstrap; (b) UCB Exploration.

Q-Learning Reinforcement Learning (RL)

Paper
Add Code

Benefits of Assistance over Reward Learning

no code implementations • 1 Jan 2021 • Rohin Shah, Pedro Freire, Neel Alex, Rachel Freedman, Dmitrii Krasheninnikov, Lawrence Chan, Michael D Dennis, Pieter Abbeel, Anca Dragan, Stuart Russell

By merging reward learning and control, assistive agents can reason about the impact of control actions on reward learning, leading to several advantages over agents based on reward learning.

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.