Search Results for author: Shixiang Gu

Found 27 papers, 18 papers with code

Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization

1 code implementation ICLR 2021 Tatsuya Matsushima, Hiroki Furuta, Yutaka Matsuo, Ofir Nachum, Shixiang Gu

We propose a novel model-based algorithm, Behavior-Regularized Model-ENsemble (BREMEN), that can effectively optimize a policy offline using 10-20 times less data than prior works.

Offline RL reinforcement-learning +1
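
As a rough illustration of the recipe the abstract describes, here is a minimal Python sketch of a BREMEN-style offline loop; the helper functions (fit_dynamics, behavior_clone, imagine_rollout, trust_region_update) are hypothetical stand-ins, not the authors' implementation:

```python
import random

def bremen_offline(dataset, n_models=5, n_iters=100):
    # Fit an ensemble of dynamics models to the fixed offline dataset.
    models = [fit_dynamics(dataset) for _ in range(n_models)]
    # Behavior cloning initializes the policy near the data-collection
    # policy, implicitly regularizing the subsequent updates toward it.
    policy = behavior_clone(dataset)
    for _ in range(n_iters):
        # Generate imaginary rollouts from randomly chosen ensemble
        # members, then take a conservative (trust-region) policy step
        # on model-generated data only -- no further environment access.
        rollouts = [imagine_rollout(random.choice(models), policy)
                    for _ in range(64)]
        policy = trust_region_update(policy, rollouts)
    return policy
```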

Dynamics-Aware Unsupervised Skill Discovery

1 code implementation ICLR 2020 Archit Sharma, Shixiang Gu, Sergey Levine, Vikash Kumar, Karol Hausman

Conventionally, model-based reinforcement learning (MBRL) aims to learn a global model for the dynamics of the environment.

Model-based Reinforcement Learning

Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning

2 code implementations 27 Apr 2020 Archit Sharma, Michael Ahn, Sergey Levine, Vikash Kumar, Karol Hausman, Shixiang Gu

Can we instead develop efficient reinforcement learning methods that acquire diverse skills without any reward function, and then repurpose these skills for downstream tasks?

Model Predictive Control reinforcement-learning +2

Way Off-Policy Batch Deep Reinforcement Learning of Human Preferences in Dialog

no code implementations ICLR 2020 Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard

This is a critical shortcoming for applying RL to real-world problems where collecting data is expensive, and models must be tested offline before being deployed to interact with the environment -- e.g., systems that learn from human interaction.

OpenAI Gym Open-Domain Dialog +3

A Divergence Minimization Perspective on Imitation Learning Methods

3 code implementations 6 Nov 2019 Seyed Kamyar Seyed Ghasemipour, Richard Zemel, Shixiang Gu

We present $f$-MAX, an $f$-divergence generalization of AIRL [Fu et al., 2018], a state-of-the-art IRL method.

Behavioural cloning Continuous Control
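
For context, the objective behind $f$-MAX can be sketched (hedged; notation simplified from the paper) as matching state-action occupancy measures under an $f$-divergence,

$$ \min_\pi \; D_f\big(\rho^{\mathrm{exp}}(s,a) \,\|\, \rho^{\pi}(s,a)\big), \qquad D_f(P\|Q) = \int Q(x)\, f\!\Big(\frac{P(x)}{Q(x)}\Big)\, dx, $$

with AIRL recovered as a particular (reverse-KL) choice of $f$.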

Multi-Agent Manipulation via Locomotion using Hierarchical Sim2Real

no code implementations 13 Aug 2019 Ofir Nachum, Michael Ahn, Hugo Ponte, Shixiang Gu, Vikash Kumar

Our method hinges on the use of hierarchical sim2real -- a simulated environment is used to learn low-level goal-reaching skills, which are then used as the action space for a high-level RL controller, also trained in simulation.

Reinforcement Learning (RL)
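
A minimal sketch of the two-level control scheme described above, with hypothetical interfaces (high_level_policy, low_level_skill); in the paper both levels are trained in simulation before transfer:

```python
def hierarchical_step(high_level_policy, low_level_skill, env, obs, k=10):
    # The high level picks a goal; its "action space" is the space of
    # goals the low-level goal-reaching skill knows how to achieve.
    goal = high_level_policy(obs)
    # The low level executes k primitive actions toward that goal.
    for _ in range(k):
        action = low_level_skill(obs, goal)
        obs, reward, done, info = env.step(action)
        if done:
            break
    return obs
```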

Dynamics-Aware Unsupervised Discovery of Skills

3 code implementations 2 Jul 2019 Archit Sharma, Shixiang Gu, Sergey Levine, Vikash Kumar, Karol Hausman

Conventionally, model-based reinforcement learning (MBRL) aims to learn a global model for the dynamics of the environment.

Model-based Reinforcement Learning

Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog

1 code implementation 30 Jun 2019 Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard

Most deep reinforcement learning (RL) systems are not able to learn effectively from off-policy data, especially if they cannot explore online in the environment.

Open-Domain Dialog Q-Learning +2

Language as an Abstraction for Hierarchical Deep Reinforcement Learning

2 code implementations NeurIPS 2019 Yiding Jiang, Shixiang Gu, Kevin Murphy, Chelsea Finn

We find that, using our approach, agents can learn to solve diverse, temporally extended tasks such as object sorting and multi-object rearrangement, including from raw pixel observations.

Instruction Following Object +2

Doubly Reparameterized Gradient Estimators for Monte Carlo Objectives

3 code implementations ICLR 2019 George Tucker, Dieterich Lawson, Shixiang Gu, Chris J. Maddison

Burda et al. (2015) introduced a multi-sample variational bound, IWAE, that is at least as tight as the standard variational lower bound and becomes increasingly tight as the number of samples increases.

Variational Inference
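
For reference, the IWAE bound referred to above is the multi-sample objective

$$ \mathcal{L}_K = \mathbb{E}_{z_1,\dots,z_K \sim q(z|x)}\Big[\log \frac{1}{K}\sum_{k=1}^{K} \frac{p(x, z_k)}{q(z_k|x)}\Big], $$

which satisfies $\log p(x) \ge \mathcal{L}_{K+1} \ge \mathcal{L}_K$, with $\mathcal{L}_1$ being the standard variational lower bound (ELBO).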

Data-Efficient Hierarchical Reinforcement Learning

12 code implementations NeurIPS 2018 Ofir Nachum, Shixiang Gu, Honglak Lee, Sergey Levine

In this paper, we study how we can develop HRL algorithms that are general, in that they do not make onerous additional assumptions beyond standard RL algorithms, and efficient, in the sense that they can be used with modest numbers of interaction samples, making them suitable for real-world problems such as robotic control.

Hierarchical Reinforcement Learning reinforcement-learning +1

The Mirage of Action-Dependent Baselines in Reinforcement Learning

1 code implementation ICML 2018 George Tucker, Surya Bhupatiraju, Shixiang Gu, Richard E. Turner, Zoubin Ghahramani, Sergey Levine

Policy gradient methods are a widely used class of model-free reinforcement learning algorithms where a state-dependent baseline is used to reduce gradient estimator variance.

Policy Gradient Methods reinforcement-learning +1
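
For reference, the state-dependent-baseline estimator the abstract refers to is

$$ \nabla_\theta J(\theta) = \mathbb{E}_{s,\,a \sim \pi_\theta}\big[\nabla_\theta \log \pi_\theta(a|s)\,\big(Q^\pi(s,a) - b(s)\big)\big], $$

which remains unbiased for any $b(s)$ because $\mathbb{E}_{a\sim\pi_\theta}[\nabla_\theta \log \pi_\theta(a|s)\, b(s)] = 0$, while a well-chosen baseline reduces the estimator's variance.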

Temporal Difference Models: Model-Free Deep RL for Model-Based Control

no code implementations ICLR 2018 Vitchyr Pong, Shixiang Gu, Murtaza Dalal, Sergey Levine

TDMs combine the benefits of model-free and model-based RL: they leverage the rich information in state transitions to learn very efficiently, while still attaining asymptotic performance that exceeds that of direct model-based RL methods.

Continuous Control Q-Learning +1
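
A hedged sketch of the construction: a TDM is a goal- and horizon-conditioned Q-function trained with a recursion along the lines of

$$ Q(s,a,g,\tau) = \mathbb{E}_{s'}\Big[-d(s',g)\,\mathbb{1}[\tau=0] + \max_{a'} Q(s',a',g,\tau-1)\,\mathbb{1}[\tau>0]\Big], $$

so that model-free Q-learning absorbs the kind of state-reaching information a learned dynamics model would otherwise provide (notation simplified from the paper).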

Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning

1 code implementation ICLR 2018 Benjamin Eysenbach, Shixiang Gu, Julian Ibarz, Sergey Levine

In this work, we propose an autonomous method for safe and efficient reinforcement learning that simultaneously learns a forward and reset policy, with the reset policy resetting the environment for a subsequent attempt.

reinforcement-learning Reinforcement Learning (RL)
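
A minimal sketch of the forward/reset scheme described above (helper names are hypothetical; the early-abort rule follows the paper's idea of stopping the forward policy when the reset policy's value estimate signals an unrecoverable state):

```python
def safe_episode(env, forward_pi, reset_pi, reset_q, obs, eps=0.1):
    # Forward phase: act, but abort early if the reset policy no longer
    # expects to be able to return to the initial state distribution.
    while True:
        action = forward_pi(obs)
        if reset_q(obs, action) < eps:  # early abort: hard to reset from here
            break
        obs, reward, done, _ = env.step(action)
        if done:
            break
    # Reset phase: the learned reset policy returns to the start states,
    # replacing a manual or hard-coded environment reset.
    while not reset_done(obs):  # reset_done is a hypothetical predicate
        obs, _, _, _ = env.step(reset_pi(obs))
    return obs
```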

Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control

no code implementations ICML 2017 Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E. Turner, Douglas Eck

This paper proposes a general method for improving the structure and quality of sequences generated by a recurrent neural network (RNN), while maintaining information originally learned from data, as well as sample diversity.

Reinforcement Learning (RL)
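
A hedged sketch of the KL-control objective named in the title: fine-tune the policy $\pi$ (the RNN) to maximize task reward while staying close to the pretrained prior $p$,

$$ \max_\pi \; \mathbb{E}_\pi\Big[\sum_t r(s_t,a_t)\Big] - \beta \sum_t \mathrm{KL}\big(\pi(\cdot|s_t)\,\|\,p(\cdot|s_t)\big), $$

where the KL penalty is what preserves the information originally learned from data, and $\beta$ trades reward against conservatism.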

Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic

2 code implementations 7 Nov 2016 Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E. Turner, Sergey Levine

We analyze the connection between Q-Prop and existing model-free algorithms, and use control variate theory to derive two variants of Q-Prop with conservative and aggressive adaptation.

Continuous Control Policy Gradient Methods +2
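
A hedged sketch of the control-variate construction behind Q-Prop: subtract from the Monte Carlo policy gradient a term built from a first-order Taylor expansion of the off-policy critic $Q_w$ around the deterministic action $\mu_\theta(s)$, then add its expectation back analytically,

$$ \nabla_\theta J \approx \mathbb{E}\big[\nabla_\theta \log \pi_\theta(a|s)\,\big(\hat{A}(s,a) - \bar{A}_w(s,a)\big)\big] + \mathbb{E}\big[\nabla_a Q_w(s,a)\big|_{a=\mu_\theta(s)}\,\nabla_\theta \mu_\theta(s)\big], $$

with the conservative and aggressive variants mentioned above gating this control variate per state according to its estimated effect on variance.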

Categorical Reparameterization with Gumbel-Softmax

19 code implementations 3 Nov 2016 Eric Jang, Shixiang Gu, Ben Poole

Categorical variables are a natural choice for representing discrete structure in the world.

General Classification
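
The Gumbel-Softmax trick itself is compact enough to sketch directly. A minimal NumPy version follows; in practice it is used inside an autodiff framework so the relaxed sample is differentiable, and the sample approaches one-hot as the temperature tau goes to 0:

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Relaxed sample from a categorical distribution: perturb logits
    with Gumbel(0, 1) noise, then apply a temperature-scaled softmax."""
    rng = rng or np.random.default_rng()
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + gumbel) / tau
    y = np.exp(y - y.max())  # numerically stable softmax
    return y / y.sum()

# Low temperature -> nearly one-hot sample over three categories.
print(gumbel_softmax(np.log(np.array([0.1, 0.3, 0.6])), tau=0.1))
```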

Categorical Reparametrization with Gumbel-Softmax

1 code implementation ICLR 2017 Eric Jang, Shixiang Gu, Ben Poole

Categorical variables are a natural choice for representing discrete structure in the world.

Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates

no code implementations 3 Oct 2016 Shixiang Gu, Ethan Holly, Timothy Lillicrap, Sergey Levine

In this paper, we demonstrate that a recent deep reinforcement learning algorithm based on off-policy training of deep Q-functions can scale to complex 3D manipulation tasks and can learn deep neural network policies efficiently enough to train on real physical robots.

reinforcement-learning Reinforcement Learning (RL)
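
A minimal sketch of the asynchronous scheme implied by the title, assuming hypothetical interfaces (collect_episode, q_update, robots, q_function): each robot collects experience into a shared buffer while a separate thread performs off-policy Q-function updates, so learning proceeds while the robots keep acting:

```python
import threading
import queue

replay = queue.Queue()  # shared replay buffer

def worker(robot):
    # Each robot pushes transitions without blocking training.
    while True:
        for transition in collect_episode(robot):
            replay.put(transition)

def trainer(q_function):
    # Off-policy Q-learning updates from the shared buffer.
    batch = []
    while True:
        batch.append(replay.get())
        if len(batch) == 64:
            q_update(q_function, batch)
            batch = []

threading.Thread(target=trainer, args=(q_function,), daemon=True).start()
for robot in robots:
    threading.Thread(target=worker, args=(robot,), daemon=True).start()
```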

Continuous Deep Q-Learning with Model-based Acceleration

8 code implementations 2 Mar 2016 Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, Sergey Levine

In this paper, we explore algorithms and representations to reduce the sample complexity of deep reinforcement learning for continuous control tasks.

Continuous Control Q-Learning +2
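
For context, the paper's normalized advantage function (NAF) makes the greedy action in continuous Q-learning analytic by construction:

$$ Q(s,a) = V(s) - \tfrac{1}{2}\,(a - \mu(s))^{\top} P(s)\,(a - \mu(s)), $$

where $P(s)$ is positive definite, so $\arg\max_a Q(s,a) = \mu(s)$ with no inner optimization over actions.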

MuProp: Unbiased Backpropagation for Stochastic Neural Networks

2 code implementations 16 Nov 2015 Shixiang Gu, Sergey Levine, Ilya Sutskever, Andriy Mnih

Deep neural networks are powerful parametric models that can be trained efficiently using the backpropagation algorithm.

Neural Adaptive Sequential Monte Carlo

no code implementations NeurIPS 2015 Shixiang Gu, Zoubin Ghahramani, Richard E. Turner

Experiments indicate that NASMC significantly improves inference in a non-linear state space model, outperforming adaptive proposal methods including the Extended Kalman and Unscented Particle Filters.

Variational Inference

Towards Deep Neural Network Architectures Robust to Adversarial Examples

2 code implementations 11 Dec 2014 Shixiang Gu, Luca Rigazio

We perform various experiments to assess the removability of adversarial examples by corrupting with additional noise and pre-processing with denoising autoencoders (DAEs).

Denoising
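
A minimal sketch of the two preprocessing defenses the abstract describes, with hypothetical model interfaces (classifier, dae) and an illustrative noise scale:

```python
import numpy as np

def preprocess_and_classify(x_adv, classifier, dae, sigma=0.1,
                            rng=np.random.default_rng(0)):
    # Defense 1: corrupt the (possibly adversarial) input with Gaussian
    # noise, aiming to wash out the small adversarial perturbation.
    x_noisy = x_adv + sigma * rng.standard_normal(x_adv.shape)
    # Defense 2: project the input back toward the data manifold with a
    # denoising autoencoder before classifying it.
    x_denoised = dae(x_noisy)
    return classifier(x_denoised)
```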
