Search Results for author: Shixiang Gu

Found 27 papers, 18 papers with code

Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization

1 code implementation ICLR 2021 Tatsuya Matsushima, Hiroki Furuta, Yutaka Matsuo, Ofir Nachum, Shixiang Gu

We propose a novel model-based algorithm, Behavior-Regularized Model-ENsemble (BREMEN), that can effectively optimize a policy offline using 10-20 times less data than prior works.

Offline RL reinforcement-learning +1
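
As a rough illustration of the recipe the abstract describes, here is a minimal Python sketch of a BREMEN-style offline loop; the helper functions (fit_dynamics, behavior_clone, imagine_rollout, trust_region_update) are hypothetical stand-ins, not the authors' implementation:

```python
import random

def bremen_offline(dataset, n_models=5, n_iters=100):
    # Fit an ensemble of dynamics models to the fixed offline dataset.
    models = [fit_dynamics(dataset) for _ in range(n_models)]
    # Behavior cloning initializes the policy near the data-collection
    # policy, implicitly regularizing the subsequent updates toward it.
    policy = behavior_clone(dataset)
    for _ in range(n_iters):
        # Generate imaginary rollouts from randomly chosen ensemble
        # members, then take a conservative (trust-region) policy step
        # on model-generated data only -- no further environment access.
        rollouts = [imagine_rollout(random.choice(models), policy)
                    for _ in range(64)]
        policy = trust_region_update(policy, rollouts)
    return policy
```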

Dynamics-Aware Unsupervised Skill Discovery

1 code implementation ICLR 2020 Archit Sharma, Shixiang Gu, Sergey Levine, Vikash Kumar, Karol Hausman

Conventionally, model-based reinforcement learning (MBRL) aims to learn a global model for the dynamics of the environment.

Model-based Reinforcement Learning

Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning

2 code implementations 27 Apr 2020 Archit Sharma, Michael Ahn, Sergey Levine, Vikash Kumar, Karol Hausman, Shixiang Gu

Can we instead develop efficient reinforcement learning methods that acquire diverse skills without any reward function, and then repurpose these skills for downstream tasks?

Model Predictive Control reinforcement-learning +2

Way Off-Policy Batch Deep Reinforcement Learning of Human Preferences in Dialog

no code implementations ICLR 2020 Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard

This is a critical shortcoming for applying RL to real-world problems where collecting data is expensive, and models must be tested offline before being deployed to interact with the environment -- e.g., systems that learn from human interaction.

OpenAI Gym Open-Domain Dialog +3

A Divergence Minimization Perspective on Imitation Learning Methods

3 code implementations 6 Nov 2019 Seyed Kamyar Seyed Ghasemipour, Richard Zemel, Shixiang Gu

We present $f$-MAX, an $f$-divergence generalization of AIRL [Fu et al., 2018], a state-of-the-art IRL method.

Behavioural cloning Continuous Control
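
For context, the objective behind $f$-MAX can be sketched (hedged; notation simplified from the paper) as matching state-action occupancy measures under an $f$-divergence,

$$ \min_\pi \; D_f\big(\rho^{\mathrm{exp}}(s,a) \,\|\, \rho^{\pi}(s,a)\big), \qquad D_f(P\|Q) = \int Q(x)\, f\!\Big(\frac{P(x)}{Q(x)}\Big)\, dx, $$

with AIRL recovered as a particular (reverse-KL) choice of $f$.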

Multi-Agent Manipulation via Locomotion using Hierarchical Sim2Real

no code implementations 13 Aug 2019 Ofir Nachum, Michael Ahn, Hugo Ponte, Shixiang Gu, Vikash Kumar

Our method hinges on the use of hierarchical sim2real -- a simulated environment is used to learn low-level goal-reaching skills, which are then used as the action space for a high-level RL controller, also trained in simulation.

Reinforcement Learning (RL)
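
A minimal sketch of the two-level control scheme described above, with hypothetical interfaces (high_level_policy, low_level_skill); in the paper both levels are trained in simulation before transfer:

```python
def hierarchical_step(high_level_policy, low_level_skill, env, obs, k=10):
    # The high level picks a goal; its "action space" is the space of
    # goals the low-level goal-reaching skill knows how to achieve.
    goal = high_level_policy(obs)
    # The low level executes k primitive actions toward that goal.
    for _ in range(k):
        action = low_level_skill(obs, goal)
        obs, reward, done, info = env.step(action)
        if done:
            break
    return obs
```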

Dynamics-Aware Unsupervised Discovery of Skills

3 code implementations 2 Jul 2019 Archit Sharma, Shixiang Gu, Sergey Levine, Vikash Kumar, Karol Hausman

Conventionally, model-based reinforcement learning (MBRL) aims to learn a global model for the dynamics of the environment.

Model-based Reinforcement Learning

Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog

1 code implementation 30 Jun 2019 Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard

Most deep reinforcement learning (RL) systems are not able to learn effectively from off-policy data, especially if they cannot explore online in the environment.

Open-Domain Dialog Q-Learning +2

Language as an Abstraction for Hierarchical Deep Reinforcement Learning

2 code implementations NeurIPS 2019 Yiding Jiang, Shixiang Gu, Kevin Murphy, Chelsea Finn

We find that, using our approach, agents can learn to solve diverse, temporally extended tasks such as object sorting and multi-object rearrangement, including from raw pixel observations.

Instruction Following Object +2

Doubly Reparameterized Gradient Estimators for Monte Carlo Objectives

3 code implementations ICLR 2019 George Tucker, Dieterich Lawson, Shixiang Gu, Chris J. Maddison

Burda et al. (2015) introduced a multi-sample variational bound, IWAE, that is at least as tight as the standard variational lower bound and becomes increasingly tight as the number of samples increases.

Variational Inference
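
For reference, the IWAE bound referred to above is the multi-sample objective

$$ \mathcal{L}_K = \mathbb{E}_{z_1,\dots,z_K \sim q(z|x)}\Big[\log \frac{1}{K}\sum_{k=1}^{K} \frac{p(x, z_k)}{q(z_k|x)}\Big], $$

which satisfies $\log p(x) \ge \mathcal{L}_{K+1} \ge \mathcal{L}_K$, with $\mathcal{L}_1$ being the standard variational lower bound (ELBO).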

Data-Efficient Hierarchical Reinforcement Learning

12 code implementations NeurIPS 2018 Ofir Nachum, Shixiang Gu, Honglak Lee, Sergey Levine

In this paper, we study how we can develop HRL algorithms that are general, in that they do not make onerous additional assumptions beyond standard RL algorithms, and efficient, in the sense that they can be used with modest numbers of interaction samples, making them suitable for real-world problems such as robotic control.

Hierarchical Reinforcement Learning reinforcement-learning +1

The Mirage of Action-Dependent Baselines in Reinforcement Learning

1 code implementation ICML 2018 George Tucker, Surya Bhupatiraju, Shixiang Gu, Richard E. Turner, Zoubin Ghahramani, Sergey Levine

Policy gradient methods are a widely used class of model-free reinforcement learning algorithms where a state-dependent baseline is used to reduce gradient estimator variance.

Policy Gradient Methods reinforcement-learning +1
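
For reference, the state-dependent-baseline estimator the abstract refers to is

$$ \nabla_\theta J(\theta) = \mathbb{E}_{s,\,a \sim \pi_\theta}\big[\nabla_\theta \log \pi_\theta(a|s)\,\big(Q^\pi(s,a) - b(s)\big)\big], $$

which remains unbiased for any $b(s)$ because $\mathbb{E}_{a\sim\pi_\theta}[\nabla_\theta \log \pi_\theta(a|s)\, b(s)] = 0$, while a well-chosen baseline reduces the estimator's variance.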

Temporal Difference Models: Model-Free Deep RL for Model-Based Control

no code implementations ICLR 2018 Vitchyr Pong, Shixiang Gu, Murtaza Dalal, Sergey Levine

TDMs combine the benefits of model-free and model-based RL: they leverage the rich information in state transitions to learn very efficiently, while still attaining asymptotic performance that exceeds that of direct model-based RL methods.

Continuous Control Q-Learning +1
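
A hedged sketch of the construction: a TDM is a goal- and horizon-conditioned Q-function trained with a recursion along the lines of

$$ Q(s,a,g,\tau) = \mathbb{E}_{s'}\Big[-d(s',g)\,\mathbb{1}[\tau=0] + \max_{a'} Q(s',a',g,\tau-1)\,\mathbb{1}[\tau>0]\Big], $$

so that model-free Q-learning absorbs the kind of state-reaching information a learned dynamics model would otherwise provide (notation simplified from the paper).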

Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning

1 code implementation ICLR 2018 Benjamin Eysenbach, Shixiang Gu, Julian Ibarz, Sergey Levine

In this work, we propose an autonomous method for safe and efficient reinforcement learning that simultaneously learns a forward and reset policy, with the reset policy resetting the environment for a subsequent attempt.

reinforcement-learning Reinforcement Learning (RL)
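
A minimal sketch of the forward/reset scheme described above (helper names are hypothetical; the early-abort rule follows the paper's idea of stopping the forward policy when the reset policy's value estimate signals an unrecoverable state):

```python
def safe_episode(env, forward_pi, reset_pi, reset_q, obs, eps=0.1):
    # Forward phase: act, but abort early if the reset policy no longer
    # expects to be able to return to the initial state distribution.
    while True:
        action = forward_pi(obs)
        if reset_q(obs, action) < eps:  # early abort: hard to reset from here
            break
        obs, reward, done, _ = env.step(action)
        if done:
            break
    # Reset phase: the learned reset policy returns to the start states,
    # replacing a manual or hard-coded environment reset.
    while not reset_done(obs):  # reset_done is a hypothetical predicate
        obs, _, _, _ = env.step(reset_pi(obs))
    return obs
```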

Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control

no code implementations ICML 2017 Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E. Turner, Douglas Eck

This paper proposes a general method for improving the structure and quality of sequences generated by a recurrent neural network (RNN), while maintaining information originally learned from data, as well as sample diversity.

Reinforcement Learning (RL)
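
A hedged sketch of the KL-control objective named in the title: fine-tune the policy $\pi$ (the RNN) to maximize task reward while staying close to the pretrained prior $p$,

$$ \max_\pi \; \mathbb{E}_\pi\Big[\sum_t r(s_t,a_t)\Big] - \beta \sum_t \mathrm{KL}\big(\pi(\cdot|s_t)\,\|\,p(\cdot|s_t)\big), $$

where the KL penalty is what preserves the information originally learned from data, and $\beta$ trades reward against conservatism.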

Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic

2 code implementations 7 Nov 2016 Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E. Turner, Sergey Levine

We analyze the connection between Q-Prop and existing model-free algorithms, and use control variate theory to derive two variants of Q-Prop with conservative and aggressive adaptation.

Continuous Control Policy Gradient Methods +2
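
A hedged sketch of the control-variate construction behind Q-Prop: subtract from the Monte Carlo policy gradient a term built from a first-order Taylor expansion of the off-policy critic $Q_w$ around the deterministic action $\mu_\theta(s)$, then add its expectation back analytically,

$$ \nabla_\theta J \approx \mathbb{E}\big[\nabla_\theta \log \pi_\theta(a|s)\,\big(\hat{A}(s,a) - \bar{A}_w(s,a)\big)\big] + \mathbb{E}\big[\nabla_a Q_w(s,a)\big|_{a=\mu_\theta(s)}\,\nabla_\theta \mu_\theta(s)\big], $$

with the conservative and aggressive variants mentioned above gating this control variate per state according to its estimated effect on variance.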

Categorical Reparameterization with Gumbel-Softmax

19 code implementations 3 Nov 2016 Eric Jang, Shixiang Gu, Ben Poole

Categorical variables are a natural choice for representing discrete structure in the world.

General Classification
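
The Gumbel-Softmax trick itself is compact enough to sketch directly. A minimal NumPy version follows; in practice it is used inside an autodiff framework so the relaxed sample is differentiable, and the sample approaches one-hot as the temperature tau goes to 0:

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Relaxed sample from a categorical distribution: perturb logits
    with Gumbel(0, 1) noise, then apply a temperature-scaled softmax."""
    rng = rng or np.random.default_rng()
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + gumbel) / tau
    y = np.exp(y - y.max())  # numerically stable softmax
    return y / y.sum()

# Low temperature -> nearly one-hot sample over three categories.
print(gumbel_softmax(np.log(np.array([0.1, 0.3, 0.6])), tau=0.1))
```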

Categorical Reparametrization with Gumbel-Softmax

1 code implementation ICLR 2017 Eric Jang, Shixiang Gu, Ben Poole

Categorical variables are a natural choice for representing discrete structure in the world.

Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates

no code implementations 3 Oct 2016 Shixiang Gu, Ethan Holly, Timothy Lillicrap, Sergey Levine

In this paper, we demonstrate that a recent deep reinforcement learning algorithm based on off-policy training of deep Q-functions can scale to complex 3D manipulation tasks and can learn deep neural network policies efficiently enough to train on real physical robots.

reinforcement-learning Reinforcement Learning (RL)
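
A minimal sketch of the asynchronous scheme implied by the title, assuming hypothetical interfaces (collect_episode, q_update, robots, q_function): each robot collects experience into a shared buffer while a separate thread performs off-policy Q-function updates, so learning proceeds while the robots keep acting:

```python
import threading
import queue

replay = queue.Queue()  # shared replay buffer

def worker(robot):
    # Each robot pushes transitions without blocking training.
    while True:
        for transition in collect_episode(robot):
            replay.put(transition)

def trainer(q_function):
    # Off-policy Q-learning updates from the shared buffer.
    batch = []
    while True:
        batch.append(replay.get())
        if len(batch) == 64:
            q_update(q_function, batch)
            batch = []

threading.Thread(target=trainer, args=(q_function,), daemon=True).start()
for robot in robots:
    threading.Thread(target=worker, args=(robot,), daemon=True).start()
```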

Continuous Deep Q-Learning with Model-based Acceleration

8 code implementations 2 Mar 2016 Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, Sergey Levine

In this paper, we explore algorithms and representations to reduce the sample complexity of deep reinforcement learning for continuous control tasks.

Continuous Control Q-Learning +2
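
For context, the paper's normalized advantage function (NAF) makes the greedy action in continuous Q-learning analytic by construction:

$$ Q(s,a) = V(s) - \tfrac{1}{2}\,(a - \mu(s))^{\top} P(s)\,(a - \mu(s)), $$

where $P(s)$ is positive definite, so $\arg\max_a Q(s,a) = \mu(s)$ with no inner optimization over actions.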

MuProp: Unbiased Backpropagation for Stochastic Neural Networks

2 code implementations 16 Nov 2015 Shixiang Gu, Sergey Levine, Ilya Sutskever, Andriy Mnih

Deep neural networks are powerful parametric models that can be trained efficiently using the backpropagation algorithm.

Neural Adaptive Sequential Monte Carlo

no code implementations NeurIPS 2015 Shixiang Gu, Zoubin Ghahramani, Richard E. Turner

Experiments indicate that NASMC significantly improves inference in a non-linear state space model, outperforming adaptive proposal methods including the Extended Kalman and Unscented Particle Filters.

Variational Inference

Towards Deep Neural Network Architectures Robust to Adversarial Examples

2 code implementations 11 Dec 2014 Shixiang Gu, Luca Rigazio

We perform various experiments to assess the removability of adversarial examples by corrupting with additional noise and pre-processing with denoising autoencoders (DAEs).

Denoising
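
A minimal sketch of the two preprocessing defenses the abstract describes, with hypothetical model interfaces (classifier, dae) and an illustrative noise scale:

```python
import numpy as np

def preprocess_and_classify(x_adv, classifier, dae, sigma=0.1,
                            rng=np.random.default_rng(0)):
    # Defense 1: corrupt the (possibly adversarial) input with Gaussian
    # noise, aiming to wash out the small adversarial perturbation.
    x_noisy = x_adv + sigma * rng.standard_normal(x_adv.shape)
    # Defense 2: project the input back toward the data manifold with a
    # denoising autoencoder before classifying it.
    x_denoised = dae(x_noisy)
    return classifier(x_denoised)
```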
