Search Results for author: Shixiang Shane Gu

Found 7 papers, 4 papers with code

A Minimalist Approach to Offline Reinforcement Learning

2 code implementations • 12 Jun 2021 • Scott Fujimoto, Shixiang Shane Gu

Offline reinforcement learning (RL) defines the task of learning from a fixed batch of data.

Offline RL
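The method behind this paper, TD3+BC, is often summarized as adding a behavior-cloning term to TD3's policy objective. Below is a minimal PyTorch-style sketch of that loss, assuming hypothetical `actor` and `critic` networks and a batch of logged `(state, action)` pairs; it illustrates the published objective, not the authors' reference code.

```python
import torch
import torch.nn.functional as F

def td3_bc_actor_loss(actor, critic, state, action, alpha=2.5):
    """TD3+BC policy loss: maximize Q while regressing toward the
    dataset action (behavior cloning), with the Q term rescaled so
    both objectives stay on a comparable scale across tasks.
    """
    pi = actor(state)                        # actor's proposed action
    q = critic(state, pi)                    # critic's estimate of its value
    lmbda = alpha / q.abs().mean().detach()  # normalize by the average |Q|
    return -lmbda * q.mean() + F.mse_loss(pi, action)
```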

Variational Empowerment as Representation Learning for Goal-Based Reinforcement Learning

no code implementations • 2 Jun 2021 • Jongwook Choi, Archit Sharma, Honglak Lee, Sergey Levine, Shixiang Shane Gu

Learning to reach goal states and learning diverse skills through mutual information (MI) maximization have been proposed as principled frameworks for self-supervised reinforcement learning, allowing agents to acquire broadly applicable multitask policies with minimal reward engineering.

Representation Learning
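The mutual-information objective mentioned above is commonly optimized through a variational lower bound, rewarding the agent with log q(z | s) - log p(z) for a learned skill discriminator q. The sketch below shows that generic intrinsic reward, with a hypothetical `discriminator` network; the paper's specific contribution, relating this bound to goal-conditioned RL, is not reproduced here.

```python
import torch

def mi_skill_reward(discriminator, state, skill, log_p_skill):
    """Generic intrinsic reward from the variational lower bound on
    I(S; Z): r(s, z) = log q(z | s) - log p(z)."""
    logits = discriminator(state)              # scores over discrete skills
    log_q = torch.log_softmax(logits, dim=-1)  # log q(z | s)
    return log_q.gather(-1, skill.unsqueeze(-1)).squeeze(-1) - log_p_skill
```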

Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning

1 code implementation • 31 Mar 2021 • Hiroki Furuta, Tadashi Kozuno, Tatsuya Matsushima, Yutaka Matsuo, Shixiang Shane Gu

These results show which implementation details are co-adapted and co-evolved with algorithms, and which are transferable across them: for example, we identify that the tanh Gaussian policy and network sizes are highly adapted to algorithmic types, while layer normalization and ELU are critical for MPO's performance but also transfer to noticeable gains in SAC.
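The tanh Gaussian policy named in the abstract is a standard construction: sample from a Gaussian, squash through tanh, and correct the log-probability with the change-of-variables term. A minimal sketch (the `1e-6` epsilon for numerical stability is a common convention, not taken from this paper):

```python
import torch

def sample_tanh_gaussian(mean, log_std):
    """Sample a tanh-squashed Gaussian action and its log-probability,
    including the tanh change-of-variables correction."""
    dist = torch.distributions.Normal(mean, log_std.exp())
    u = dist.rsample()            # reparameterized pre-squash sample
    action = torch.tanh(u)        # squash into (-1, 1)
    # log pi(a|s) = log N(u; mean, std) - sum_i log(1 - tanh(u_i)^2)
    log_prob = dist.log_prob(u) - torch.log(1 - action.pow(2) + 1e-6)
    return action, log_prob.sum(-1)
```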

Human-centric Dialog Training via Offline Reinforcement Learning

1 code implementation • EMNLP 2020 • Natasha Jaques, Judy Hanwen Shen, Asma Ghandeharioun, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Shane Gu, Rosalind Picard

We start by hosting models online and gathering human feedback from real-time, open-ended conversations, which we then use to train and improve the models via offline reinforcement learning (RL).

Language Modelling • Offline RL

EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL

no code implementations • 21 Jul 2020 • Seyed Kamyar Seyed Ghasemipour, Dale Schuurmans, Shixiang Shane Gu

In this work, we closely investigate an important simplification of BCQ -- a prior approach for offline RL -- which removes a heuristic design choice and naturally restricts extracted policies to remain exactly within the support of a given behavior policy.

Decision Making • Offline RL • +1
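The Expected-Max Q operator backs up the maximum Q-value over N actions sampled from the (learned) behavior policy, which keeps the induced policy inside the data's support. A minimal sketch, assuming a hypothetical `behavior_policy(state, n)` that returns n candidate actions per state and ignoring terminal-state masking:

```python
import torch

def emaq_target(q_net, behavior_policy, reward, next_state,
                gamma=0.99, n_samples=10):
    """Expected-Max Q backup: r + gamma * max over N behavior-policy
    samples of Q(s', a'), restricting backups to in-support actions."""
    candidates = behavior_policy(next_state, n_samples)  # (batch, n, act_dim)
    q_vals = torch.stack(
        [q_net(next_state, candidates[:, i]) for i in range(n_samples)],
        dim=1,
    )
    return reward + gamma * q_vals.max(dim=1).values
```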
