Search Results for author: George Tucker

Found 42 papers, 21 papers with code

Model Selection in Batch Policy Optimization

no code implementations23 Dec 2021 Jonathan N. Lee, George Tucker, Ofir Nachum, Bo Dai

We formalize the problem in the contextual bandit setting with linear model classes by identifying three sources of error that any model selection algorithm should optimally trade-off in order to be competitive: (1) approximation error, (2) statistical complexity, and (3) coverage.

Model Selection

DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization

no code implementations9 Dec 2021 Aviral Kumar, Rishabh Agarwal, Tengyu Ma, Aaron Courville, George Tucker, Sergey Levine

In this paper, we discuss how the implicit regularization effect of SGD seen in supervised learning could in fact be harmful in the offline deep RL setting, leading to poor generalization and degenerate feature representations.

Atari Games Offline RL

Coupled Gradient Estimators for Discrete Latent Variables

no code implementations NeurIPS 2021 Zhe Dong, andriy mnih, George Tucker

Training models with discrete latent variables is challenging due to the high variance of unbiased gradient estimators.

Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization

no code implementations ICLR 2021 Michael R. Zhang, Tom Le Paine, Ofir Nachum, Cosmin Paduraru, George Tucker, Ziyu Wang, Mohammad Norouzi

This modeling choice assumes that different dimensions of the next state and reward are conditionally independent given the current state and action and may be driven by the fact that fully observable physics-based simulation environments entail deterministic transition dynamics.

Continuous Control Data Augmentation

Benchmarks for Deep Off-Policy Evaluation

3 code implementations ICLR 2021 Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Mengjiao Yang, Michael R. Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, Tom Le Paine

Off-policy evaluation (OPE) holds the promise of being able to leverage large, offline datasets for both evaluating and selecting complex policies for decision making.

Continuous Control Decision Making +1

Offline Policy Selection under Uncertainty

1 code implementation12 Dec 2020 Mengjiao Yang, Bo Dai, Ofir Nachum, George Tucker, Dale Schuurmans

More importantly, we show how the belief distribution estimated by BayesDICE may be used to rank policies with respect to any arbitrary downstream policy selection metric, and we empirically demonstrate that this selection procedure significantly outperforms existing approaches, such as ranking policies according to mean or high-confidence lower bound value estimates.

RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning

2 code implementations24 Jun 2020 Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Tom Le Paine, Sergio Gomez Colmenarejo, Konrad Zolna, Rishabh Agarwal, Josh Merel, Daniel Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matt Hoffman, Ofir Nachum, George Tucker, Nicolas Heess, Nando de Freitas

We hope that our suite of benchmarks will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community.

Atari Games DQN Replay Dataset +1

DisARM: An Antithetic Gradient Estimator for Binary Latent Variables

no code implementations NeurIPS 2020 Zhe Dong, andriy mnih, George Tucker

Applying antithetic sampling over the augmenting variables yields a relatively low-variance and unbiased estimator applicable to any model with binary latent variables.

Conservative Q-Learning for Offline Reinforcement Learning

10 code implementations NeurIPS 2020 Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine

We theoretically show that CQL produces a lower bound on the value of the current policy and that it can be incorporated into a policy learning procedure with theoretical improvement guarantees.

Continuous Control DQN Replay Dataset +1

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

2 code implementations4 May 2020 Sergey Levine, Aviral Kumar, George Tucker, Justin Fu

In this tutorial article, we aim to provide the reader with the conceptual tools needed to get started on research on offline reinforcement learning algorithms: reinforcement learning algorithms that utilize previously collected data, without additional online data collection.

Decision Making

Model Based Reinforcement Learning for Atari

no code implementations ICLR 2020 Łukasz Kaiser, Mohammad Babaeizadeh, Piotr Miłos, Błażej Osiński, Roy H. Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, Henryk Michalewski

We describe Simulated Policy Learning (SimPLe), a complete model-based deep RL algorithm based on video prediction models and present a comparison of several model architectures, including a novel architecture that yields the best results in our setting.

Atari Games Model-based Reinforcement Learning +1

D4RL: Datasets for Deep Data-Driven Reinforcement Learning

3 code implementations15 Apr 2020 Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, Sergey Levine

In this work, we introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL.

Offline RL

Meta-Learning without Memorization

1 code implementation ICLR 2020 Mingzhang Yin, George Tucker, Mingyuan Zhou, Sergey Levine, Chelsea Finn

If this is not done, the meta-learner can ignore the task training data and learn a single model that performs all of the meta-training tasks zero-shot, but does not adapt effectively to new image classes.

Few-Shot Image Classification Meta-Learning

Behavior Regularized Offline Reinforcement Learning

1 code implementation26 Nov 2019 Yifan Wu, George Tucker, Ofir Nachum

In reinforcement learning (RL) research, it is common to assume access to direct online interactions with the environment.

Continuous Control Offline RL

Don't Blame the ELBO! A Linear VAE Perspective on Posterior Collapse

no code implementations NeurIPS 2019 James Lucas, George Tucker, Roger Grosse, Mohammad Norouzi

Posterior collapse in Variational Autoencoders (VAEs) arises when the variational posterior distribution closely matches the prior for a subset of latent variables.

Variational Inference

Energy-Inspired Models: Learning with Sampler-Induced Distributions

1 code implementation NeurIPS 2019 Dieterich Lawson, George Tucker, Bo Dai, Rajesh Ranganath

Motivated by this, we consider the sampler-induced distribution as the model of interest and maximize the likelihood of this model.

Variational Inference

Reinforcement Learning Driven Heuristic Optimization

no code implementations16 Jun 2019 Qingpeng Cai, Will Hang, Azalia Mirhoseini, George Tucker, Jingtao Wang, Wei Wei

In this paper, we introduce a novel framework to generate better initial solutions for heuristic algorithms using reinforcement learning (RL), named RLHO.

Combinatorial Optimization

Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction

2 code implementations NeurIPS 2019 Aviral Kumar, Justin Fu, George Tucker, Sergey Levine

Bootstrapping error is due to bootstrapping from actions that lie outside of the training data distribution, and it accumulates via the Bellman backup operator.

Continuous Control Q-Learning

On Variational Bounds of Mutual Information

3 code implementations16 May 2019 Ben Poole, Sherjil Ozair, Aaron van den Oord, Alexander A. Alemi, George Tucker

Estimating and optimizing Mutual Information (MI) is core to many problems in machine learning; however, bounding MI in high dimensions is challenging.

Representation Learning

Guided Evolutionary Strategies: Escaping the curse of dimensionality in random search

no code implementations ICLR 2019 Niru Maheswaranathan, Luke Metz, George Tucker, Dami Choi, Jascha Sohl-Dickstein

This arises when an approximate gradient is easier to compute than the full gradient (e. g. in meta-learning or unrolled optimization), or when a true gradient is intractable and is replaced with a surrogate (e. g. in certain reinforcement learning applications or training networks with discrete variables).

Meta-Learning

Understanding Posterior Collapse in Generative Latent Variable Models

no code implementations ICLR Workshop DeepGenStruct 2019 James Lucas, George Tucker, Roger Grosse, Mohammad Norouzi

Posterior collapse in Variational Autoencoders (VAEs) arises when the variational distribution closely matches the uninformative prior for a subset of latent variables.

Variational Inference

Revisiting Auxiliary Latent Variables in Generative Models

no code implementations ICLR Workshop DeepGenStruct 2019 Dieterich Lawson, George Tucker, Bo Dai, Rajesh Ranganath

The success of enriching the variational family with auxiliary latent variables motivates applying the same techniques to the generative model.

Model-Based Reinforcement Learning for Atari

4 code implementations1 Mar 2019 Lukasz Kaiser, Mohammad Babaeizadeh, Piotr Milos, Blazej Osinski, Roy H. Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, Henryk Michalewski

We describe Simulated Policy Learning (SimPLe), a complete model-based deep RL algorithm based on video prediction models and present a comparison of several model architectures, including a novel architecture that yields the best results in our setting.

Atari Games Atari Games 100k +2

Learning to Walk via Deep Reinforcement Learning

no code implementations26 Dec 2018 Tuomas Haarnoja, Sehoon Ha, Aurick Zhou, Jie Tan, George Tucker, Sergey Levine

In this paper, we propose a sample-efficient deep RL algorithm based on maximum entropy RL that requires minimal per-task tuning and only a modest number of trials to learn neural network policies.

Legged Robots

The Laplacian in RL: Learning Representations with Efficient Approximations

no code implementations ICLR 2019 Yifan Wu, George Tucker, Ofir Nachum

In this paper, we present a fully general and scalable method for approximating the eigenvectors of the Laplacian in a model-free RL context.

Representation Learning

Doubly Reparameterized Gradient Estimators for Monte Carlo Objectives

3 code implementations ICLR 2019 George Tucker, Dieterich Lawson, Shixiang Gu, Chris J. Maddison

Burda et al. (2015) introduced a multi-sample variational bound, IWAE, that is at least as tight as the standard variational lower bound and becomes increasingly tight as the number of samples increases.

Variational Inference

Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion

2 code implementations NeurIPS 2018 Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, Honglak Lee

Integrating model-free and model-based approaches in reinforcement learning has the potential to achieve the high performance of model-free algorithms with low sample complexity.

Continuous Control

Guided evolutionary strategies: Augmenting random search with surrogate gradients

1 code implementation ICLR 2019 Niru Maheswaranathan, Luke Metz, George Tucker, Dami Choi, Jascha Sohl-Dickstein

We propose Guided Evolutionary Strategies, a method for optimally using surrogate gradient directions along with random search.

Meta-Learning

Smoothed Action Value Functions for Learning Gaussian Policies

no code implementations ICML 2018 Ofir Nachum, Mohammad Norouzi, George Tucker, Dale Schuurmans

State-action value functions (i. e., Q-values) are ubiquitous in reinforcement learning (RL), giving rise to popular algorithms such as SARSA and Q-learning.

Continuous Control Q-Learning

The Mirage of Action-Dependent Baselines in Reinforcement Learning

1 code implementation ICML 2018 George Tucker, Surya Bhupatiraju, Shixiang Gu, Richard E. Turner, Zoubin Ghahramani, Sergey Levine

Policy gradient methods are a widely used class of model-free reinforcement learning algorithms where a state-dependent baseline is used to reduce gradient estimator variance.

Policy Gradient Methods

Combining Model-based and Model-free RL via Multi-step Control Variates

no code implementations ICLR 2018 Tong Che, Yuchen Lu, George Tucker, Surya Bhupatiraju, Shane Gu, Sergey Levine, Yoshua Bengio

Model-free deep reinforcement learning algorithms are able to successfully solve a wide range of continuous control tasks, but typically require many on-policy samples to achieve good performance.

Continuous Control OpenAI Gym

An online sequence-to-sequence model for noisy speech recognition

no code implementations16 Jun 2017 Chung-Cheng Chiu, Dieterich Lawson, Yuping Luo, George Tucker, Kevin Swersky, Ilya Sutskever, Navdeep Jaitly

This is because the models require that the entirety of the input sequence be available at the beginning of inference, an assumption that is not valid for instantaneous speech recognition.

Noisy Speech Recognition

Filtering Variational Objectives

3 code implementations NeurIPS 2017 Chris J. Maddison, Dieterich Lawson, George Tucker, Nicolas Heess, Mohammad Norouzi, andriy mnih, Arnaud Doucet, Yee Whye Teh

When used as a surrogate objective for maximum likelihood estimation in latent variable models, the evidence lower bound (ELBO) produces state-of-the-art results.

Learning Hard Alignments with Variational Inference

no code implementations16 May 2017 Dieterich Lawson, Chung-Cheng Chiu, George Tucker, Colin Raffel, Kevin Swersky, Navdeep Jaitly

There has recently been significant interest in hard attention models for tasks such as object recognition, visual captioning and speech recognition.

Image Captioning Object Recognition +3

Max-Pooling Loss Training of Long Short-Term Memory Networks for Small-Footprint Keyword Spotting

no code implementations5 May 2017 Ming Sun, Anirudh Raju, George Tucker, Sankaran Panchapagesan, Geng-Shen Fu, Arindam Mandal, Spyros Matsoukas, Nikko Strom, Shiv Vitaladevuni

Finally, the max-pooling loss trained LSTM initialized with a cross-entropy pre-trained network shows the best performance, which yields $67. 6\%$ relative reduction compared to baseline feed-forward DNN in Area Under the Curve (AUC) measure.

Small-Footprint Keyword Spotting

Particle Value Functions

no code implementations16 Mar 2017 Chris J. Maddison, Dieterich Lawson, George Tucker, Nicolas Heess, Arnaud Doucet, andriy mnih, Yee Whye Teh

The policy gradients of the expected return objective can react slowly to rare rewards.

Compacting Neural Network Classifiers via Dropout Training

no code implementations18 Nov 2016 Yotaro Kubo, George Tucker, Simon Wiesler

We introduce dropout compaction, a novel method for training feed-forward neural networks which realizes the performance gains of training a large model with dropout regularization, yet extracts a compact neural network for run-time efficiency.

Speech Recognition

Cannot find the paper you are looking for? You can Submit a new open access paper.