no code implementations • 21 Dec 2022 • Yiren Lu, Justin Fu, George Tucker, Xinlei Pan, Eli Bronstein, Rebecca Roelofs, Benjamin Sapp, Brandyn White, Aleksandra Faust, Shimon Whiteson, Dragomir Anguelov, Sergey Levine
To our knowledge, this is the first application of a combined imitation and reinforcement learning approach in autonomous driving that utilizes large amounts of real-world human driving data.
no code implementations • 28 Nov 2022 • Aviral Kumar, Rishabh Agarwal, Xinyang Geng, George Tucker, Sergey Levine
The potential of offline reinforcement learning (RL) is that high-capacity models trained on large, heterogeneous datasets can lead to agents that generalize broadly, analogously to advances in vision and NLP.
no code implementations • 3 Nov 2022 • Jonathan N. Lee, George Tucker, Ofir Nachum, Bo Dai, Emma Brunskill
We propose the first model selection algorithm for offline RL that achieves minimax rate-optimal oracle inequalities up to logarithmic factors.
no code implementations • 23 Dec 2021 • Jonathan N. Lee, George Tucker, Ofir Nachum, Bo Dai
We formalize the problem in the contextual bandit setting with linear model classes by identifying three sources of error that any model selection algorithm should optimally trade-off in order to be competitive: (1) approximation error, (2) statistical complexity, and (3) coverage.
no code implementations • ICLR 2022 • Aviral Kumar, Rishabh Agarwal, Tengyu Ma, Aaron Courville, George Tucker, Sergey Levine
In this paper, we discuss how the implicit regularization effect of SGD seen in supervised learning could in fact be harmful in the offline deep RL setting, leading to poor generalization and degenerate feature representations.
no code implementations • NeurIPS 2021 • Zhe Dong, Andriy Mnih, George Tucker
Training models with discrete latent variables is challenging due to the high variance of unbiased gradient estimators.
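As a rough illustration of why variance is the central difficulty, the generic score-function (REINFORCE) estimator for a discrete latent variable z takes the form (a standard identity, not this paper's notation):

$$ \nabla_\theta\, \mathbb{E}_{q_\theta(z)}[f(z)] \;=\; \mathbb{E}_{q_\theta(z)}\big[ f(z)\, \nabla_\theta \log q_\theta(z) \big], $$

which is unbiased but can have variance that scales with the magnitude of f, motivating the variance-reduction techniques developed in this line of work.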
no code implementations • ICLR 2021 • Michael R. Zhang, Tom Le Paine, Ofir Nachum, Cosmin Paduraru, George Tucker, Ziyu Wang, Mohammad Norouzi
This modeling choice assumes that different dimensions of the next state and reward are conditionally independent given the current state and action, and may be driven by the fact that fully observable physics-based simulation environments entail deterministic transition dynamics.
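Concretely, the independence assumption described here corresponds to a fully factorized dynamics model of roughly this form (notation chosen for illustration):

$$ p(s', r \mid s, a) \;=\; p(r \mid s, a)\,\prod_{i} p(s'_i \mid s, a), $$

so each dimension of the next state is predicted without conditioning on the other dimensions of s'.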
3 code implementations • ICLR 2021 • Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Mengjiao Yang, Michael R. Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, Tom Le Paine
Off-policy evaluation (OPE) holds the promise of being able to leverage large, offline datasets for both evaluating and selecting complex policies for decision making.
1 code implementation • 12 Dec 2020 • Mengjiao Yang, Bo Dai, Ofir Nachum, George Tucker, Dale Schuurmans
More importantly, we show how the belief distribution estimated by BayesDICE may be used to rank policies with respect to any downstream policy selection metric, and we empirically demonstrate that this selection procedure significantly outperforms existing approaches, such as ranking policies according to mean or high-confidence lower bound value estimates.
2 code implementations • 24 Jun 2020 • Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Tom Le Paine, Sergio Gomez Colmenarejo, Konrad Zolna, Rishabh Agarwal, Josh Merel, Daniel Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matt Hoffman, Ofir Nachum, George Tucker, Nicolas Heess, Nando de Freitas
We hope that our suite of benchmarks will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community.
no code implementations • NeurIPS 2020 • Zhe Dong, Andriy Mnih, George Tucker
Applying antithetic sampling over the augmenting variables yields a relatively low-variance and unbiased estimator applicable to any model with binary latent variables.
15 code implementations • NeurIPS 2020 • Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine
We theoretically show that CQL produces a lower bound on the value of the current policy and that it can be incorporated into a policy learning procedure with theoretical improvement guarantees.
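For reference, the conservative penalty at the heart of CQL can be sketched as the following regularized Bellman-error objective (a simplified form; the weight α and the action-sampling distribution μ are hyperparameters in the paper):

$$ \min_{Q}\; \alpha\Big( \mathbb{E}_{s\sim \mathcal{D},\, a\sim \mu(\cdot\mid s)}[Q(s,a)] - \mathbb{E}_{(s,a)\sim \mathcal{D}}[Q(s,a)] \Big) \;+\; \tfrac{1}{2}\,\mathbb{E}_{(s,a,s')\sim \mathcal{D}}\Big[ \big( Q(s,a) - \hat{\mathcal{B}}^{\pi}\hat{Q}(s,a) \big)^2 \Big], $$

where the first term pushes Q-values down on actions sampled from μ and up on actions actually present in the dataset, which is what yields the lower-bound property.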
3 code implementations • 4 May 2020 • Sergey Levine, Aviral Kumar, George Tucker, Justin Fu
In this tutorial article, we aim to provide the reader with the conceptual tools needed to get started on research on offline reinforcement learning algorithms: reinforcement learning algorithms that utilize previously collected data, without additional online data collection.
no code implementations • ICLR 2020 • Łukasz Kaiser, Mohammad Babaeizadeh, Piotr Miłos, Błażej Osiński, Roy H. Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, Henryk Michalewski
We describe Simulated Policy Learning (SimPLe), a complete model-based deep RL algorithm based on video prediction models and present a comparison of several model architectures, including a novel architecture that yields the best results in our setting.
7 code implementations • 15 Apr 2020 • Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, Sergey Levine
In this work, we introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL.
1 code implementation • ICLR 2020 • Mingzhang Yin, George Tucker, Mingyuan Zhou, Sergey Levine, Chelsea Finn
If this is not done, the meta-learner can ignore the task training data and learn a single model that performs all of the meta-training tasks zero-shot, but does not adapt effectively to new image classes.
1 code implementation • 26 Nov 2019 • Yifan Wu, George Tucker, Ofir Nachum
In reinforcement learning (RL) research, it is common to assume access to direct online interactions with the environment.
no code implementations • NeurIPS 2019 • James Lucas, George Tucker, Roger Grosse, Mohammad Norouzi
Posterior collapse in Variational Autoencoders (VAEs) arises when the variational posterior distribution closely matches the prior for a subset of latent variables.
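In symbols, collapse of a latent dimension z_i means that for essentially all inputs x

$$ q_\phi(z_i \mid x) \approx p(z_i) \quad\Longleftrightarrow\quad \mathrm{KL}\big( q_\phi(z_i \mid x)\,\|\,p(z_i) \big) \approx 0, $$

so that dimension carries no information about x.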
1 code implementation • NeurIPS 2019 • Dieterich Lawson, George Tucker, Bo Dai, Rajesh Ranganath
Motivated by this, we consider the sampler-induced distribution as the model of interest and maximize the likelihood of this model.
no code implementations • 16 Jun 2019 • Qingpeng Cai, Will Hang, Azalia Mirhoseini, George Tucker, Jingtao Wang, Wei Wei
In this paper, we introduce RLHO, a novel framework that uses reinforcement learning (RL) to generate better initial solutions for heuristic algorithms.
3 code implementations • NeurIPS 2019 • Aviral Kumar, Justin Fu, George Tucker, Sergey Levine
Bootstrapping error is due to bootstrapping from actions that lie outside of the training data distribution, and it accumulates via the Bellman backup operator.
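The mechanism is visible in the standard Q-learning backup (written generically here, not in the paper's exact notation):

$$ Q_{k+1}(s,a) \;\leftarrow\; r(s,a) + \gamma\, \mathbb{E}_{s'}\Big[ \max_{a'} Q_k(s', a') \Big]; $$

when the maximizing a' lies outside the support of the training data, Q_k(s', a') is evaluated where it was never fit, and that error is copied into Q_{k+1} and compounds over iterations.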
3 code implementations • 16 May 2019 • Ben Poole, Sherjil Ozair, Aaron van den Oord, Alexander A. Alemi, George Tucker
Estimating and optimizing Mutual Information (MI) is core to many problems in machine learning; however, bounding MI in high dimensions is challenging.
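One widely used example of such a variational bound is the InfoNCE-style lower bound on MI built from a learned critic f (stated here as a sketch rather than the paper's exact form):

$$ I(X;Y) \;\geq\; \mathbb{E}\left[ \frac{1}{K}\sum_{i=1}^{K} \log \frac{e^{f(x_i, y_i)}}{\frac{1}{K}\sum_{j=1}^{K} e^{f(x_i, y_j)}} \right], $$

which is itself upper-bounded by log K, one illustration of why tight estimates are hard in high dimensions where the true MI can greatly exceed log K.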
no code implementations • ICLR 2019 • Niru Maheswaranathan, Luke Metz, George Tucker, Dami Choi, Jascha Sohl-Dickstein
This arises when an approximate gradient is easier to compute than the full gradient (e.g., in meta-learning or unrolled optimization), or when a true gradient is intractable and is replaced with a surrogate (e.g., in certain reinforcement learning applications or training networks with discrete variables).
no code implementations • ICLR Workshop DeepGenStruct 2019 • James Lucas, George Tucker, Roger Grosse, Mohammad Norouzi
Posterior collapse in Variational Autoencoders (VAEs) arises when the variational distribution closely matches the uninformative prior for a subset of latent variables.
no code implementations • ICLR Workshop DeepGenStruct 2019 • Dieterich Lawson, George Tucker, Bo Dai, Rajesh Ranganath
The success of enriching the variational family with auxiliary latent variables motivates applying the same techniques to the generative model.
2 code implementations • 1 Mar 2019 • Lukasz Kaiser, Mohammad Babaeizadeh, Piotr Milos, Blazej Osinski, Roy H. Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, Henryk Michalewski
We describe Simulated Policy Learning (SimPLe), a complete model-based deep RL algorithm based on video prediction models and present a comparison of several model architectures, including a novel architecture that yields the best results in our setting.
Ranked #8 on Atari Games 100k (Atari 100k)
no code implementations • 26 Dec 2018 • Tuomas Haarnoja, Sehoon Ha, Aurick Zhou, Jie Tan, George Tucker, Sergey Levine
In this paper, we propose a sample-efficient deep RL algorithm based on maximum entropy RL that requires minimal per-task tuning and only a modest number of trials to learn neural network policies.
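The maximum entropy objective referenced here augments the expected return with a policy-entropy bonus; schematically (generic form, not the paper's exact notation):

$$ J(\pi) \;=\; \sum_{t} \mathbb{E}_{(s_t, a_t)\sim \rho_\pi}\big[ r(s_t, a_t) + \alpha\, \mathcal{H}\big( \pi(\cdot \mid s_t) \big) \big], $$

where the temperature α controls the trade-off between reward and entropy.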
47 code implementations • 13 Dec 2018 • Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, Sergey Levine
no code implementations • ICLR 2019 • Yifan Wu, George Tucker, Ofir Nachum
In this paper, we present a fully general and scalable method for approximating the eigenvectors of the Laplacian in a model-free RL context.
3 code implementations • ICLR 2019 • George Tucker, Dieterich Lawson, Shixiang Gu, Chris J. Maddison
Burda et al. (2015) introduced a multi-sample variational bound, IWAE, that is at least as tight as the standard variational lower bound and becomes increasingly tight as the number of samples increases.
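For reference, the K-sample IWAE bound has the form

$$ \log p(x) \;\geq\; \mathcal{L}_K(x) \;=\; \mathbb{E}_{z_1,\dots,z_K \sim q(z\mid x)}\left[ \log \frac{1}{K}\sum_{k=1}^{K} \frac{p(x, z_k)}{q(z_k\mid x)} \right], $$

with L_1 equal to the standard ELBO and L_K non-decreasing in K, which is the tightening behavior mentioned above.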
2 code implementations • NeurIPS 2018 • Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, Honglak Lee
Integrating model-free and model-based approaches in reinforcement learning has the potential to achieve the high performance of model-free algorithms with low sample complexity.
1 code implementation • ICLR 2019 • Niru Maheswaranathan, Luke Metz, George Tucker, Dami Choi, Jascha Sohl-Dickstein
We propose Guided Evolutionary Strategies, a method for optimally using surrogate gradient directions along with random search.
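As a rough sketch of the idea (function name, defaults, and the omitted overall scale factor are illustrative choices, not the paper's code), guided ES draws antithetic perturbations from a Gaussian whose covariance is elongated along a low-dimensional subspace spanned by the surrogate gradients:

```python
import numpy as np

def guided_es_gradient(f, theta, U, sigma=0.1, alpha=0.5, num_pairs=10):
    """One guided-ES gradient estimate (illustrative sketch).

    f         : scalar objective, f(theta) -> float
    theta     : parameter vector, shape (n,)
    U         : orthonormal basis of the surrogate-gradient subspace, shape (n, k)
    alpha     : mixes isotropic search (alpha=1) with the surrogate subspace (alpha=0)
    """
    n, k = U.shape
    grad = np.zeros(n)
    for _ in range(num_pairs):
        # Perturbation from N(0, sigma^2 * (alpha/n * I + (1-alpha)/k * U U^T)).
        eps = sigma * (np.sqrt(alpha / n) * np.random.randn(n)
                       + np.sqrt((1 - alpha) / k) * U @ np.random.randn(k))
        # Antithetic pair: evaluate the objective at theta +/- eps.
        grad += (f(theta + eps) - f(theta - eps)) * eps
    return grad / (2 * sigma**2 * num_pairs)
```

Setting alpha to 1 recovers ordinary isotropic random search, while smaller values concentrate the search in the subspace suggested by the surrogate gradients.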
no code implementations • ICML 2018 • Ofir Nachum, Mohammad Norouzi, George Tucker, Dale Schuurmans
State-action value functions (i.e., Q-values) are ubiquitous in reinforcement learning (RL), giving rise to popular algorithms such as SARSA and Q-learning.
1 code implementation • ICML 2018 • George Tucker, Surya Bhupatiraju, Shixiang Gu, Richard E. Turner, Zoubin Ghahramani, Sergey Levine
Policy gradient methods are a widely used class of model-free reinforcement learning algorithms where a state-dependent baseline is used to reduce gradient estimator variance.
4 code implementations • ICLR 2018 • Carlos Riquelme, George Tucker, Jasper Snoek
At the same time, advances in approximate Bayesian methods have made posterior approximation for flexible neural network models practical.
Ranked #1 on Multi-Armed Bandits (Mushroom)
no code implementations • ICLR 2018 • Ofir Nachum, Mohammad Norouzi, George Tucker, Dale Schuurmans
We propose a new notion of action value defined by a Gaussian smoothed version of the expected Q-value used in SARSA.
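Schematically, the smoothed action value replaces the Q-value at a single action with its expectation under Gaussian noise around that action:

$$ \tilde{Q}^{\pi}(s, a) \;=\; \mathbb{E}_{\tilde{a}\sim \mathcal{N}(a, \Sigma)}\big[ Q^{\pi}(s, \tilde{a}) \big]. $$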
no code implementations • ICLR 2018 • Tong Che, Yuchen Lu, George Tucker, Surya Bhupatiraju, Shane Gu, Sergey Levine, Yoshua Bengio
Model-free deep reinforcement learning algorithms are able to successfully solve a wide range of continuous control tasks, but typically require many on-policy samples to achieve good performance.
no code implementations • 16 Jun 2017 • Chung-Cheng Chiu, Dieterich Lawson, Yuping Luo, George Tucker, Kevin Swersky, Ilya Sutskever, Navdeep Jaitly
This is because the models require that the entirety of the input sequence be available at the beginning of inference, an assumption that is not valid for instantaneous speech recognition.
3 code implementations • NeurIPS 2017 • Chris J. Maddison, Dieterich Lawson, George Tucker, Nicolas Heess, Mohammad Norouzi, Andriy Mnih, Arnaud Doucet, Yee Whye Teh
When used as a surrogate objective for maximum likelihood estimation in latent variable models, the evidence lower bound (ELBO) produces state-of-the-art results.
no code implementations • 16 May 2017 • Dieterich Lawson, Chung-Cheng Chiu, George Tucker, Colin Raffel, Kevin Swersky, Navdeep Jaitly
There has recently been significant interest in hard attention models for tasks such as object recognition, visual captioning and speech recognition.
no code implementations • 5 May 2017 • Ming Sun, Anirudh Raju, George Tucker, Sankaran Panchapagesan, Geng-Shen Fu, Arindam Mandal, Spyros Matsoukas, Nikko Strom, Shiv Vitaladevuni
Finally, the max-pooling loss trained LSTM initialized with a cross-entropy pre-trained network shows the best performance, yielding a 67.6% relative reduction in the Area Under the Curve (AUC) measure compared to the baseline feed-forward DNN.
3 code implementations • NeurIPS 2017 • George Tucker, Andriy Mnih, Chris J. Maddison, Dieterich Lawson, Jascha Sohl-Dickstein
Learning in models with discrete latent variables is challenging due to high variance gradient estimators.
no code implementations • 16 Mar 2017 • Chris J. Maddison, Dieterich Lawson, George Tucker, Nicolas Heess, Arnaud Doucet, Andriy Mnih, Yee Whye Teh
The policy gradients of the expected return objective can react slowly to rare rewards.
2 code implementations • 23 Jan 2017 • Gabriel Pereyra, George Tucker, Jan Chorowski, Łukasz Kaiser, Geoffrey Hinton
We systematically explore regularizing neural networks by penalizing low entropy output distributions.
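The regularizer in question can be written as an entropy term added to the usual negative log-likelihood (β is the penalty strength):

$$ \mathcal{L}(\theta) \;=\; -\sum_{(x,y)} \log p_\theta(y \mid x) \;-\; \beta\, \mathcal{H}\big( p_\theta(\cdot \mid x) \big), $$

so output distributions that place nearly all probability mass on one class (low entropy) are penalized.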
no code implementations • 18 Nov 2016 • Yotaro Kubo, George Tucker, Simon Wiesler
We introduce dropout compaction, a novel method for training feed-forward neural networks which realizes the performance gains of training a large model with dropout regularization, yet extracts a compact neural network for run-time efficiency.