no code implementations • 12 Apr 2022 • Aviral Kumar, Joey Hong, Anikait Singh, Sergey Levine
To answer this question, we characterize the properties of environments that allow offline RL methods to perform better than BC methods, even when only provided with expert data.
3 code implementations • 17 Feb 2022 • Brandon Trabucco, Xinyang Geng, Aviral Kumar, Sergey Levine
To address this, we present Design-Bench, a benchmark for offline MBO with a unified evaluation protocol and reference implementations of recent methods.
no code implementations • 3 Feb 2022 • Tianhe Yu, Aviral Kumar, Yevgen Chebotar, Karol Hausman, Chelsea Finn, Sergey Levine
One natural solution is to learn a reward function from the labeled data and use it to label the unlabeled data.
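For concreteness, a minimal sketch of that naive approach, with placeholder dimensions and tensor names (illustrative assumptions, not the paper's implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, act_dim = 17, 6  # placeholder dimensions for illustration

# Simple reward model regressed on the labeled transitions.
reward_model = nn.Sequential(
    nn.Linear(obs_dim + act_dim, 256), nn.ReLU(),
    nn.Linear(256, 1),
)

def fit_reward_model(obs, act, rew, epochs=100, lr=1e-3):
    """Fit the reward model on labeled (state, action, reward) data."""
    opt = torch.optim.Adam(reward_model.parameters(), lr=lr)
    x = torch.cat([obs, act], dim=-1)
    for _ in range(epochs):
        loss = F.mse_loss(reward_model(x).squeeze(-1), rew)
        opt.zero_grad()
        loss.backward()
        opt.step()

def relabel(obs, act):
    """Use the learned reward model to annotate unlabeled transitions."""
    with torch.no_grad():
        return reward_model(torch.cat([obs, act], dim=-1)).squeeze(-1)
```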
no code implementations • ICLR 2022 • Aviral Kumar, Rishabh Agarwal, Tengyu Ma, Aaron Courville, George Tucker, Sergey Levine
In this paper, we discuss how the implicit regularization effect of SGD seen in supervised learning could in fact be harmful in the offline deep RL setting, leading to poor generalization and degenerate feature representations.
1 code implementation • ICLR 2022 • Aviral Kumar, Amir Yazdanbakhsh, Milad Hashemi, Kevin Swersky, Sergey Levine
An alternative paradigm is to use a "data-driven", offline approach that utilizes logged simulation data to architect hardware accelerators, without needing any form of simulation.
no code implementations • ICLR 2022 • Aviral Kumar, Joey Hong, Anikait Singh, Sergey Levine
In this paper, our goal is to characterize environments and dataset compositions where offline RL leads to better performance than BC.
no code implementations • 29 Sep 2021 • Tianhe Yu, Aviral Kumar, Yevgen Chebotar, Chelsea Finn, Sergey Levine, Karol Hausman
However, these benefits come at a cost -- for data to be shared between tasks, each transition must be annotated with reward labels corresponding to other tasks.
no code implementations • 22 Sep 2021 • Aviral Kumar, Anikait Singh, Stephen Tian, Chelsea Finn, Sergey Levine
To this end, we devise a set of metrics and conditions that can be tracked over the course of offline training, and can inform the practitioner about how the algorithm and model architecture should be adjusted to improve final performance.
no code implementations • NeurIPS 2021 • Tianhe Yu, Aviral Kumar, Yevgen Chebotar, Karol Hausman, Sergey Levine, Chelsea Finn
We argue that a natural use case of offline RL is in settings where we can pool large amounts of data collected in various scenarios for solving different tasks, and utilize all of this data to learn behaviors for all the tasks more effectively than by training each one in isolation.
2 code implementations • 14 Jul 2021 • Brandon Trabucco, Aviral Kumar, Xinyang Geng, Sergey Levine
Computational design problems arise in a number of settings, from synthetic biology to computer architectures.
no code implementations • NeurIPS 2021 • Dibya Ghosh, Jad Rahme, Aviral Kumar, Amy Zhang, Ryan P. Adams, Sergey Levine
Generalization is a central challenge for the deployment of reinforcement learning (RL) systems in the real world.
3 code implementations • ICLR 2021 • Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Mengjiao Yang, Michael R. Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, Tom Le Paine
Off-policy evaluation (OPE) holds the promise of being able to leverage large, offline datasets for both evaluating and selecting complex policies for decision making.
2 code implementations • NeurIPS 2021 • Tianhe Yu, Aviral Kumar, Rafael Rafailov, Aravind Rajeswaran, Sergey Levine, Chelsea Finn
We overcome this limitation by developing a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-action tuples generated via rollouts under the learned model.
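A highly simplified sketch of that kind of regularizer, assuming a Q-network that takes state-action pairs; the full objective also includes a Bellman error term and other details omitted here:

```python
import torch

def combo_style_penalty(q_net, dataset_obs, dataset_act, rollout_obs, rollout_act):
    """Push down Q-values on state-action tuples generated by rolling out the
    learned dynamics model, and push them up on tuples from the offline
    dataset. Illustrative reduction of the regularizer described above."""
    q_rollout = q_net(rollout_obs, rollout_act)
    q_data = q_net(dataset_obs, dataset_act)
    return q_rollout.mean() - q_data.mean()
```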
no code implementations • NeurIPS 2020 • Saurabh Kumar, Aviral Kumar, Sergey Levine, Chelsea Finn
While reinforcement learning algorithms can learn effective policies for complex tasks, these policies are often brittle to even minor task variations, especially when variations are not explicitly provided during training.
no code implementations • ICLR 2021 • Homanga Bharadhwaj, Aviral Kumar, Nicholas Rhinehart, Sergey Levine, Florian Shkurti, Animesh Garg
Safe exploration presents a major challenge in reinforcement learning (RL): when active data collection requires deploying partially trained policies, we must ensure that these policies avoid catastrophically unsafe regions, while still enabling trial and error learning.
1 code implementation • ICLR 2021 • Aviral Kumar, Rishabh Agarwal, Dibya Ghosh, Sergey Levine
We identify an implicit under-parameterization phenomenon in value-based deep RL methods that use bootstrapping: when value functions, approximated using deep neural networks, are trained with gradient descent using iterated regression onto target values generated by previous instances of the value network, more gradient updates decrease the expressivity of the current value network.
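One way to quantify such a loss of expressivity is an effective-rank measure over the learned features; the sketch below is illustrative rather than the paper's exact metric:

```python
import numpy as np

def effective_rank(features, delta=0.01):
    """Effective rank of a feature matrix (rows = states, columns =
    penultimate-layer features): the smallest number of singular values
    needed to capture a (1 - delta) fraction of the total spectrum."""
    singular_values = np.linalg.svd(features, compute_uv=False)
    cumulative = np.cumsum(singular_values) / np.sum(singular_values)
    return int(np.searchsorted(cumulative, 1.0 - delta) + 1)
```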
1 code implementation • 27 Oct 2020 • Avi Singh, Albert Yu, Jonathan Yang, Jesse Zhang, Aviral Kumar, Sergey Levine
Reinforcement learning has been applied to a wide variety of robotics problems, but most such applications involve collecting data from scratch for each new task.
no code implementations • ICLR 2021 • Anurag Ajay, Aviral Kumar, Pulkit Agrawal, Sergey Levine, Ofir Nachum
Reinforcement learning (RL) has achieved impressive performance in a variety of online settings in which an agent's ability to query the environment for transitions and rewards is effectively unlimited.
10 code implementations • NeurIPS 2020 • Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine
We theoretically show that CQL produces a lower bound on the value of the current policy and that it can be incorporated into a policy learning procedure with theoretical improvement guarantees.
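For intuition, a minimal sketch of a conservative objective of this kind for discrete actions, assuming a simple batch container with obs, actions, rewards, next_obs, and dones tensors (illustrative, not the reference implementation):

```python
import torch
import torch.nn.functional as F

def conservative_q_loss(q_net, target_q_net, batch, alpha=1.0, gamma=0.99):
    """Standard TD error plus a conservative term that pushes down Q-values
    over all actions (via logsumexp) while pushing up Q-values of the
    actions actually taken in the dataset."""
    q_values = q_net(batch.obs)                                  # [B, num_actions]
    q_taken = q_values.gather(1, batch.actions.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        next_q = target_q_net(batch.next_obs).max(dim=1).values
        td_target = batch.rewards + gamma * (1.0 - batch.dones) * next_q

    bellman_error = F.mse_loss(q_taken, td_target)

    # Conservative regularizer: logsumexp over actions minus dataset-action Q.
    conservative = (torch.logsumexp(q_values, dim=1) - q_taken).mean()

    return bellman_error + alpha * conservative
```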
3 code implementations • 4 May 2020 • Sergey Levine, Aviral Kumar, George Tucker, Justin Fu
In this tutorial article, we aim to provide the reader with the conceptual tools needed to get started on research on offline reinforcement learning algorithms: reinforcement learning algorithms that utilize previously collected data, without additional online data collection.
4 code implementations • 15 Apr 2020 • Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, Sergey Levine
In this work, we introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL.
3 code implementations • NeurIPS 2020 • Aviral Kumar, Abhishek Gupta, Sergey Levine
We show that bootstrapping-based Q-learning algorithms do not necessarily benefit from this corrective feedback, and training on the experience collected by the algorithm is not sufficient to correct errors in the Q-function.
Ranked #3 on Meta-Learning on MT50
no code implementations • NeurIPS 2020 • Aviral Kumar, Sergey Levine
MINs can scale to high-dimensional input spaces and leverage offline logged data for both contextual and non-contextual optimization problems.
1 code implementation • 31 Dec 2019 • Aviral Kumar, Xue Bin Peng, Sergey Levine
By then conditioning the policy on the numerical value of the reward, we can obtain a policy that generalizes to larger returns.
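A minimal sketch of such a return-conditioned policy, with illustrative network sizes and tensor shapes:

```python
import torch
import torch.nn as nn

class ReturnConditionedPolicy(nn.Module):
    """The network receives the state concatenated with a scalar target
    return and predicts an action."""

    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs, target_return):
        # target_return: shape [B, 1]; at evaluation time it can be set above
        # the returns seen in the data to ask for better-than-dataset behavior.
        return self.net(torch.cat([obs, target_return], dim=-1))

# Supervised training step (hypothetical tensors obs, actions, returns):
#   loss = torch.nn.functional.mse_loss(policy(obs, returns), actions)
```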
5 code implementations • 1 Oct 2019 • Xue Bin Peng, Aviral Kumar, Grace Zhang, Sergey Levine
In this paper, we aim to develop a simple and scalable reinforcement learning algorithm that uses standard supervised learning methods as subroutines.
Ranked #1 on OpenAI Gym on Humanoid-v2
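The entry above describes building policy improvement out of supervised subroutines; one common form of such an update is a weighted regression onto dataset actions, sketched below with hypothetical networks and hyperparameters:

```python
import torch

def weighted_regression_loss(policy, value_fn, obs, actions, returns,
                             beta=0.05, max_weight=20.0):
    """Policy improvement as weighted supervised regression: weight the
    log-likelihood of dataset actions by the exponentiated advantage."""
    with torch.no_grad():
        advantages = returns - value_fn(obs).squeeze(-1)
        weights = torch.clamp(torch.exp(advantages / beta), max=max_weight)

    # Gaussian policy assumed: maximize weighted log-likelihood of actions.
    dist = policy(obs)                        # e.g. torch.distributions.Normal
    log_prob = dist.log_prob(actions).sum(-1)
    return -(weights * log_prob).mean()
```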
2 code implementations • NeurIPS 2019 • Aviral Kumar, Justin Fu, George Tucker, Sergey Levine
Bootstrapping error is due to bootstrapping from actions that lie outside of the training data distribution, and it accumulates via the Bellman backup operator.
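One way to keep the backup from querying out-of-distribution actions is to penalize a divergence between the learned policy's actions and the dataset actions; a sample-based MMD estimate, sketched below with an illustrative Gaussian kernel and bandwidth, is one such divergence:

```python
import torch

def mmd_squared(policy_actions, dataset_actions, sigma=20.0):
    """Sample-based MMD^2 estimate between two sets of actions, usable as a
    penalty to keep the policy within the support of the dataset actions."""
    def kernel(a, b):
        d = ((a.unsqueeze(1) - b.unsqueeze(0)) ** 2).sum(-1)
        return torch.exp(-d / (2.0 * sigma))
    return (kernel(policy_actions, policy_actions).mean()
            + kernel(dataset_actions, dataset_actions).mean()
            - 2.0 * kernel(policy_actions, dataset_actions).mean())
```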
1 code implementation • NeurIPS 2019 • Jenny Liu, Aviral Kumar, Jimmy Ba, Jamie Kiros, Kevin Swersky
We introduce graph normalizing flows: a new, reversible graph neural network model for prediction and generation.
no code implementations • 3 Mar 2019 • Aviral Kumar, Sunita Sarawagi
We study the calibration of several state-of-the-art neural machine translation (NMT) systems built on attention-based encoder-decoder models.
1 code implementation • 26 Feb 2019 • Justin Fu, Aviral Kumar, Matthew Soh, Sergey Levine
Q-learning methods represent a commonly used class of algorithms in reinforcement learning: they are generally efficient and simple, and can be combined readily with function approximators for deep reinforcement learning (RL).
1 code implementation • ICML 2018 • Aviral Kumar, Sunita Sarawagi, Ujjwal Jain
Modern neural networks have recently been found to be poorly calibrated, primarily in the direction of over-confidence.
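A standard way to quantify that over-confidence is the expected calibration error; the sketch below is illustrative and not the trainable calibration measure proposed in the paper:

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Expected calibration error: binned gap between confidence and accuracy.
    `probs` are predicted class probabilities [N, C], `labels` are integer
    class labels [N]."""
    confidences = probs.max(axis=1)
    predictions = probs.argmax(axis=1)
    accuracies = (predictions == labels).astype(float)

    ece = 0.0
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(accuracies[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece
```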