1 code implementation • ICML 2020 • Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi
The DQN replay dataset can serve as an offline RL benchmark and is open-sourced.
1 code implementation • 1 Mar 2022 • Charline Le Lan, Stephen Tu, Adam Oberman, Rishabh Agarwal, Marc G. Bellemare
We complement our theoretical results with an empirical survey of classic representation learning methods from the literature and results on the Arcade Learning Environment, and find that the generalization behaviour of learned representations is well-explained by their effective dimension.
no code implementations • ICLR 2022 • Aviral Kumar, Rishabh Agarwal, Tengyu Ma, Aaron Courville, George Tucker, Sergey Levine
In this paper, we discuss how the implicit regularization effect of SGD seen in supervised learning could in fact be harmful in the offline deep RL setting, leading to poor generalization and degenerate feature representations.
1 code implementation • NeurIPS 2021 • Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare
Most published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs.
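To illustrate reporting aggregate performance together with its statistical uncertainty, rather than as a bare point estimate, the sketch below computes the interquartile mean (IQM) of normalized scores and a percentile confidence interval from a stratified bootstrap, which resamples runs separately within each task. This is a minimal plain-Python sketch, not the authors' released library. The function names `iqm` and `stratified_bootstrap_ci`, the per-task list layout, and the defaults (2,000 resamples, 95% interval) are illustrative assumptions.

```python
import random


def iqm(values):
    """Interquartile mean: mean of the middle 50% of values (trims 25% from each end)."""
    xs = sorted(values)
    cut = int(0.25 * len(xs))
    middle = xs[cut:len(xs) - cut]
    return sum(middle) / len(middle)


def stratified_bootstrap_ci(scores, stat=iqm, reps=2000, alpha=0.05, seed=0):
    """Point estimate and percentile CI of `stat` over all runs of all tasks.

    Assumed layout: scores[task] is a list of per-run normalized scores for that task.
    """
    rng = random.Random(seed)
    point = stat([s for runs in scores for s in runs])
    estimates = []
    for _ in range(reps):
        resampled = []
        for runs in scores:
            # Resample runs with replacement *within* each task (stratified by task).
            resampled.extend(rng.choices(runs, k=len(runs)))
        estimates.append(stat(resampled))
    estimates.sort()
    # Percentile interval: take the alpha/2 and 1 - alpha/2 empirical quantiles.
    lo = estimates[int(round((alpha / 2) * (reps - 1)))]
    hi = estimates[int(round((1 - alpha / 2) * (reps - 1)))]
    return point, (lo, hi)
```

Usage: `stratified_bootstrap_ci([[0.1, 0.5, 0.9], [1.2, 0.8, 1.0]])` returns the IQM across all six runs plus a (lower, upper) interval; with only a handful of runs per task, that interval can be wide, which a point estimate alone would hide.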
1 code implementation • ICML Workshop URL 2021 • Evgenii Nikishin, Romina Abachi, Rishabh Agarwal, Pierre-Luc Bacon
The shortcomings of maximum likelihood estimation in the context of model-based reinforcement learning have been highlighted by an increasing number of papers.
1 code implementation • ICLR 2021 • Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G. Bellemare
We introduce a theoretically motivated policy similarity metric (PSM) for measuring behavioral similarity between states.
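As we understand it, the PSM is defined recursively as d(x, y) = DIST(π(x), π(y)) + γ · W1(d)(P^π(·|x), P^π(·|y)). With deterministic transitions, the Wasserstein term reduces to d(x', y') for the successor states. The sketch below computes that special case by fixed-point iteration on a tiny finite environment. It assumes a deterministic policy, total variation as DIST (0 for the same action, 1 otherwise), and hypothetical `policy` / `next_state` dictionaries; it is not the authors' implementation.

```python
def policy_similarity_metric(states, policy, next_state, gamma=0.9, iters=200):
    """Fixed-point iteration of d(x, y) = DIST(pi(x), pi(y)) + gamma * d(x', y').

    Assumes a deterministic policy and deterministic transitions x' = next_state[x].
    Since gamma < 1, the update is a contraction, so the iteration converges.
    """
    d = {(x, y): 0.0 for x in states for y in states}
    for _ in range(iters):
        new_d = {}
        for x in states:
            for y in states:
                # Total variation between two deterministic policies: 0 if same action, else 1.
                dist = 0.0 if policy[x] == policy[y] else 1.0
                new_d[(x, y)] = dist + gamma * d[(next_state[x], next_state[y])]
        d = new_d
    return d


# Hypothetical toy chain: c -> a -> g, b -> g, and g is absorbing.
states = ["a", "b", "c", "g"]
policy = {"a": "right", "b": "right", "c": "right", "g": "stay"}
next_state = {"a": "g", "b": "g", "c": "a", "g": "g"}
d = policy_similarity_metric(states, policy, next_state)
```

Here `a` and `b` end up at distance 0 because they act identically and lead to the same future. `c` also takes the same action as `a`, but is 0.9 away from it because its successor (`a`) behaves differently from `a`'s successor (`g`) one step later.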
1 code implementation • NeurIPS 2020 • Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Thomas Paine, Sergio Gómez, Konrad Zolna, Rishabh Agarwal, Josh S. Merel, Daniel J. Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matthew Hoffman, Nicolas Heess, Nando de Freitas
We hope that our suite of benchmarks will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community.
1 code implementation • ICLR 2021 • Aviral Kumar, Rishabh Agarwal, Dibya Ghosh, Sergey Levine
We identify an implicit under-parameterization phenomenon in value-based deep RL methods that use bootstrapping: when value functions, approximated using deep neural networks, are trained with gradient descent using iterated regression onto target values generated by previous instances of the value network, more gradient updates decrease the expressivity of the current value network.
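The loss of expressivity can be tracked with an effective rank of the learned feature matrix Φ. As we understand the paper, it uses srank_δ(Φ) = min{k : Σ_{i≤k} σ_i(Φ) / Σ_i σ_i(Φ) ≥ 1 − δ}, where σ_i are the singular values in decreasing order and δ = 0.01. The sketch below takes the singular values as input to stay dependency-free. In practice they could come from `numpy.linalg.svd(phi, compute_uv=False)`. The function name `srank` is illustrative.

```python
def srank(singular_values, delta=0.01):
    """Smallest k such that the top-k singular values hold at least (1 - delta) of their total."""
    sv = sorted(singular_values, reverse=True)
    total = sum(sv)
    if total == 0:
        return 0  # Empty or all-zero feature matrix: no effective directions.
    cumulative = 0.0
    for k, s in enumerate(sv, start=1):
        cumulative += s
        if cumulative >= (1 - delta) * total:
            return k
    return len(sv)
```

A feature matrix whose spectrum is dominated by a few large singular values gets a small srank even if its nominal rank is full. A srank that falls over training is the symptom the snippet describes.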
1 code implementation • SemEval 2020 • Vipul Singhal, Sahil Dhull, Rishabh Agarwal, Ashutosh Modi
This paper describes the system proposed for addressing the research problem posed in Task 10 of SemEval-2020: Emphasis Selection For Written Text in Visual Media.
2 code implementations • ICML 2020 • William Fedus, Prajit Ramachandran, Rishabh Agarwal, Yoshua Bengio, Hugo Larochelle, Mark Rowland, Will Dabney
Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but there remain significant gaps in our understanding.
2 code implementations • 24 Jun 2020 • Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Tom Le Paine, Sergio Gomez Colmenarejo, Konrad Zolna, Rishabh Agarwal, Josh Merel, Daniel Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matt Hoffman, Ofir Nachum, George Tucker, Nicolas Heess, Nando de Freitas
We hope that our suite of benchmarks will increase the reproducibility of experiments and make it possible to study challenging tasks with a limited computational budget, thus making RL research both more systematic and more accessible across the community.
5 code implementations • NeurIPS 2021 • Rishabh Agarwal, Levi Melnick, Nicholas Frosst, Xuezhou Zhang, Ben Lengerich, Rich Caruana, Geoffrey Hinton
Neural additive models (NAMs) perform similarly to existing state-of-the-art generalized additive models in accuracy, but are more flexible because they are based on neural nets instead of boosted trees.
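A neural additive model keeps the structure of a generalized additive model: each input feature x_i goes through its own small network f_i, and the prediction is bias + Σ_i f_i(x_i). For classification, that sum would be passed through a link function such as the logistic sigmoid. The plain-Python sketch below shows only this forward pass with random weights. It assumes plain ReLU hidden units rather than the ExU units described in the paper, and the class names are hypothetical. Training is omitted.

```python
import random


class FeatureNet:
    """Tiny one-hidden-layer ReLU MLP mapping a single scalar feature to a scalar contribution."""

    def __init__(self, hidden, rng):
        self.w1 = [rng.gauss(0, 1) for _ in range(hidden)]
        self.b1 = [rng.gauss(0, 1) for _ in range(hidden)]
        self.w2 = [rng.gauss(0, 1) / hidden for _ in range(hidden)]

    def __call__(self, x):
        # ReLU hidden layer followed by a linear output unit.
        return sum(w2 * max(0.0, w1 * x + b1) for w1, b1, w2 in zip(self.w1, self.b1, self.w2))


class NeuralAdditiveModel:
    """Prediction = bias + sum of per-feature shape functions f_i(x_i)."""

    def __init__(self, num_features, hidden=8, seed=0):
        rng = random.Random(seed)
        self.feature_nets = [FeatureNet(hidden, rng) for _ in range(num_features)]
        self.bias = 0.0

    def contributions(self, x):
        # Each feature is seen only by its own subnetwork, so contributions are separable.
        return [f(xi) for f, xi in zip(self.feature_nets, x)]

    def __call__(self, x):
        return self.bias + sum(self.contributions(x))
```

Because the output is a sum of one-dimensional functions, each f_i can be plotted directly as a shape function. This per-feature view is what keeps the model interpretable even though each f_i is a neural net.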
no code implementations • 25 Sep 2019 • Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi
This paper advocates the use of offline (batch) reinforcement learning (RL) to help (1) isolate the contributions of exploitation vs. exploration in off-policy deep RL, (2) improve reproducibility of deep RL research, and (3) facilitate the design of simpler deep RL algorithms.
1 code implementation • 10 Jul 2019 • Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi
The DQN replay dataset can serve as an offline RL benchmark and is open-sourced.
1 code implementation • 19 Feb 2019 • Rishabh Agarwal, Chen Liang, Dale Schuurmans, Mohammad Norouzi
The parameters of the auxiliary reward function are optimized with respect to the validation performance of a trained policy.
no code implementations • 25 Jan 2019 • Rishabh Agarwal
The current state-of-the-art Scrabble agents are not learning-based but depend on truncated Monte Carlo simulations, and the quality of such agents is contingent upon the time available for running the simulations.