no code implementations • 25 Oct 2022 • Michael Laskin, Luyu Wang, Junhyuk Oh, Emilio Parisotto, Stephen Spencer, Richie Steigerwald, DJ Strouse, Steven Hansen, Angelos Filos, Ethan Brooks, Maxime Gazeau, Himanshu Sahni, Satinder Singh, Volodymyr Mnih
We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model.
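As context, a minimal sketch of the general idea (assumed shapes, a toy causal transformer, and random stand-in data; not the authors' implementation): a causal sequence model is trained to predict the source algorithm's actions from its own cross-episode training history of (observation, action, reward) tokens.

    # Hypothetical Algorithm-Distillation-style sketch: predict the next action
    # from a long (observation, action, reward) training history with a causal
    # transformer. All shapes and data below are placeholders.
    import torch
    import torch.nn as nn

    class CausalHistoryModel(nn.Module):
        def __init__(self, n_obs, n_act, d_model=64, n_heads=4, n_layers=2, max_len=512):
            super().__init__()
            self.obs_emb = nn.Embedding(n_obs, d_model)
            self.act_emb = nn.Embedding(n_act, d_model)
            self.rew_emb = nn.Linear(1, d_model)
            self.pos_emb = nn.Embedding(max_len, d_model)
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.act_head = nn.Linear(d_model, n_act)

        def forward(self, obs, act, rew):
            # obs, act: (B, T) ints; rew: (B, T) floats; one summed token per step.
            T = obs.shape[1]
            pos = torch.arange(T, device=obs.device)
            x = (self.obs_emb(obs) + self.act_emb(act)
                 + self.rew_emb(rew.unsqueeze(-1)) + self.pos_emb(pos))
            causal_mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
            h = self.encoder(x, mask=causal_mask)
            return self.act_head(h)  # next-action logits at every position

    model = CausalHistoryModel(n_obs=16, n_act=4)
    obs, act = torch.randint(0, 16, (8, 128)), torch.randint(0, 4, (8, 128))
    rew = torch.randn(8, 128)
    logits = model(obs, act, rew)
    # Behavioral-cloning loss on the action the source algorithm took next.
    loss = nn.functional.cross_entropy(logits[:, :-1].reshape(-1, 4), act[:, 1:].reshape(-1))
    loss.backward()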
no code implementations • 19 Oct 2022 • Hao liu, Tom Zahavy, Volodymyr Mnih, Satinder Singh
In this work, we aim to bring together the best of both worlds and propose an algorithm that exhibits exploratory behavior while utilizing large, diverse datasets.
no code implementations • NeurIPS 2021 • Steven Hansen, Guillaume Desjardins, Kate Baumli, David Warde-Farley, Nicolas Heess, Simon Osindero, Volodymyr Mnih
An agent might be said, informally, to have mastery of its environment when it has maximised the effective number of states it can reliably reach.
no code implementations • 28 Oct 2021 • Ishan Durugkar, Steven Hansen, Stephen Spencer, Volodymyr Mnih
This paper deals with the problem of learning a skill-conditioned policy that acts meaningfully in the absence of a reward signal.
no code implementations • ICML Workshop URL 2021 • Tom Zahavy, Brendan O'Donoghue, Andre Barreto, Volodymyr Mnih, Sebastian Flennerhag, Satinder Singh
We propose Diverse Successive Policies, a method for discovering policies that are diverse in the space of Successor Features, while assuring that they are near optimal.
no code implementations • 14 Dec 2020 • Kate Baumli, David Warde-Farley, Steven Hansen, Volodymyr Mnih
In the absence of external rewards, agents can still learn useful behaviors by identifying and mastering a set of diverse skills within their environment.
no code implementations • 22 Jan 2020 • Tom Van de Wiele, David Warde-Farley, Andriy Mnih, Volodymyr Mnih
Applying Q-learning to high-dimensional or continuous action spaces can be difficult due to the required maximization over the set of possible actions.
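One common way to sidestep that maximization (a hedged sketch in the spirit of amortized approximate maximization; the q_net and proposal modules below are illustrative placeholders, not the paper's exact architecture): sample a handful of candidate actions from a learned proposal distribution and keep the best one under the Q-network.

    # Illustrative sketch: approximate max_a Q(s, a) for continuous actions by
    # sampling candidates from a proposal distribution and taking the best one.
    import torch
    import torch.nn as nn

    obs_dim, act_dim, n_candidates = 8, 3, 64
    q_net = nn.Sequential(nn.Linear(obs_dim + act_dim, 128), nn.ReLU(), nn.Linear(128, 1))
    proposal = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, 2 * act_dim))

    def approx_max_q(state):
        # state: (obs_dim,) tensor. Returns (best_action, best_q_value).
        mean_logstd = proposal(state)
        mean, log_std = mean_logstd[:act_dim], mean_logstd[act_dim:]
        candidates = mean + log_std.exp() * torch.randn(n_candidates, act_dim)
        q_values = q_net(torch.cat([state.expand(n_candidates, -1), candidates], dim=-1)).squeeze(-1)
        best = q_values.argmax()
        return candidates[best], q_values[best]

    state = torch.randn(obs_dim)
    action, value = approx_max_q(state)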
6 code implementations • NeurIPS 2019 • Tejas Kulkarni, Ankush Gupta, Catalin Ionescu, Sebastian Borgeaud, Malcolm Reynolds, Andrew Zisserman, Volodymyr Mnih
In this work we aim to learn object representations that are useful for control and reinforcement learning (RL).
no code implementations • ICLR 2020 • Steven Hansen, Will Dabney, Andre Barreto, Tom Van de Wiele, David Warde-Farley, Volodymyr Mnih
It has been established that diverse behaviors spanning the controllable subspace of a Markov decision process can be trained by rewarding a policy for being distinguishable from other policies (Gregor et al., 2016; Eysenbach et al., 2018; Warde-Farley et al., 2018).
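A minimal sketch of that generic discriminability reward (in the style of the cited works; the shapes and uniform skill prior are assumptions, and this is not this paper's specific contribution): a discriminator is trained to recognize which skill produced a state, and each skill's policy is rewarded for visiting states where it is recognizable.

    # Generic discriminability reward used in variational skill discovery
    # (sketch under assumed shapes).
    import torch
    import torch.nn as nn

    n_skills, obs_dim = 8, 16
    discriminator = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_skills))

    def intrinsic_reward(state, skill_id):
        # r = log q(z | s) - log p(z), with a uniform prior over skills.
        log_q = torch.log_softmax(discriminator(state), dim=-1)[skill_id]
        log_p = -torch.log(torch.tensor(float(n_skills)))
        return (log_q - log_p).item()

    state = torch.randn(obs_dim)
    r = intrinsic_reward(state, skill_id=3)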
no code implementations • ICLR 2019 • David Warde-Farley, Tom Van de Wiele, Tejas Kulkarni, Catalin Ionescu, Steven Hansen, Volodymyr Mnih
Learning to control an environment without hand-crafted rewards or expert data remains challenging and is at the frontier of reinforcement learning research.
1 code implementation • ICML 2018 • Martin Riedmiller, Roland Hafner, Thomas Lampe, Michael Neunert, Jonas Degrave, Tom Van de Wiele, Volodymyr Mnih, Nicolas Heess, Jost Tobias Springenberg
We propose Scheduled Auxiliary Control (SAC-X), a new learning paradigm in the context of Reinforcement Learning (RL).
1 code implementation • ICML 2018 • Brendan O'Donoghue, Ian Osband, Remi Munos, Volodymyr Mnih
In this paper we consider a similar uncertainty Bellman equation (UBE), which connects the uncertainty at any time-step to the expected uncertainties at subsequent time-steps, thereby extending the potential exploratory benefit of a policy beyond individual time-steps.
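For intuition, a hedged tabular sketch of an uncertainty-Bellman-style backup (made-up transition model and local uncertainties; a simplification of the recursion described in the paper):

    # Tabular sketch: propagate local uncertainty through the policy's dynamics,
    # analogously to a value backup.
    import numpy as np

    n_states, n_actions, gamma = 5, 3, 0.99
    P = np.random.dirichlet(np.ones(n_states), size=(n_states, n_actions))   # P[s, a, s']
    pi = np.random.dirichlet(np.ones(n_actions), size=n_states)              # pi[s, a]
    nu = np.random.rand(n_states, n_actions)                                 # local uncertainty

    u = np.zeros((n_states, n_actions))
    for _ in range(1000):
        # u(s,a) <- nu(s,a) + gamma^2 * sum_{s',a'} P(s'|s,a) pi(a'|s') u(s',a')
        next_u = (pi * u).sum(axis=1)            # expected uncertainty per next state
        u = nu + gamma**2 * P @ next_u
    print(u)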
3 code implementations • 16 Nov 2016 • Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z. Leibo, David Silver, Koray Kavukcuoglu
We also introduce a novel mechanism for focusing this representation upon extrinsic rewards, so that learning can rapidly adapt to the most relevant aspects of the actual task.
no code implementations • 5 Nov 2016 • Brendan O'Donoghue, Remi Munos, Koray Kavukcuoglu, Volodymyr Mnih
Policy gradient is an efficient technique for improving a policy in a reinforcement learning setting.
8 code implementations • 3 Nov 2016 • Ziyu Wang, Victor Bapst, Nicolas Heess, Volodymyr Mnih, Remi Munos, Koray Kavukcuoglu, Nando de Freitas
This paper presents an actor-critic deep reinforcement learning agent with experience replay that is stable, sample efficient, and performs remarkably well on challenging environments, including the discrete 57-game Atari domain and several continuous control problems.
4 code implementations • NeurIPS 2016 • Jimmy Ba, Geoffrey Hinton, Volodymyr Mnih, Joel Z. Leibo, Catalin Ionescu
Until recently, research on artificial neural networks was largely restricted to systems with only two types of variable: neural activities that represent the current or recent input, and weights that learn to capture regularities among inputs, outputs, and payoffs.
no code implementations • NeurIPS 2016 • Alexander Vezhnevets, Volodymyr Mnih, John Agapiou, Simon Osindero, Alex Graves, Oriol Vinyals, Koray Kavukcuoglu
We present a novel deep recurrent neural network architecture that learns to build implicit plans in an end-to-end manner, purely by interacting with an environment in a reinforcement learning setting.
no code implementations • NeurIPS 2016 • Hado van Hasselt, Arthur Guez, Matteo Hessel, Volodymyr Mnih, David Silver
Most learning algorithms are not invariant to the scale of the function that is being approximated.
Ranked #12 on Atari Games on Atari 2600 Centipede
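One way to address the scale sensitivity noted above, sketched here, is adaptive target normalization that preserves the network's unnormalized outputs when the scale statistics change (the constants and update rule below are illustrative, not the paper's exact scheme):

    # Simplified adaptive target normalization: regress against normalized
    # targets, and rescale the output layer so unnormalized predictions are
    # preserved whenever the running statistics are updated.
    import torch
    import torch.nn as nn

    class NormalizedHead(nn.Module):
        def __init__(self, d_in, beta=1e-3):
            super().__init__()
            self.linear = nn.Linear(d_in, 1)
            self.register_buffer("mu", torch.zeros(1))
            self.register_buffer("sigma", torch.ones(1))
            self.beta = beta

        def update_stats(self, targets):
            new_mu = (1 - self.beta) * self.mu + self.beta * targets.mean()
            new_sigma = ((1 - self.beta) * self.sigma**2 + self.beta * targets.var()).sqrt().clamp(min=1e-4)
            with torch.no_grad():
                # Rescale so that sigma * normalized_output + mu is unchanged.
                self.linear.weight *= self.sigma / new_sigma
                self.linear.bias.copy_((self.sigma * self.linear.bias + self.mu - new_mu) / new_sigma)
            self.mu, self.sigma = new_mu, new_sigma

        def forward(self, x):
            return self.linear(x)                          # normalized prediction

        def unnormalized(self, x):
            return self.sigma * self.linear(x) + self.mu

    head = NormalizedHead(d_in=8)
    x, y = torch.randn(32, 8), 1000.0 * torch.randn(32, 1)  # large-scale targets
    head.update_stats(y)
    loss = ((head(x) - (y - head.mu) / head.sigma) ** 2).mean()
    loss.backward()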
71 code implementations • 4 Feb 2016 • Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu
We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers.
Ranked #9 on Atari Games on Atari 2600 Star Gunner
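A heavily simplified sketch of the asynchronous pattern described above (threads updating shared parameters without locks, with a placeholder supervised objective standing in for the actor-critic loss; not the paper's agent):

    # Toy sketch of asynchronous training: several workers compute gradients on
    # their own local copies and apply them to shared parameters.
    import threading
    import torch
    import torch.nn as nn

    shared_model = nn.Linear(4, 2)

    def worker(seed):
        torch.manual_seed(seed)
        local_model = nn.Linear(4, 2)
        opt = torch.optim.SGD(shared_model.parameters(), lr=1e-2)
        for _ in range(100):
            local_model.load_state_dict(shared_model.state_dict())   # sync
            x, y = torch.randn(16, 4), torch.randint(0, 2, (16,))
            loss = nn.functional.cross_entropy(local_model(x), y)    # stand-in objective
            loss.backward()
            # Copy local gradients onto the shared parameters, then step.
            for sp, lp in zip(shared_model.parameters(), local_model.parameters()):
                sp.grad = lp.grad
            opt.step()
            opt.zero_grad()
            local_model.zero_grad()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()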
1 code implementation • 19 Nov 2015 • Andrei A. Rusu, Sergio Gomez Colmenarejo, Caglar Gulcehre, Guillaume Desjardins, James Kirkpatrick, Razvan Pascanu, Volodymyr Mnih, Koray Kavukcuoglu, Raia Hadsell
Policies for complex visual tasks have been successfully learned with deep reinforcement learning, using an approach called deep Q-networks (DQN), but relatively large (task-specific) networks and extensive training are needed to achieve good performance.
3 code implementations • 15 Jul 2015 • Arun Nair, Praveen Srinivasan, Sam Blackwell, Cagdas Alcicek, Rory Fearon, Alessandro De Maria, Vedavyas Panneershelvam, Mustafa Suleyman, Charles Beattie, Stig Petersen, Shane Legg, Volodymyr Mnih, Koray Kavukcuoglu, David Silver
We present the first massively distributed architecture for deep reinforcement learning.
Ranked #17 on Atari Games on Atari 2600 Private Eye
7 code implementations • 25 Feb 2015 • Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis
We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters.
5 code implementations • 24 Dec 2014 • Jimmy Ba, Volodymyr Mnih, Koray Kavukcuoglu
We present an attention-based model for recognizing multiple objects in images.
18 code implementations • NeurIPS 2014 • Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu
Applying convolutional neural networks to large images is computationally expensive because the amount of computation scales linearly with the number of image pixels.
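A tiny sketch of the underlying idea of glimpse-based processing, in which per-step computation is independent of image size (placeholder shapes; the recurrent core and learned location policy are omitted):

    # Extract a small patch around an attended location instead of processing
    # the whole image.
    import torch

    def extract_glimpse(image, center, size=8):
        # image: (C, H, W); center: (row, col) ints; returns a (C, size, size) patch.
        C, H, W = image.shape
        r = max(0, min(center[0] - size // 2, H - size))
        c = max(0, min(center[1] - size // 2, W - size))
        return image[:, r:r + size, c:c + size]

    image = torch.randn(3, 224, 224)
    glimpse = extract_glimpse(image, center=(100, 60))
    print(glimpse.shape)  # torch.Size([3, 8, 8])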
109 code implementations • 19 Dec 2013 • Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller
We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning.
Ranked #1 on Atari Games on Atari 2600 Pong
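A minimal sketch of the deep Q-learning update at the core of this work (toy vector observations and a random minibatch standing in for replayed Atari frames; not the full agent):

    # Minimal deep Q-learning update with a target network; the minibatch below
    # is a placeholder for samples drawn from an experience replay buffer.
    import torch
    import torch.nn as nn

    obs_dim, n_actions, gamma = 4, 3, 0.99
    q_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
    target_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
    target_net.load_state_dict(q_net.state_dict())
    opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

    s = torch.randn(32, obs_dim)
    a = torch.randint(0, n_actions, (32,))
    r = torch.randn(32)
    s_next = torch.randn(32, obs_dim)
    done = torch.zeros(32)

    with torch.no_grad():
        target = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.smooth_l1_loss(q_sa, target)
    opt.zero_grad(); loss.backward(); opt.step()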
no code implementations • NeurIPS 2010 • Marc'Aurelio Ranzato, Volodymyr Mnih, Geoffrey E. Hinton
Probabilistic models of natural images are usually evaluated by measuring performance on rather indirect tasks, such as denoising and inpainting.