Search Results for author: Volodymyr Mnih

Found 24 papers, 12 papers with code

Entropic Desired Dynamics for Intrinsic Control

no code implementations NeurIPS 2021 Steven Hansen, Guillaume Desjardins, Kate Baumli, David Warde-Farley, Nicolas Heess, Simon Osindero, Volodymyr Mnih

An agent might be said, informally, to have mastery of its environment when it has maximised the effective number of states it can reliably reach.

Montezuma's Revenge

Wasserstein Distance Maximizing Intrinsic Control

no code implementations 28 Oct 2021 Ishan Durugkar, Steven Hansen, Stephen Spencer, Volodymyr Mnih

This paper deals with the problem of learning a skill-conditioned policy that acts meaningfully in the absence of a reward signal.

Discovering Diverse Nearly Optimal Policies with Successor Features

no code implementations ICML Workshop URL 2021 Tom Zahavy, Brendan O'Donoghue, Andre Barreto, Volodymyr Mnih, Sebastian Flennerhag, Satinder Singh

We propose Diverse Successive Policies, a method for discovering policies that are diverse in the space of Successor Features, while assuring that they are near optimal.

Relative Variational Intrinsic Control

no code implementations 14 Dec 2020 Kate Baumli, David Warde-Farley, Steven Hansen, Volodymyr Mnih

In the absence of external rewards, agents can still learn useful behaviors by identifying and mastering a set of diverse skills within their environment.

Hierarchical Reinforcement Learning reinforcement-learning

Q-Learning in enormous action spaces via amortized approximate maximization

no code implementations 22 Jan 2020 Tom Van de Wiele, David Warde-Farley, Andriy Mnih, Volodymyr Mnih

Applying Q-learning to high-dimensional or continuous action spaces can be difficult due to the required maximization over the set of possible actions.

Continuous Control Q-Learning

Fast Task Inference with Variational Intrinsic Successor Features

no code implementations ICLR 2020 Steven Hansen, Will Dabney, Andre Barreto, Tom Van de Wiele, David Warde-Farley, Volodymyr Mnih

It has been established that diverse behaviors spanning the controllable subspace of a Markov decision process can be trained by rewarding a policy for being distinguishable from other policies (Gregor et al., 2016; Eysenbach et al., 2018; Warde-Farley et al., 2018).

Unsupervised Control Through Non-Parametric Discriminative Rewards

no code implementations ICLR 2019 David Warde-Farley, Tom Van de Wiele, Tejas Kulkarni, Catalin Ionescu, Steven Hansen, Volodymyr Mnih

Learning to control an environment without hand-crafted rewards or expert data remains challenging and is at the frontier of reinforcement learning research.


The Uncertainty Bellman Equation and Exploration

no code implementations ICML 2018 Brendan O'Donoghue, Ian Osband, Remi Munos, Volodymyr Mnih

In this paper we consider a similar uncertainty Bellman equation (UBE), which connects the uncertainty at any time-step to the expected uncertainties at subsequent time-steps, thereby extending the potential exploratory benefit of a policy beyond individual time-steps.

Reinforcement Learning with Unsupervised Auxiliary Tasks

3 code implementations 16 Nov 2016 Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z. Leibo, David Silver, Koray Kavukcuoglu

We also introduce a novel mechanism for focusing this representation upon extrinsic rewards, so that learning can rapidly adapt to the most relevant aspects of the actual task.


Combining policy gradient and Q-learning

no code implementations 5 Nov 2016 Brendan O'Donoghue, Remi Munos, Koray Kavukcuoglu, Volodymyr Mnih

Policy gradient is an efficient technique for improving a policy in a reinforcement learning setting.

Atari Games Q-Learning

Sample Efficient Actor-Critic with Experience Replay

8 code implementations 3 Nov 2016 Ziyu Wang, Victor Bapst, Nicolas Heess, Volodymyr Mnih, Remi Munos, Koray Kavukcuoglu, Nando de Freitas

This paper presents an actor-critic deep reinforcement learning agent with experience replay that is stable, sample efficient, and performs remarkably well on challenging environments, including the discrete 57-game Atari domain and several continuous control problems.

Continuous Control reinforcement-learning

Using Fast Weights to Attend to the Recent Past

4 code implementations NeurIPS 2016 Jimmy Ba, Geoffrey Hinton, Volodymyr Mnih, Joel Z. Leibo, Catalin Ionescu

Until recently, research on artificial neural networks was largely restricted to systems with only two types of variable: Neural activities that represent the current or recent input and weights that learn to capture regularities among inputs, outputs and payoffs.

Strategic Attentive Writer for Learning Macro-Actions

no code implementations NeurIPS 2016 Alexander Vezhnevets, Volodymyr Mnih, John Agapiou, Simon Osindero, Alex Graves, Oriol Vinyals, Koray Kavukcuoglu

We present a novel deep recurrent neural network architecture that learns to build implicit plans in an end-to-end manner purely by interacting with an environment in a reinforcement learning setting.

Atari Games

Asynchronous Methods for Deep Reinforcement Learning

66 code implementations 4 Feb 2016 Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu

We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers.

Atari Games reinforcement-learning

Policy Distillation

1 code implementation 19 Nov 2015 Andrei A. Rusu, Sergio Gomez Colmenarejo, Caglar Gulcehre, Guillaume Desjardins, James Kirkpatrick, Razvan Pascanu, Volodymyr Mnih, Koray Kavukcuoglu, Raia Hadsell

Policies for complex visual tasks have been successfully learned with deep reinforcement learning, using an approach called deep Q-networks (DQN), but relatively large (task-specific) networks and extensive training are needed to achieve good performance.


Human level control through deep reinforcement learning

2 code implementations 25 Feb 2015 Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis

We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters.

Atari Games reinforcement-learning

Recurrent Models of Visual Attention

17 code implementations NeurIPS 2014 Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu

Applying convolutional neural networks to large images is computationally expensive because the amount of computation scales linearly with the number of image pixels.

Hard Attention Image Classification

Playing Atari with Deep Reinforcement Learning

94 code implementations 19 Dec 2013 Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller

We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning.

Atari Games Q-Learning

Generating more realistic images using gated MRF's

no code implementations NeurIPS 2010 Marc'Aurelio Ranzato, Volodymyr Mnih, Geoffrey E. Hinton

Probabilistic models of natural images are usually evaluated by measuring performance on rather indirect tasks, such as denoising and inpainting.

