Search Results for author: Volodymyr Mnih

Found 27 papers, 14 papers with code

In-context Reinforcement Learning with Algorithm Distillation

1 code implementation 25 Oct 2022 Michael Laskin, Luyu Wang, Junhyuk Oh, Emilio Parisotto, Stephen Spencer, Richie Steigerwald, DJ Strouse, Steven Hansen, Angelos Filos, Ethan Brooks, Maxime Gazeau, Himanshu Sahni, Satinder Singh, Volodymyr Mnih

We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model.

reinforcement-learning
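
The core idea lends itself to a compact illustration. Below is a minimal sketch, not the authors' implementation, of training a causal transformer on multi-episode training histories so that next-action prediction distills the source RL algorithm's improvement behavior; the toy sizes, fake data, and token layout are all assumptions.

```python
# Minimal sketch of the Algorithm Distillation training objective.
# Assumptions: toy discrete env, hypothetical logged training histories.
import torch
import torch.nn as nn

OBS_DIM, N_ACTIONS, CTX = 8, 4, 64  # illustrative sizes, not from the paper

class ADModel(nn.Module):
    """Causal sequence model over (obs, action, reward) training histories."""
    def __init__(self, d_model=64):
        super().__init__()
        # One token per timestep: concat(obs, one-hot action, reward) -> d_model
        self.embed = nn.Linear(OBS_DIM + N_ACTIONS + 1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, N_ACTIONS)

    def forward(self, tokens):
        # Causal mask: each position may only attend to the history before it.
        T = tokens.size(1)
        mask = nn.Transformer.generate_square_subsequent_mask(T)
        h = self.encoder(self.embed(tokens), mask=mask)
        return self.head(h)  # logits for the next action at each step

# Fake "training history" batch: in AD this would come from logging a source
# RL algorithm as it learns, so later timesteps reflect a better policy.
obs = torch.randn(2, CTX, OBS_DIM)
acts = torch.randint(0, N_ACTIONS, (2, CTX))
rews = torch.randn(2, CTX, 1)
tokens = torch.cat([obs, nn.functional.one_hot(acts, N_ACTIONS).float(), rews], -1)

model = ADModel()
logits = model(tokens)
# Predict action a_t from the history up to t-1 (shift targets by one step).
loss = nn.functional.cross_entropy(logits[:, :-1].reshape(-1, N_ACTIONS),
                                   acts[:, 1:].reshape(-1))
loss.backward()
```

At deployment, the same next-action prediction, conditioned on the context gathered in a new task, yields in-context policy improvement without any weight updates.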

Palm up: Playing in the Latent Manifold for Unsupervised Pretraining

no code implementations 19 Oct 2022 Hao Liu, Tom Zahavy, Volodymyr Mnih, Satinder Singh

In this work, we aim to bring the best of both worlds and propose an algorithm that exhibits exploratory behavior while utilizing large, diverse datasets.

reinforcement-learning · Reinforcement Learning (RL) +2

Entropic Desired Dynamics for Intrinsic Control

no code implementations NeurIPS 2021 Steven Hansen, Guillaume Desjardins, Kate Baumli, David Warde-Farley, Nicolas Heess, Simon Osindero, Volodymyr Mnih

An agent might be said, informally, to have mastery of its environment when it has maximised the effective number of states it can reliably reach.

Montezuma's Revenge

Wasserstein Distance Maximizing Intrinsic Control

no code implementations 28 Oct 2021 Ishan Durugkar, Steven Hansen, Stephen Spencer, Volodymyr Mnih

This paper deals with the problem of learning a skill-conditioned policy that acts meaningfully in the absence of a reward signal.

Discovering Diverse Nearly Optimal Policies with Successor Features

no code implementations ICML Workshop URL 2021 Tom Zahavy, Brendan O'Donoghue, Andre Barreto, Volodymyr Mnih, Sebastian Flennerhag, Satinder Singh

We propose Diverse Successive Policies, a method for discovering policies that are diverse in the space of Successor Features while ensuring that they are near optimal.

Relative Variational Intrinsic Control

no code implementations 14 Dec 2020 Kate Baumli, David Warde-Farley, Steven Hansen, Volodymyr Mnih

In the absence of external rewards, agents can still learn useful behaviors by identifying and mastering a set of diverse skills within their environment.

Hierarchical Reinforcement Learning
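
As a rough illustration of the family this paper belongs to, here is a minimal sketch of a variational skill-discovery loop: a discriminator q(z | ·) scores how identifiable each skill is, and its log-probability serves as intrinsic reward. Conditioning the discriminator on the (start, final) state pair, as below, gestures at the "relative" variant; the networks, sizes, and uniform skill prior are all assumptions.

```python
# Minimal sketch of the variational skill-discovery recipe behind (R)VIC:
# a discriminator q(z | transition) provides intrinsic reward for a
# skill-conditioned policy. All sizes and names here are illustrative.
import torch
import torch.nn as nn

N_SKILLS, STATE_DIM = 8, 16

# RVIC-style conditioning: the discriminator sees where the skill *moved*
# the agent, i.e. (start state, final state), not the final state alone.
discriminator = nn.Sequential(
    nn.Linear(2 * STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_SKILLS))

def intrinsic_reward(s_start, s_final, z):
    """r = log q(z | s_start, s_final) - log p(z), with p(z) uniform."""
    logits = discriminator(torch.cat([s_start, s_final], dim=-1))
    log_q = torch.log_softmax(logits, dim=-1).gather(-1, z.unsqueeze(-1))
    log_p = -torch.log(torch.tensor(float(N_SKILLS)))
    return (log_q.squeeze(-1) - log_p).detach()  # reward for the RL learner

# The discriminator itself is trained to identify which skill was active:
s0, sf = torch.randn(32, STATE_DIM), torch.randn(32, STATE_DIM)
z = torch.randint(0, N_SKILLS, (32,))
loss = nn.functional.cross_entropy(discriminator(torch.cat([s0, sf], -1)), z)
loss.backward()
print(intrinsic_reward(s0, sf, z).shape)  # per-sample skill rewards
```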

Q-Learning in enormous action spaces via amortized approximate maximization

no code implementations 22 Jan 2020 Tom Van de Wiele, David Warde-Farley, Andriy Mnih, Volodymyr Mnih

Applying Q-learning to high-dimensional or continuous action spaces can be difficult due to the required maximization over the set of possible actions.

Continuous Control · Q-Learning
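
A minimal sketch of amortized approximate maximization as described: the exact max over actions is replaced by a max over a small candidate set drawn from a learned proposal plus uniform samples. The toy action space, noise scale, and network shapes are assumptions, not the paper's settings.

```python
# Sketch of amortized approximate maximization for Q-learning in large
# action spaces: search over sampled candidates instead of an exact max.
import torch
import torch.nn as nn

STATE_DIM, ACT_DIM, N_CAND = 16, 6, 32

q_net = nn.Sequential(nn.Linear(STATE_DIM + ACT_DIM, 64), nn.ReLU(),
                      nn.Linear(64, 1))
proposal = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, ACT_DIM))

def approx_max_q(state):
    """Approximate max_a Q(state, a) over a small sampled candidate set."""
    # Half the candidates come from the learned proposal (plus Gaussian
    # noise), half from a uniform distribution, so the search never collapses.
    mu = proposal(state)
    local = mu + 0.1 * torch.randn(N_CAND // 2, ACT_DIM)
    uniform = 2 * torch.rand(N_CAND // 2, ACT_DIM) - 1
    cands = torch.cat([local, uniform]).clamp(-1, 1)
    q = q_net(torch.cat([state.expand(N_CAND, -1), cands], dim=-1)).squeeze(-1)
    best = q.argmax()
    return cands[best], q[best]

state = torch.randn(STATE_DIM)
action, q_value = approx_max_q(state)  # used for both acting and TD targets
```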

Fast Task Inference with Variational Intrinsic Successor Features

no code implementations ICLR 2020 Steven Hansen, Will Dabney, Andre Barreto, Tom Van de Wiele, David Warde-Farley, Volodymyr Mnih

It has been established that diverse behaviors spanning the controllable subspace of a Markov decision process can be trained by rewarding a policy for being distinguishable from other policies (Gregor et al., 2016; Eysenbach et al., 2018; Warde-Farley et al., 2018).
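
The "fast task inference" half of the title has a compact core: if rewards are approximately linear in pretrained state features, a few labelled samples suffice to regress the task vector, which then selects the matching skill. A toy sketch, with stubbed random features standing in for the pretrained networks:

```python
# Sketch of the fast task inference step in successor-feature methods like
# VISR: fit a task vector w from a handful of reward samples, then act with
# the matching skill. Hypothetical shapes; feature nets stubbed as random.
import torch

D, N = 5, 100  # feature dimension, number of labelled reward samples

phi = torch.randn(N, D)          # state features from pretraining (stub)
rewards = phi @ torch.tensor([1., 0., -1., 0., 0.]) + 0.01 * torch.randn(N)

# If r(s) is roughly phi(s) . w, least squares on a few samples recovers w;
# in VISR this w directly indexes the skill to execute.
w = torch.linalg.lstsq(phi, rewards.unsqueeze(-1)).solution.squeeze(-1)
print(w)  # close to [1, 0, -1, 0, 0]: the inferred task vector / skill
```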

Unsupervised Control Through Non-Parametric Discriminative Rewards

no code implementations ICLR 2019 David Warde-Farley, Tom Van de Wiele, Tejas Kulkarni, Catalin Ionescu, Steven Hansen, Volodymyr Mnih

Learning to control an environment without hand-crafted rewards or expert data remains challenging and is at the frontier of reinforcement learning research.

Reinforcement Learning (RL)

The Uncertainty Bellman Equation and Exploration

1 code implementation ICML 2018 Brendan O'Donoghue, Ian Osband, Remi Munos, Volodymyr Mnih

In this paper we consider a similar uncertainty Bellman equation (UBE), which connects the uncertainty at any time-step to the expected uncertainties at subsequent time-steps, thereby extending the potential exploratory benefit of a policy beyond individual time-steps.
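
A tabular sketch of the backup this describes: local uncertainty propagates through the policy's transition dynamics exactly like a value function, but with a gamma-squared discount. The toy MDP and fixed-point iteration below are illustrative, not the paper's algorithm in full.

```python
# Tabular sketch of the uncertainty Bellman equation (UBE) backup.
# Toy MDP with random quantities, for shape and logic only.
import numpy as np

S, A, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] -> dist over s'
pi = rng.dirichlet(np.ones(A), size=S)       # policy pi[s] -> dist over a
nu = rng.random((S, A))                      # local (per-step) uncertainty

u = np.zeros((S, A))
for _ in range(1000):  # fixed-point iteration of the UBE backup
    # u(s,a) = nu(s,a) + gamma^2 * E_{s'~P, a'~pi}[ u(s', a') ]
    u_next = (pi * u).sum(axis=1)            # E_{a'~pi}[u(s', a')]
    u = nu + gamma**2 * P @ u_next
# sqrt(u) bounds the spread of Q-value uncertainty and can scale an
# exploration bonus, e.g. Q(s,a) + beta * sqrt(u(s,a)).
print(np.sqrt(u))
```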

Reinforcement Learning with Unsupervised Auxiliary Tasks

3 code implementations 16 Nov 2016 Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z. Leibo, David Silver, Koray Kavukcuoglu

We also introduce a novel mechanism for focusing this representation upon extrinsic rewards, so that learning can rapidly adapt to the most relevant aspects of the actual task.

reinforcement-learning · Reinforcement Learning (RL)
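
The "mechanism for focusing this representation upon extrinsic rewards" is a reward-prediction auxiliary task trained on replayed sequences skewed toward rewarding events. A hedged sketch, with a linear stand-in for the shared torso and invented shapes:

```python
# Sketch of an UNREAL-style reward-prediction auxiliary task: resample short
# sequences from replay so rewarding events are over-represented, and
# classify the upcoming reward's sign from the shared representation.
import random
import torch
import torch.nn as nn

FRAME_DIM = 32
torso = nn.Linear(3 * FRAME_DIM, 64)    # stands in for the shared CNN/LSTM
rp_head = nn.Linear(64, 3)              # classes: zero / positive / negative

def sample_rp_batch(replay, batch_size=8):
    """Skewed sampling: half the sequences end in a nonzero reward."""
    rewarding = [t for t in replay if t["reward"] != 0]
    zero = [t for t in replay if t["reward"] == 0]
    picks = (random.choices(rewarding, k=batch_size // 2) +
             random.choices(zero, k=batch_size // 2))
    x = torch.stack([t["frames"] for t in picks])        # (B, 3*FRAME_DIM)
    y = torch.tensor([0 if t["reward"] == 0 else (1 if t["reward"] > 0 else 2)
                      for t in picks])
    return x, y

replay = [{"frames": torch.randn(3 * FRAME_DIM),
           "reward": random.choice([0, 0, 0, 1, -1])} for _ in range(100)]
x, y = sample_rp_batch(replay)
loss = nn.functional.cross_entropy(rp_head(torso(x)), y)  # added to A3C loss
loss.backward()
```

Because all heads share the same torso, gradients from this auxiliary classifier shape the representation used by the main policy.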

Combining policy gradient and Q-learning

no code implementations 5 Nov 2016 Brendan O'Donoghue, Remi Munos, Koray Kavukcuoglu, Volodymyr Mnih

Policy gradient is an efficient technique for improving a policy in a reinforcement learning setting.

Atari Games · Q-Learning

Sample Efficient Actor-Critic with Experience Replay

8 code implementations 3 Nov 2016 Ziyu Wang, Victor Bapst, Nicolas Heess, Volodymyr Mnih, Remi Munos, Koray Kavukcuoglu, Nando de Freitas

This paper presents an actor-critic deep reinforcement learning agent with experience replay that is stable, sample efficient, and performs remarkably well on challenging environments, including the discrete 57-game Atari domain and several continuous control problems.

Continuous Control · reinforcement-learning +1

Using Fast Weights to Attend to the Recent Past

3 code implementations NeurIPS 2016 Jimmy Ba, Geoffrey Hinton, Volodymyr Mnih, Joel Z. Leibo, Catalin Ionescu

Until recently, research on artificial neural networks was largely restricted to systems with only two types of variable: neural activities that represent the current or recent input, and weights that learn to capture regularities among inputs, outputs and payoffs.
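
The third type of variable the abstract alludes to is a fast weight matrix, Hebbian-updated from recent hidden states and applied in a short inner loop. A minimal sketch with illustrative sizes and rates:

```python
# Sketch of the fast-weights mechanism: alongside slow weights, a fast
# matrix A is Hebbian-updated from recent hidden states and applied for a
# few inner-loop steps, letting the network "attend to the recent past".
import torch
import torch.nn as nn

D, LAM, ETA, S = 32, 0.95, 0.5, 3   # hidden size, decay, fast lr, inner steps

W = nn.Linear(D, D)                  # slow recurrent weights
C = nn.Linear(D, D)                  # slow input weights
ln = nn.LayerNorm(D)

def step(x, h, A):
    # Hebbian fast-weight update from the previous hidden state.
    A = LAM * A + ETA * torch.outer(h, h)
    pre = W(h) + C(x)
    hs = torch.relu(ln(pre))
    for _ in range(S):               # inner loop applies the fast weights
        hs = torch.relu(ln(pre + A @ hs))
    return hs, A

h, A = torch.zeros(D), torch.zeros(D, D)
for t in range(10):                  # unroll over a toy input sequence
    h, A = step(torch.randn(D), h, A)
```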

Strategic Attentive Writer for Learning Macro-Actions

no code implementations NeurIPS 2016 Alexander Vezhnevets, Volodymyr Mnih, John Agapiou, Simon Osindero, Alex Graves, Oriol Vinyals, Koray Kavukcuoglu

We present a novel deep recurrent neural network architecture that learns to build implicit plans in an end-to-end manner, purely by interacting with an environment in a reinforcement learning setting.

Atari Games

Asynchronous Methods for Deep Reinforcement Learning

70 code implementations 4 Feb 2016 Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu

We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers.

Atari Games · reinforcement-learning +1
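
The asynchronous pattern can be sketched in a few lines: several actor-learner processes share one parameter set and apply their gradients to it without locks. The environment and loss below are stubs; only the update pattern follows the paper.

```python
# Minimal sketch of the asynchronous (Hogwild!-style) update behind A3C.
import torch
import torch.nn as nn
import torch.multiprocessing as mp

def worker(shared_model):
    local = nn.Linear(4, 2)
    for _ in range(100):
        local.load_state_dict(shared_model.state_dict())  # sync from shared
        # Stub rollout/loss; a real worker would run its own environment
        # copy and compute the actor-critic loss from an n-step return.
        loss = local(torch.randn(8, 4)).pow(2).mean()
        loss.backward()
        for sp, lp in zip(shared_model.parameters(), local.parameters()):
            sp.data.add_(lp.grad, alpha=-1e-3)            # lock-free SGD step
        local.zero_grad()

if __name__ == "__main__":
    shared = nn.Linear(4, 2)
    shared.share_memory()            # parameters visible to all workers
    procs = [mp.Process(target=worker, args=(shared,)) for _ in range(4)]
    [p.start() for p in procs]
    [p.join() for p in procs]
```

Stale, overlapping updates turn out to be tolerable in practice, and the decorrelated experience across workers replaces the replay buffer used by DQN.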

Policy Distillation

1 code implementation 19 Nov 2015 Andrei A. Rusu, Sergio Gomez Colmenarejo, Caglar Gulcehre, Guillaume Desjardins, James Kirkpatrick, Razvan Pascanu, Volodymyr Mnih, Koray Kavukcuoglu, Raia Hadsell

Policies for complex visual tasks have been successfully learned with deep reinforcement learning, using an approach called deep Q-networks (DQN), but relatively large (task-specific) networks and extensive training are needed to achieve good performance.

reinforcement-learning · Reinforcement Learning (RL)
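
The distillation step itself is small: soften the teacher DQN's Q-values into an action distribution and train the student to match it under a KL loss. A sketch with stand-in linear networks and an illustrative temperature:

```python
# Sketch of the policy-distillation loss: match the student's action
# distribution to a softened softmax over the teacher DQN's Q-values.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, N_ACTIONS, TAU = 16, 4, 0.01

teacher = nn.Linear(STATE_DIM, N_ACTIONS)   # frozen DQN producing Q-values
student = nn.Linear(STATE_DIM, N_ACTIONS)   # much smaller net in the paper

states = torch.randn(32, STATE_DIM)
with torch.no_grad():
    # A low temperature sharpens the targets toward the teacher's greedy action.
    teacher_probs = F.softmax(teacher(states) / TAU, dim=-1)
student_log_probs = F.log_softmax(student(states), dim=-1)

# KL(teacher || student), averaged over a batch of replayed states.
loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
loss.backward()
```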

Human-level control through deep reinforcement learning

7 code implementations 25 Feb 2015 Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis

We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters.

Atari Games · reinforcement-learning +1
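
For reference, the core update underneath that result: Q-learning on replayed minibatches, bootstrapping from a periodically synced target network. Toy shapes below; the actual agent uses a convnet over stacked Atari frames.

```python
# Sketch of the core DQN update: replay buffer minibatches plus a frozen
# target network for stable bootstrapping.
import copy
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA = 16, 4, 0.99
q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                      nn.Linear(64, N_ACTIONS))
target_net = copy.deepcopy(q_net)   # frozen copy, synced every C steps

def dqn_loss(s, a, r, s_next, done):
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrap from the target network, zeroing values at terminals.
        target = r + GAMMA * (1 - done) * target_net(s_next).max(dim=1).values
    return nn.functional.smooth_l1_loss(q_sa, target)

# One gradient step on a fake replay minibatch:
B = 32
loss = dqn_loss(torch.randn(B, STATE_DIM), torch.randint(0, N_ACTIONS, (B,)),
                torch.randn(B), torch.randn(B, STATE_DIM),
                torch.randint(0, 2, (B,)).float())
loss.backward()
target_net.load_state_dict(q_net.state_dict())  # periodic target sync
```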

Multiple Object Recognition with Visual Attention

5 code implementations 24 Dec 2014 Jimmy Ba, Volodymyr Mnih, Koray Kavukcuoglu

We present an attention-based model for recognizing multiple objects in images.

Object · Object Recognition +2

Recurrent Models of Visual Attention

19 code implementations NeurIPS 2014 Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu

Applying convolutional neural networks to large images is computationally expensive because the amount of computation scales linearly with the number of image pixels.

Hard Attention · Image Classification +1
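
The remedy the paper proposes is to process a short sequence of small glimpses, so compute is decoupled from image size; since glimpse locations are sampled, the location policy is trained with REINFORCE. A simplified sketch (naive cropping, no baseline, invented sizes):

```python
# Sketch of the recurrent attention loop: the model sees only a GxG glimpse
# per step, so cost is independent of image size; location choices are
# non-differentiable and trained with REINFORCE.
import torch
import torch.nn as nn

G, H, N_STEPS = 8, 64, 6            # glimpse size, hidden size, glimpses

glimpse_net = nn.Linear(G * G + 2, H)   # encodes (patch, location)
rnn = nn.GRUCell(H, H)
loc_net = nn.Linear(H, 2)               # mean of next-location Gaussian
classifier = nn.Linear(H, 10)

def glimpse(image, loc):
    """Crop a GxG patch around loc in [-1, 1]^2 (nearest pixel, no padding)."""
    n = image.size(-1)
    cy, cx = [int((l.item() + 1) / 2 * (n - G)) for l in loc]
    return image[cy:cy + G, cx:cx + G].reshape(-1)

image = torch.rand(100, 100)            # cost does not grow with this size
h = torch.zeros(1, H)
loc = torch.zeros(2)
log_probs = []
for _ in range(N_STEPS):
    g = glimpse_net(torch.cat([glimpse(image, loc.clamp(-1, 1)), loc]))
    h = rnn(g.unsqueeze(0), h)
    dist = torch.distributions.Normal(loc_net(h).squeeze(0), 0.1)
    loc = dist.sample()                 # stochastic, non-differentiable
    log_probs.append(dist.log_prob(loc).sum())

logits = classifier(h)                  # classify after the final glimpse
target = torch.tensor([3])              # fake label for the toy image
# Hybrid loss: supervised classification plus REINFORCE on the location
# policy, with reward 1 for a correct prediction (baseline omitted).
reward = (logits.argmax(-1) == target).float().squeeze()
loss = (nn.functional.cross_entropy(logits, target)
        - reward * torch.stack(log_probs).sum())
loss.backward()
```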

Playing Atari with Deep Reinforcement Learning

111 code implementations 19 Dec 2013 Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller

We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning.

Atari Games · Q-Learning +1

Generating more realistic images using gated MRF's

no code implementations NeurIPS 2010 Marc'Aurelio Ranzato, Volodymyr Mnih, Geoffrey E. Hinton

Probabilistic models of natural images are usually evaluated by measuring performance on rather indirect tasks, such as denoising and inpainting.

Denoising
