1 code implementation • H. Francis Song, Abbas Abdolmaleki, Jost Tobias Springenberg, Aidan Clark, Hubert Soyer, Jack W. Rae, Seb Noury, Arun Ahuja, Si-Qi Liu, Dhruva Tirumala, Nicolas Heess, Dan Belov, Martin Riedmiller, Matthew M. Botvinick
Some of the most successful applications of deep reinforcement learning to challenging domains in discrete and continuous control have used policy gradient methods in the on-policy setting.
no code implementations • Tom Le Paine, Caglar Gulcehre, Bobak Shahriari, Misha Denil, Matt Hoffman, Hubert Soyer, Richard Tanburn, Steven Kapturowski, Neil Rabinowitz, Duncan Williams, Gabriel Barth-Maron, Ziyu Wang, Nando de Freitas, Worlds Team
This paper introduces R2D3, an agent that makes efficient use of demonstrations to solve hard exploration problems in partially observable environments with highly variable initial conditions.
Reinforcement learning agents are typically trained and evaluated according to their performance averaged over some distribution of environment settings.
This means the learning algorithm is general, but each solution is not; each agent can only solve the one task it was trained on.
Ranked #1 on Visual Navigation on Dmlab-30
Reinforcement learning (RL) agents performing complex tasks must be able to remember observations and actions across sizable time intervals.
In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters.
Ranked #3 on Atari Games on Atari 2600 Skiing (using extra training data)
1 code implementation • 20 Jun 2017 • Karl Moritz Hermann, Felix Hill, Simon Green, Fumin Wang, Ryan Faulkner, Hubert Soyer, David Szepesvari, Wojciech Marian Czarnecki, Max Jaderberg, Denis Teplyashin, Marcus Wainwright, Chris Apps, Demis Hassabis, Phil Blunsom
Trained via a combination of reinforcement and unsupervised learning, and beginning with minimal prior knowledge, the agent learns to relate linguistic symbols to emergent perceptual representations of its physical surroundings and to pertinent sequences of actions.
We unpack these points in a series of seven proof-of-concept experiments, each of which examines a key aspect of deep meta-RL.
1 code implementation • 11 Nov 2016 • Piotr Mirowski, Razvan Pascanu, Fabio Viola, Hubert Soyer, Andrew J. Ballard, Andrea Banino, Misha Denil, Ross Goroshin, Laurent Sifre, Koray Kavukcuoglu, Dharshan Kumaran, Raia Hadsell
Learning to navigate in complex environments with dynamic elements is an important milestone in developing AI agents.
Learning to solve complex sequences of tasks--while both leveraging transfer and avoiding catastrophic forgetting--remains a key obstacle to achieving human-level intelligence.
In this work, we present a novel neural-network-based architecture for inducing compositional crosslingual word representations.