We identify and study the phenomenon of policy churn, that is, the rapid change of the greedy policy in value-based reinforcement learning.
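As a rough illustration of how churn can be quantified, here is a minimal sketch (in PyTorch; the function name and the idea of probing a held-out batch of states are ours for illustration, not the paper's exact protocol):

```python
import torch

def churn_fraction(q_net_before, q_net_after, states):
    """Fraction of probe states whose greedy action changed after a
    single learning update -- one simple way to quantify policy churn."""
    with torch.no_grad():
        a_before = q_net_before(states).argmax(dim=-1)
        a_after = q_net_after(states).argmax(dim=-1)
    return (a_before != a_after).float().mean().item()
```

Tracked after every update, this yields a churn-per-update curve over the course of training.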
Unlike prior work, which estimates uncertainty by training an ensemble of many models and/or value functions, this approach requires only the single model and value function that are already being learned in most model-based reinforcement learning algorithms.
Determining what experience to generate to best facilitate learning (i.e., exploration) is one of the distinguishing features and open challenges in reinforcement learning.
We theoretically analyse this space, and concretely investigate several algorithms that arise from this framework.
While using ES for differentiable parameters is computationally impractical (although possible), we show that a hybrid approach is practically feasible in the case where the model has both differentiable and non-differentiable parameters.
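A minimal sketch of what such a hybrid loop might look like (all names are ours, theta and z are assumed to be 1-D NumPy arrays, and the finite-difference gradient is a placeholder: a real system would backpropagate through the differentiable part):

```python
import numpy as np

def hybrid_step(loss_fn, theta, z, lr=0.01, pop=8, eps=1e-4, rng=None):
    """One hybrid update: gradient step on differentiable parameters theta,
    simple (1+lambda)-style ES step on non-differentiable integer parameters z."""
    rng = rng or np.random.default_rng()
    # Gradient step on theta (finite differences stand in for backprop).
    grad = np.array([(loss_fn(theta + eps * e, z) - loss_fn(theta - eps * e, z)) / (2 * eps)
                     for e in np.eye(theta.size)])
    theta = theta - lr * grad
    # ES step on z: evaluate a population of integer perturbations, keep the best.
    candidates = [z + rng.integers(-1, 2, size=z.shape) for _ in range(pop)] + [z]
    z = min(candidates, key=lambda c: loss_fn(theta, c))
    return theta, z
```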
Rather than proposing a new method, this paper investigates an issue present in existing learning algorithms.
In this paper we extend the successor features (SFs) and generalised policy improvement (GPI) framework in two ways.
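For readers unfamiliar with the framework, the standard definitions it builds on (from the successor-features literature, not specific to the extensions here) are, in brief:

```latex
% Rewards decompose over features \phi with a task vector w:
r(s, a, s') = \phi(s, a, s')^{\top} w
% Successor features: expected discounted feature sums under policy \pi:
\psi^{\pi}(s, a) = \mathbb{E}^{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, \phi(s_t, a_t, s_{t+1}) \,\middle|\, s_0 = s,\ a_0 = a \right]
% Hence action values for any task w are recovered as
Q^{\pi}_{w}(s, a) = \psi^{\pi}(s, a)^{\top} w
% and GPI acts greedily over a library of policies \{\pi_1, \dots, \pi_n\}:
\pi(s) \in \operatorname*{arg\,max}_{a}\; \max_{i}\; \psi^{\pi_i}(s, a)^{\top} w
```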
We want to make progress toward artificial general intelligence, namely general-purpose agents that autonomously learn how to competently act in complex environments.
The scope of the Baldwin effect was recently called into question by two papers that closely examined the seminal work of Hinton and Nowlan.
Some real-world domains are best characterized as a single task, but for others this perspective is limiting.
Neural networks have a smooth initial inductive bias, such that small changes in input do not lead to large changes in output.
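This claim is easy to check empirically; a toy sketch with a randomly initialised two-layer tanh MLP (all names and sizes ours):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.5, (64, 16)), np.zeros(64)  # random init, never trained
W2, b2 = rng.normal(0, 0.5, (1, 64)), np.zeros(1)

def f(x):  # two-layer tanh MLP
    return W2 @ np.tanh(W1 @ x + b1) + b2

x = rng.normal(size=16)
for eps in (1e-3, 1e-2, 1e-1):
    dx = eps * rng.normal(size=16)
    # Output change shrinks smoothly with the size of the input change.
    print(eps, abs(f(x + dx) - f(x)).item())
```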
The deep reinforcement learning community has made several independent improvements to the DQN algorithm.
10 code implementations • 16 Aug 2017 • Oriol Vinyals, Timo Ewalds, Sergey Bartunov, Petko Georgiev, Alexander Sasha Vezhnevets, Michelle Yeo, Alireza Makhzani, Heinrich Küttler, John Agapiou, Julian Schrittwieser, John Quan, Stephen Gaffney, Stig Petersen, Karen Simonyan, Tom Schaul, Hado van Hasselt, David Silver, Timothy Lillicrap, Kevin Calderone, Paul Keet, Anthony Brunasso, David Lawrence, Anders Ekermo, Jacob Repp, Rodney Tsing
Finally, we present initial baseline results for canonical deep reinforcement learning agents applied to the StarCraft II domain.
Ranked #1 on Starcraft II on MoveToBeacon
5 code implementations • 12 Apr 2017 • Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys
We present Deep Q-learning from Demonstrations (DQfD), an algorithm that leverages small sets of demonstration data to massively accelerate the learning process, and that automatically assesses the necessary ratio of demonstration data while learning, thanks to a prioritized replay mechanism.
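A sketch of the mixing idea (not DQfD's exact implementation, which adds a sum-tree, n-step returns, and auxiliary losses): demonstration transitions are kept permanently and receive a constant priority bonus, which is what lets the effective demo/agent ratio adapt during learning.

```python
import random

class MixedReplayBuffer:
    """Sketch: permanent demonstration transitions plus FIFO agent
    transitions, sampled with priority weights so the demo/agent
    ratio adapts over training."""

    def __init__(self, demo_transitions, capacity=100_000, demo_bonus=0.5):
        self.demos = [[t, 1.0, True] for t in demo_transitions]  # never evicted
        self.agent = []                                          # FIFO
        self.capacity = capacity
        self.demo_bonus = demo_bonus  # constant priority bonus for demos

    def add(self, transition, priority=1.0):
        self.agent.append([transition, priority, False])
        if len(self.agent) > self.capacity:
            self.agent.pop(0)

    def sample(self, batch_size):
        pool = self.demos + self.agent
        weights = [p + (self.demo_bonus if is_demo else 0.0)
                   for _, p, is_demo in pool]
        return random.choices(pool, weights=weights, k=batch_size)
```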
We introduce FeUdal Networks (FuNs): a novel architecture for hierarchical reinforcement learning.
One of the key challenges of artificial intelligence is to learn models that are effective in the context of planning.
We also introduce a novel mechanism for focusing this representation upon extrinsic rewards, so that learning can rapidly adapt to the most relevant aspects of the actual task.
Transfer in reinforcement learning refers to the notion that generalization should occur not only within a task but also across tasks.
The move from hand-designed features to learned features in machine learning has been wildly successful.
We consider an agent's uncertainty about its environment and the problem of generalizing this uncertainty across observations.
Ranked #7 on Atari Games on Atari 2600 Montezuma's Revenge
In recent years there have been many successes in using deep representations in reinforcement learning.
Ranked #2 on Atari Games on Atari 2600 Pong
Experience replay lets online reinforcement learning agents remember and reuse experiences from the past.
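The paper's central idea is to replay important transitions more often. A minimal proportional-prioritization sketch follows; a real implementation would add a sum-tree for O(log n) sampling and importance-sampling weights to correct the induced bias.

```python
import numpy as np

class PrioritizedReplay:
    """Minimal proportional prioritization: transition i is sampled with
    probability p_i^alpha / sum_k p_k^alpha, where p_i = |TD error| + eps."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.prios = [], []

    def add(self, transition, td_error):
        if len(self.data) >= self.capacity:   # FIFO eviction
            self.data.pop(0); self.prios.pop(0)
        self.data.append(transition)
        self.prios.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size):
        probs = np.array(self.prios) / sum(self.prios)
        idx = np.random.choice(len(self.data), size=batch_size, p=probs)
        return idx, [self.data[i] for i in idx]

    def update(self, idx, td_errors):
        # Learner feeds back fresh TD errors after each training step.
        for i, err in zip(idx, td_errors):
            self.prios[i] = (abs(err) + self.eps) ** self.alpha
```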
Ranked #3 on Atari Games on Atari 2600 Kangaroo
This paper presents Natural Evolution Strategies (NES), a recent family of algorithms that constitute a more principled approach to black-box optimization than established evolutionary algorithms.
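In essence, NES ascends a search gradient on the parameters of a sampling distribution. A bare-bones sketch for an isotropic Gaussian with fixed sigma (full NES also adapts the covariance and follows the natural, rather than plain, gradient):

```python
import numpy as np

def nes_minimize(f, x0, sigma=0.1, lr=0.02, pop=50, iters=200, seed=0):
    """Search-gradient sketch in the spirit of NES: sample a Gaussian
    population around the mean, estimate the gradient of expected fitness
    w.r.t. the mean via the log-likelihood trick, and ascend it."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(x0, dtype=float)
    for _ in range(iters):
        noise = rng.standard_normal((pop, mu.size))              # perturbations
        fitness = np.array([-f(mu + sigma * n) for n in noise])  # maximise -f
        fitness = (fitness - fitness.mean()) / (fitness.std() + 1e-8)  # shaping
        mu += lr / (pop * sigma) * noise.T @ fitness             # search-gradient step
    return mu

# Example: minimise a quadratic with optimum at x = 3.
print(nes_minimize(lambda x: float(np.sum((x - 3.0) ** 2)), np.zeros(5)))
```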