Search Results for author: Tom Schaul

Found 30 papers, 12 papers with code

The Phenomenon of Policy Churn

no code implementations · 1 Jun 2022 · Tom Schaul, André Barreto, John Quan, Georg Ostrovski

We identify and study the phenomenon of policy churn, that is, the rapid change of the greedy policy in value-based reinforcement learning.

reinforcement-learning
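One way to quantify policy churn is to compare the greedy policies induced by two consecutive Q-value snapshots. The following is a minimal sketch, not the paper's measurement code. It assumes tabular Q-values stored as one list of action values per state, and the helper names `greedy_actions` and `policy_churn` are hypothetical.

```python
from typing import List, Sequence


def greedy_actions(q_values: Sequence[Sequence[float]]) -> List[int]:
    """Greedy (argmax) action per state; ties go to the lowest action index."""
    return [max(range(len(row)), key=row.__getitem__) for row in q_values]


def policy_churn(q_before, q_after) -> float:
    """Fraction of states whose greedy action differs between two Q-value snapshots.

    Hypothetical tabular helper; a deep agent would measure this on a batch of states.
    """
    before, after = greedy_actions(q_before), greedy_actions(q_after)
    if len(before) != len(after) or not before:
        raise ValueError("snapshots must cover the same, non-empty set of states")
    return sum(b != a for b, a in zip(before, after)) / len(before)


# Three states, two actions: a small update flips the greedy action in state 0 only.
q_t = [[1.0, 0.9], [0.2, 0.5], [0.3, 0.1]]
q_next = [[0.95, 1.0], [0.2, 0.5], [0.3, 0.1]]
print(policy_churn(q_t, q_next))  # one of three states changed
```

The Q-value change in state 0 is tiny (0.05 and 0.1), yet it is enough to flip the greedy action. This kind of small value update causing a policy change is what the churn measure counts.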

Model-Value Inconsistency as a Signal for Epistemic Uncertainty

no code implementations · 8 Dec 2021 · Angelos Filos, Eszter Vértes, Zita Marinho, Gregory Farquhar, Diana Borsa, Abram Friesen, Feryal Behbahani, Tom Schaul, André Barreto, Simon Osindero

Unlike prior work which estimates uncertainty by training an ensemble of many models and/or value functions, this approach requires only the single model and value function which are already being learned in most model-based reinforcement learning algorithms.

Model-based Reinforcement Learning
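The idea can be illustrated as an "implicit ensemble". Unroll the learned model for k = 0, 1, ..., K steps and bootstrap from the value function at the end. This yields K + 1 estimates of the same state value, and their spread can serve as an uncertainty signal. This is a sketch under stated assumptions, none of them taken from the paper:

- a deterministic learned model with the hypothetical signature `model(state, action) -> (reward, next_state)`;
- a fixed action sequence;
- population standard deviation as the spread measure.

```python
import statistics
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces: a learned deterministic model and a learned value function.
Model = Callable[[float, int], Tuple[float, float]]  # (state, action) -> (reward, next_state)
ValueFn = Callable[[float], float]


def k_step_estimates(state: float, actions: Sequence[int], model: Model,
                     value_fn: ValueFn, gamma: float) -> List[float]:
    """Estimates of V(state) that follow the model for k = 0..len(actions) steps, then bootstrap."""
    estimates = [value_fn(state)]  # k = 0: the value function on its own
    ret, discount, s = 0.0, 1.0, state
    for a in actions:
        reward, s = model(s, a)  # one imagined transition in the learned model
        ret += discount * reward
        discount *= gamma
        estimates.append(ret + discount * value_fn(s))
    return estimates


def model_value_inconsistency(state, actions, model, value_fn, gamma=0.9) -> float:
    """Spread of the implicit ensemble; larger values signal more epistemic uncertainty."""
    return statistics.pstdev(k_step_estimates(state, actions, model, value_fn, gamma))


# Toy model: every step yields reward 1, so with gamma = 0.9 the consistent value is 10.
chain: Model = lambda s, a: (1.0, s + 1.0)
print(model_value_inconsistency(0.0, [0, 0, 0], chain, lambda s: 10.0))  # close to 0
print(model_value_inconsistency(0.0, [0, 0, 0], chain, lambda s: 0.0))   # clearly positive
```

When the model and value function agree, every k-step estimate coincides and the spread vanishes. A value function that is inconsistent with the model's predictions produces disagreement without training any extra ensemble members.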

Policy Evaluation Networks

no code implementations · 26 Feb 2020 · Jean Harb, Tom Schaul, Doina Precup, Pierre-Luc Bacon

The core idea of this paper is to flip the usual convention, in which a value function estimates the value of a single policy across many states, and instead estimate the value of many policies for a single set of states.

reinforcement-learning

Adapting Behaviour for Learning Progress

no code implementations · 14 Dec 2019 · Tom Schaul, Diana Borsa, David Ding, David Szepesvari, Georg Ostrovski, Will Dabney, Simon Osindero

Determining what experience to generate to best facilitate learning (i.e., exploration) is one of the distinguishing features and open challenges in reinforcement learning.

Atari Games

Conditional Importance Sampling for Off-Policy Learning

no code implementations · 16 Oct 2019 · Mark Rowland, Anna Harutyunyan, Hado van Hasselt, Diana Borsa, Tom Schaul, Rémi Munos, Will Dabney

We theoretically analyse the space of off-policy algorithms opened up by this conditional importance sampling framework, and concretely investigate several algorithms that arise from it.

reinforcement-learning

Non-Differentiable Supervised Learning with Evolution Strategies and Hybrid Methods

no code implementations · 7 Jun 2019 · Karel Lenc, Erich Elsen, Tom Schaul, Karen Simonyan

While using ES for differentiable parameters is computationally impractical (although possible), we show that a hybrid approach is practically feasible in the case where the model has both differentiable and non-differentiable parameters.
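A hybrid update of this kind can be sketched as follows. Parameters with an analytic gradient follow ordinary gradient descent. Parameters treated as non-differentiable are updated with an antithetic evolution-strategies (ES) estimate of the gradient of the Gaussian-smoothed loss. The toy loss, the split into `d` and `n`, and all hyperparameters are illustrative assumptions, not the paper's setup. Here `n` enters through `abs`, and the ES step never asks for its gradient.

```python
import random
from typing import Callable, List, Sequence, Tuple


def es_gradient(f: Callable[[List[float]], float], x: Sequence[float], sigma: float,
                n_pairs: int, rng: random.Random) -> List[float]:
    """Antithetic ES estimate of the gradient of E[f(x + sigma * eps)], eps ~ N(0, I)."""
    grad = [0.0] * len(x)
    for _ in range(n_pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in x]
        f_plus = f([xi + sigma * e for xi, e in zip(x, eps)])
        f_minus = f([xi - sigma * e for xi, e in zip(x, eps)])
        scale = (f_plus - f_minus) / (2.0 * sigma * n_pairs)
        for i, e in enumerate(eps):
            grad[i] += scale * e
    return grad


def toy_loss(d: Sequence[float], n: Sequence[float]) -> float:
    """Hypothetical loss: smooth in d, only queried as a black box in n."""
    return (d[0] - 2.0) ** 2 + abs(n[0] - 3.0)


def hybrid_train(steps: int = 200, lr: float = 0.05, sigma: float = 0.1,
                 n_pairs: int = 10, seed: int = 0) -> Tuple[List[float], List[float]]:
    rng = random.Random(seed)
    d, n = [0.0], [0.0]  # differentiable / black-box parameters
    for _ in range(steps):
        grad_d = [2.0 * (d[0] - 2.0)]  # exact gradient, as backprop would give
        grad_n = es_gradient(lambda p: toy_loss(d, p), n, sigma, n_pairs, rng)
        d = [di - lr * g for di, g in zip(d, grad_d)]
        n = [ni - lr * g for ni, g in zip(n, grad_n)]
    return d, n


d, n = hybrid_train()
print(d, n)  # d close to 2.0, n close to 3.0
```

ES only evaluates the loss, so its cost grows with the number of parameters it has to cover. Restricting it to the non-differentiable subset, while backpropagation handles the rest, is what keeps a hybrid scheme practical.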

Ray Interference: a Source of Plateaus in Deep Reinforcement Learning

no code implementations · 25 Apr 2019 · Tom Schaul, Diana Borsa, Joseph Modayil, Razvan Pascanu

Rather than proposing a new method, this paper investigates an issue present in existing learning algorithms.

reinforcement-learning

The Barbados 2018 List of Open Issues in Continual Learning

no code implementations · 16 Nov 2018 · Tom Schaul, Hado van Hasselt, Joseph Modayil, Martha White, Adam White, Pierre-Luc Bacon, Jean Harb, Shibl Mourad, Marc Bellemare, Doina Precup

We want to make progress toward artificial general intelligence, namely general-purpose agents that autonomously learn how to competently act in complex environments.

Continual Learning

Meta-Learning by the Baldwin Effect

no code implementations · 6 Jun 2018 · Chrisantha Thomas Fernando, Jakub Sygnowski, Simon Osindero, Jane Wang, Tom Schaul, Denis Teplyashin, Pablo Sprechmann, Alexander Pritzel, Andrei A. Rusu

The scope of the Baldwin effect was recently called into question by two papers that closely examined the seminal work of Hinton and Nowlan.

Meta-Learning

Deep Q-learning from Demonstrations

5 code implementations · 12 Apr 2017 · Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys

We present an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages small sets of demonstration data to massively accelerate the learning process even from relatively small amounts of demonstration data and is able to automatically assess the necessary ratio of demonstration data while learning thanks to a prioritized replay mechanism.

Decision Making · Imitation Learning
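One ingredient of DQfD is a large-margin supervised loss on demonstration transitions:

J_E(Q) = max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E)

Here l(a_E, a) is zero when a is the demonstrator's action a_E and a positive margin otherwise. The loss pushes the Q-values of non-demonstrated actions at least a margin below the demonstrated one. The sketch computes it for a single state. The margin of 0.8 is an illustrative assumption. The full DQfD objective also combines this term with 1-step and n-step TD losses and L2 regularization, which are omitted here.

```python
from typing import Sequence


def large_margin_loss(q: Sequence[float], expert_action: int, margin: float = 0.8) -> float:
    """max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E), with l = margin for a != a_E and 0 otherwise.

    The default margin is an illustrative assumption.
    """
    augmented = [qa + (0.0 if a == expert_action else margin) for a, qa in enumerate(q)]
    return max(augmented) - q[expert_action]


q_s = [1.0, 2.0, 0.5]
print(large_margin_loss(q_s, expert_action=1))  # 0.0: expert action already leads by >= margin
print(large_margin_loss(q_s, expert_action=0))  # positive: action 1 outranks the expert's action
```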

Reinforcement Learning with Unsupervised Auxiliary Tasks

3 code implementations · 16 Nov 2016 · Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z. Leibo, David Silver, Koray Kavukcuoglu

We also introduce a novel mechanism for focusing this representation upon extrinsic rewards, so that learning can rapidly adapt to the most relevant aspects of the actual task.

reinforcement-learning

Successor Features for Transfer in Reinforcement Learning

no code implementations · NeurIPS 2017 · André Barreto, Will Dabney, Rémi Munos, Jonathan J. Hunt, Tom Schaul, Hado van Hasselt, David Silver

Transfer in reinforcement learning refers to the notion that generalization should occur not only within a task but also across tasks.

reinforcement-learning
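In the successor-feature framework, rewards are assumed to decompose as r = φ(s, a, s')·w. A policy's action values then factor as Q^π(s, a) = ψ^π(s, a)·w, where ψ^π are its successor features. A new task only needs a new w. Generalized policy improvement (GPI) then acts greedily with respect to the best of several known policies. The sketch assumes the successor features for the current state are already available as plain lists. `gpi_action` is a hypothetical helper.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def gpi_action(psi: Sequence[Sequence[Vector]], w: Vector) -> int:
    """Generalized policy improvement in one state.

    psi[i][a] is the successor-feature vector of known policy i for action a,
    so Q_i(s, a) = psi[i][a] . w for the task described by weights w.
    Returns argmax_a max_i Q_i(s, a).
    """
    n_actions = len(psi[0])
    best_q = [max(dot(psi_i[a], w) for psi_i in psi) for a in range(n_actions)]
    return max(range(n_actions), key=best_q.__getitem__)


# Two known policies, two actions, two reward features.
psi = [
    [[1.0, 0.0], [0.0, 0.0]],  # policy 0: action 0 collects feature 0
    [[0.0, 0.0], [0.0, 1.0]],  # policy 1: action 1 collects feature 1
]
print(gpi_action(psi, w=[1.0, 0.0]))  # 0
print(gpi_action(psi, w=[0.0, 1.0]))  # 1
```

Switching tasks only changes `w`. The stored `psi` tables are reused unchanged, which is the transfer mechanism the framework relies on.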

Prioritized Experience Replay

70 code implementations · 18 Nov 2015 · Tom Schaul, John Quan, Ioannis Antonoglou, David Silver

Experience replay lets online reinforcement learning agents remember and reuse experiences from the past.

Atari Games · reinforcement-learning
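The proportional variant of prioritized replay works in two steps:

- It samples transition i with probability P(i) = p_i^α / Σ_k p_k^α, where the priority p_i = |δ_i| + ε comes from the transition's TD error.
- It corrects the resulting bias with importance-sampling weights w_i = (N·P(i))^(-β), normalized by the largest weight.

The sketch uses a linear scan over a plain list and normalizes by the batch maximum. The paper uses a sum-tree to make sampling efficient. The default α, β and ε values here are illustrative assumptions.

```python
import random
from typing import List, Optional, Sequence, Tuple


def prioritized_sample(td_errors: Sequence[float], batch_size: int, alpha: float = 0.6,
                       beta: float = 0.4, eps: float = 1e-6,
                       rng: Optional[random.Random] = None) -> Tuple[List[int], List[float]]:
    """Proportional prioritized sampling with importance-sampling weights.

    Returns the sampled indices and their weights, normalized by the largest
    weight in the batch so that updates are only ever scaled down.
    """
    rng = rng or random.Random()
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    probs = [p / total for p in priorities]
    n = len(probs)
    indices = rng.choices(range(n), weights=probs, k=batch_size)
    raw_weights = [(n * probs[i]) ** (-beta) for i in indices]
    max_w = max(raw_weights)
    return indices, [w / max_w for w in raw_weights]


rng = random.Random(0)
idx, weights = prioritized_sample([0.0, 0.1, 0.1, 5.0], batch_size=1000, rng=rng)
print(idx.count(3) / len(idx))  # most samples come from the transition with the large TD error
```

Transitions that are sampled more often receive smaller weights. This offsets the extra updates they get from the non-uniform sampling.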

Unit Tests for Stochastic Optimization

no code implementations · 20 Dec 2013 · Tom Schaul, Ioannis Antonoglou, David Silver

Optimization by stochastic gradient descent is an important component of many large-scale machine learning algorithms.

Stochastic Optimization
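In the spirit of such unit tests, the sketch below checks an optimizer on one small prototype problem: a 1-D quadratic whose gradients carry additive Gaussian noise. The function names, the noise model and the pass threshold are illustrative assumptions, not the paper's actual test suite.

```python
import random
from typing import Callable


def noisy_quadratic_grad(x: float, curvature: float, noise_std: float,
                         rng: random.Random) -> float:
    """Stochastic gradient of f(x) = 0.5 * curvature * x**2, corrupted by Gaussian noise."""
    return curvature * x + rng.gauss(0.0, noise_std)


def run_sgd(lr_schedule: Callable[[int], float], steps: int, x0: float = 5.0,
            curvature: float = 1.0, noise_std: float = 1.0, seed: int = 0) -> float:
    """Plain SGD on the noisy quadratic; returns the final iterate."""
    rng = random.Random(seed)
    x = x0
    for t in range(steps):
        x -= lr_schedule(t) * noisy_quadratic_grad(x, curvature, noise_std, rng)
    return x


def test_decaying_rate_converges() -> None:
    # With curvature 1, the rate 1/(t+1) makes x equal to minus the running mean
    # of the gradient noise, so it shrinks towards the optimum at 0.
    assert abs(run_sgd(lambda t: 1.0 / (t + 1), steps=10_000)) < 0.1


test_decaying_rate_converges()
```

Each such test is cheap, isolates one property (here, robustness to gradient noise), and can be varied along a single axis such as curvature or noise level.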

Adaptive learning rates and parallelization for stochastic, sparse, non-smooth gradients

no code implementations · 16 Jan 2013 · Tom Schaul, Yann LeCun

Recent work has established an empirically successful framework for adapting learning rates for stochastic gradient descent (SGD).

No More Pesky Learning Rates

no code implementations · 6 Jun 2012 · Tom Schaul, Sixin Zhang, Yann LeCun

The performance of stochastic gradient descent (SGD) depends critically on how learning rates are tuned and decreased over time.
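To see why the tuning matters, consider one SGD step on the 1-D quadratic f(θ) = (h/2)(θ - θ*)², whose gradient carries zero-mean noise of variance σ². Minimizing the expected loss after the step gives η* = (E[g])² / (h·E[g²]), with E[g] = h(θ - θ*) and E[g²] = (E[g])² + σ². This rate is close to 1/h when the gradient is reliable and much smaller when noise dominates. The paper derives its adaptive rates from a greedy-optimal rate of this form, estimated per parameter from running statistics. The sketch below only checks the formula numerically on a known quadratic, and its function names are hypothetical.

```python
def expected_loss_after_step(eta: float, delta: float, h: float, sigma: float) -> float:
    """E[f(theta')] for f(theta) = h/2 * (theta - theta*)^2 after one step with rate eta.

    delta = theta - theta*; the gradient is h * delta plus zero-mean noise of variance sigma**2.
    """
    return 0.5 * h * ((1.0 - eta * h) ** 2 * delta ** 2 + eta ** 2 * sigma ** 2)


def greedy_optimal_rate(delta: float, h: float, sigma: float) -> float:
    """eta* = E[g]^2 / (h * E[g^2]), the minimizer of expected_loss_after_step over eta."""
    mean_g = h * delta
    mean_sq_g = mean_g ** 2 + sigma ** 2
    return mean_g ** 2 / (h * mean_sq_g)


eta = greedy_optimal_rate(delta=2.0, h=1.5, sigma=1.0)
print(eta)  # about 0.6, below 1/h because of the gradient noise
for other in (eta - 0.05, eta + 0.05):
    assert expected_loss_after_step(eta, 2.0, 1.5, 1.0) < expected_loss_after_step(other, 2.0, 1.5, 1.0)
```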

Natural Evolution Strategies

1 code implementation · 22 Jun 2011 · Daan Wierstra, Tom Schaul, Tobias Glasmachers, Yi Sun, Jürgen Schmidhuber

This paper presents Natural Evolution Strategies (NES), a recent family of algorithms that constitute a more principled approach to black-box optimization than established evolutionary algorithms.
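As a concrete instance of the family, the sketch below implements separable NES (SNES). SNES keeps a diagonal Gaussian search distribution and ranks sampled candidates. It then follows the natural gradient of expected fitness with respect to the mean and the per-coordinate standard deviations. The population size, learning rates and rank-based utilities follow commonly cited SNES defaults as recalled here. Treat them, the `snes_minimize` name and the sphere test problem as assumptions rather than a reproduction of the paper's reference code.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def snes_minimize(f: Callable[[List[float]], float], mu0: Sequence[float],
                  sigma0: float = 1.0, generations: int = 500,
                  seed: int = 0) -> Tuple[List[float], List[float]]:
    """Separable NES: adapt the mean and per-coordinate std of a Gaussian to minimize f."""
    rng = random.Random(seed)
    d = len(mu0)
    mu, sigma = list(mu0), [sigma0] * d
    pop = 4 + int(3 * math.log(d))                     # population size (assumed default)
    lr_mu = 1.0                                        # assumed default
    lr_sigma = (3 + math.log(d)) / (5 * math.sqrt(d))  # assumed default
    # Rank-based fitness shaping: utilities sum to zero, the best rank gets the largest value.
    raw = [max(0.0, math.log(pop / 2 + 1) - math.log(k)) for k in range(1, pop + 1)]
    utilities = [r / sum(raw) - 1.0 / pop for r in raw]
    for _ in range(generations):
        noise = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(pop)]
        # Sort standard-normal samples by the loss of the candidates they generate (best first).
        ranked = sorted(noise, key=lambda s: f([m + sd * si for m, sd, si in zip(mu, sigma, s)]))
        grad_mu = [sum(u * s[i] for u, s in zip(utilities, ranked)) for i in range(d)]
        grad_sigma = [sum(u * (s[i] ** 2 - 1.0) for u, s in zip(utilities, ranked))
                      for i in range(d)]
        mu = [m + lr_mu * sd * g for m, sd, g in zip(mu, sigma, grad_mu)]
        sigma = [sd * math.exp(0.5 * lr_sigma * g) for sd, g in zip(sigma, grad_sigma)]
    return mu, sigma


def sphere(x: Sequence[float]) -> float:
    return sum(xi * xi for xi in x)


mu, sigma = snes_minimize(sphere, [3.0, 3.0, 3.0])
print(sphere(mu))  # should be tiny after 500 generations
```

Using ranks instead of raw fitness values makes the update invariant to monotone transformations of the objective. The multiplicative update keeps every standard deviation positive.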
