Search Results for author: Tom Schaul

Found 35 papers, 15 papers with code

Vision-Language Models as a Source of Rewards

no code implementations • 14 Dec 2023 • Kate Baumli, Satinder Baveja, Feryal Behbahani, Harris Chan, Gheorghe Comanici, Sebastian Flennerhag, Maxime Gazeau, Kristian Holsheimer, Dan Horgan, Michael Laskin, Clare Lyle, Hussain Masoom, Kay McKinney, Volodymyr Mnih, Alexander Neitz, Fabio Pardo, Jack Parker-Holder, John Quan, Tim Rocktäschel, Himanshu Sahni, Tom Schaul, Yannick Schroecker, Stephen Spencer, Richie Steigerwald, Luyu Wang, Lei Zhang

Building generalist agents that can accomplish many goals in rich open-ended environments is one of the research frontiers for reinforcement learning.

reinforcement-learning

Paper
Add Code

Discovering Attention-Based Genetic Algorithms via Meta-Black-Box Optimization

1 code implementation • 8 Apr 2023 • Robert Tjarko Lange, Tom Schaul, Yutian Chen, Chris Lu, Tom Zahavy, Valentin Dalibard, Sebastian Flennerhag

Genetic algorithms constitute a family of black-box optimization algorithms, which take inspiration from the principles of biological evolution.

450

Paper
Code

Scaling Goal-based Exploration via Pruning Proto-goals

1 code implementation • 9 Feb 2023 • Akhil Bagaria, Ray Jiang, Ramana Kumar, Tom Schaul

One of the gnarliest challenges in reinforcement learning (RL) is exploration that scales to vast domains, where novelty-, or coverage-seeking behaviour falls short.

reinforcement-learning Reinforcement Learning (RL)

456

Paper
Code

Discovering Evolution Strategies via Meta-Black-Box Optimization

1 code implementation • 21 Nov 2022 • Robert Tjarko Lange, Tom Schaul, Yutian Chen, Tom Zahavy, Valentin Dallibard, Chris Lu, Satinder Singh, Sebastian Flennerhag

Optimizing functions without access to gradients is the remit of black-box methods such as evolution strategies.

Continuous Control Meta-Learning

450

Paper
Code

The Phenomenon of Policy Churn

no code implementations • 1 Jun 2022 • Tom Schaul, André Barreto, John Quan, Georg Ostrovski

We identify and study the phenomenon of policy churn, that is, the rapid change of the greedy policy in value-based reinforcement learning.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Model-Value Inconsistency as a Signal for Epistemic Uncertainty

no code implementations • 8 Dec 2021 • Angelos Filos, Eszter Vértes, Zita Marinho, Gregory Farquhar, Diana Borsa, Abram Friesen, Feryal Behbahani, Tom Schaul, André Barreto, Simon Osindero

Unlike prior work which estimates uncertainty by training an ensemble of many models and/or value functions, this approach requires only the single model and value function which are already being learned in most model-based reinforcement learning algorithms.

Model-based Reinforcement Learning Rolling Shutter Correction

Paper
Add Code

When should agents explore?

no code implementations • NeurIPS 2021 • Miruna Pîslar, David Szepesvari, Georg Ostrovski, Diana Borsa, Tom Schaul

Exploration remains a central challenge for reinforcement learning (RL).

Reinforcement Learning (RL)

Paper
Add Code

Return-based Scaling: Yet Another Normalisation Trick for Deep RL

no code implementations • 11 May 2021 • Tom Schaul, Georg Ostrovski, Iurii Kemaev, Diana Borsa

Scaling issues are mundane yet irritating for practitioners of reinforcement learning.

Atari Games reinforcement-learning +1

Paper
Add Code

Policy Evaluation Networks

no code implementations • 26 Feb 2020 • Jean Harb, Tom Schaul, Doina Precup, Pierre-Luc Bacon

The core idea of this paper is to flip this convention and estimate the value of many policies, for a single set of states.

Paper
Add Code

Adapting Behaviour for Learning Progress

no code implementations • 14 Dec 2019 • Tom Schaul, Diana Borsa, David Ding, David Szepesvari, Georg Ostrovski, Will Dabney, Simon Osindero

Determining what experience to generate to best facilitate learning (i. e. exploration) is one of the distinguishing features and open challenges in reinforcement learning.

Atari Games

Paper
Add Code

Conditional Importance Sampling for Off-Policy Learning

no code implementations • 16 Oct 2019 • Mark Rowland, Anna Harutyunyan, Hado van Hasselt, Diana Borsa, Tom Schaul, Rémi Munos, Will Dabney

We theoretically analyse this space, and concretely investigate several algorithms that arise from this framework.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Non-Differentiable Supervised Learning with Evolution Strategies and Hybrid Methods

no code implementations • 7 Jun 2019 • Karel Lenc, Erich Elsen, Tom Schaul, Karen Simonyan

While using ES for differentiable parameters is computationally impractical (although possible), we show that a hybrid approach is practically feasible in the case where the model has both differentiable and non-differentiable parameters.

Paper
Add Code

Ray Interference: a Source of Plateaus in Deep Reinforcement Learning

no code implementations • 25 Apr 2019 • Tom Schaul, Diana Borsa, Joseph Modayil, Razvan Pascanu

Rather than proposing a new method, this paper investigates an issue present in existing learning algorithms.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement

no code implementations • ICML 2018 • André Barreto, Diana Borsa, John Quan, Tom Schaul, David Silver, Matteo Hessel, Daniel Mankowitz, Augustin Žídek, Rémi Munos

In this paper we extend the SFs & GPI framework in two ways.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Universal Successor Features Approximators

1 code implementation • ICLR 2019 • Diana Borsa, André Barreto, John Quan, Daniel Mankowitz, Rémi Munos, Hado van Hasselt, David Silver, Tom Schaul

We focus on one aspect in particular, namely the ability to generalise to unseen tasks.

Navigate Reinforcement Learning (RL)

Paper
Code

The Barbados 2018 List of Open Issues in Continual Learning

no code implementations • 16 Nov 2018 • Tom Schaul, Hado van Hasselt, Joseph Modayil, Martha White, Adam White, Pierre-Luc Bacon, Jean Harb, Shibl Mourad, Marc Bellemare, Doina Precup

We want to make progress toward artificial general intelligence, namely general-purpose agents that autonomously learn how to competently act in complex environments.

Continual Learning

Paper
Add Code

Meta-Learning by the Baldwin Effect

no code implementations • 6 Jun 2018 • Chrisantha Thomas Fernando, Jakub Sygnowski, Simon Osindero, Jane Wang, Tom Schaul, Denis Teplyashin, Pablo Sprechmann, Alexander Pritzel, Andrei A. Rusu

The scope of the Baldwin effect was recently called into question by two papers that closely examined the seminal work of Hinton and Nowlan.

Meta-Learning

Paper
Add Code

Unicorn: Continual Learning with a Universal, Off-policy Agent

no code implementations • 22 Feb 2018 • Daniel J. Mankowitz, Augustin Žídek, André Barreto, Dan Horgan, Matteo Hessel, John Quan, Junhyuk Oh, Hado van Hasselt, David Silver, Tom Schaul

Some real-world domains are best characterized as a single task, but for others this perspective is limiting.

Continual Learning

Paper
Add Code

Natural Value Approximators: Learning when to Trust Past Estimates

no code implementations • NeurIPS 2017 • Zhongwen Xu, Joseph Modayil, Hado P. Van Hasselt, Andre Barreto, David Silver, Tom Schaul

Neural networks have a smooth initial inductive bias, such that small changes in input do not lead to large changes in output.

Atari Games Inductive Bias +2

Paper
Add Code

Rainbow: Combining Improvements in Deep Reinforcement Learning

32 code implementations • 6 Oct 2017 • Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver

The deep reinforcement learning community has made several independent improvements to the DQN algorithm.

Ranked #3 on Montezuma's Revenge on Atari 2600 Montezuma's Revenge

Montezuma's Revenge reinforcement-learning +1

7,458

Paper
Code

StarCraft II: A New Challenge for Reinforcement Learning

11 code implementations • 16 Aug 2017 • Oriol Vinyals, Timo Ewalds, Sergey Bartunov, Petko Georgiev, Alexander Sasha Vezhnevets, Michelle Yeo, Alireza Makhzani, Heinrich Küttler, John Agapiou, Julian Schrittwieser, John Quan, Stephen Gaffney, Stig Petersen, Karen Simonyan, Tom Schaul, Hado van Hasselt, David Silver, Timothy Lillicrap, Kevin Calderone, Paul Keet, Anthony Brunasso, David Lawrence, Anders Ekermo, Jacob Repp, Rodney Tsing

Finally, we present initial baseline results for canonical deep reinforcement learning agents applied to the StarCraft II domain.

Ranked #1 on Starcraft II on MoveToBeacon

reinforcement-learning Reinforcement Learning (RL) +2

7,921

Paper
Code

Deep Q-learning from Demonstrations

5 code implementations • 12 Apr 2017 • Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys

We present an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages small sets of demonstration data to massively accelerate the learning process even from relatively small amounts of demonstration data and is able to automatically assess the necessary ratio of demonstration data while learning thanks to a prioritized replay mechanism.

Imitation Learning Q-Learning +1

2,597

Paper
Code

FeUdal Networks for Hierarchical Reinforcement Learning

1 code implementation • ICML 2017 • Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, Koray Kavukcuoglu

We introduce FeUdal Networks (FuNs): a novel architecture for hierarchical reinforcement learning.

Hierarchical Reinforcement Learning reinforcement-learning +1

Paper
Code

The Predictron: End-To-End Learning and Planning

1 code implementation • ICML 2017 • David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel Dulac-Arnold, David Reichert, Neil Rabinowitz, Andre Barreto, Thomas Degris

One of the key challenges of artificial intelligence is to learn models that are effective in the context of planning.

290

Paper
Code

Reinforcement Learning with Unsupervised Auxiliary Tasks

3 code implementations • 16 Nov 2016 • Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z. Leibo, David Silver, Koray Kavukcuoglu

We also introduce a novel mechanism for focusing this representation upon extrinsic rewards, so that learning can rapidly adapt to the most relevant aspects of the actual task.

reinforcement-learning Reinforcement Learning (RL)

416

Paper
Code

Successor Features for Transfer in Reinforcement Learning

no code implementations • NeurIPS 2017 • André Barreto, Will Dabney, Rémi Munos, Jonathan J. Hunt, Tom Schaul, Hado van Hasselt, David Silver

Transfer in reinforcement learning refers to the notion that generalization should occur not only within a task but also across tasks.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Learning to learn by gradient descent by gradient descent

8 code implementations • NeurIPS 2016 • Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W. Hoffman, David Pfau, Tom Schaul, Brendan Shillingford, Nando de Freitas

The move from hand-designed features to learned features in machine learning has been wildly successful.

Meta-Learning

4,061

Paper
Code

Unifying Count-Based Exploration and Intrinsic Motivation

1 code implementation • NeurIPS 2016 • Marc G. Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Remi Munos

We consider an agent's uncertainty about its environment and the problem of generalizing this uncertainty across observations.

Ranked #10 on Atari Games on Atari 2600 Montezuma's Revenge

Montezuma's Revenge reinforcement-learning +1

Paper
Code

Dueling Network Architectures for Deep Reinforcement Learning

73 code implementations • 20 Nov 2015 • Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas

In recent years there have been many successes of using deep representations in reinforcement learning.

Ranked #1 on Atari Games on Atari 2600 Pong

Atari Games reinforcement-learning +1

48,719

Paper
Code

Prioritized Experience Replay

77 code implementations • 18 Nov 2015 • Tom Schaul, John Quan, Ioannis Antonoglou, David Silver

Experience replay lets online reinforcement learning agents remember and reuse experiences from the past.

Ranked #3 on Atari Games on Atari 2600 Kangaroo

Atari Games reinforcement-learning +1

48,719

Paper
Code

Unit Tests for Stochastic Optimization

no code implementations • 20 Dec 2013 • Tom Schaul, Ioannis Antonoglou, David Silver

Optimization by stochastic gradient descent is an important component of many large-scale machine learning algorithms.

Stochastic Optimization

Paper
Add Code

Adaptive learning rates and parallelization for stochastic, sparse, non-smooth gradients

no code implementations • 16 Jan 2013 • Tom Schaul, Yann Lecun

Recent work has established an empirically successful framework for adapting learning rates for stochastic gradient descent (SGD).

Paper
Add Code

No More Pesky Learning Rates

no code implementations • 6 Jun 2012 • Tom Schaul, Sixin Zhang, Yann Lecun

The performance of stochastic gradient descent (SGD) depends critically on how learning rates are tuned and decreased over time.

Paper
Add Code

Measuring Intelligence through Games

no code implementations • 6 Sep 2011 • Tom Schaul, Julian Togelius, Jürgen Schmidhuber

Artificial general intelligence (AGI) refers to research aimed at tackling the full problem of artificial intelligence, that is, create truly intelligent agents.

Motion Planning

Paper
Add Code

Natural Evolution Strategies

1 code implementation • 22 Jun 2011 • Daan Wierstra, Tom Schaul, Tobias Glasmachers, Yi Sun, Jürgen Schmidhuber

This paper presents Natural Evolution Strategies (NES), a recent family of algorithms that constitute a more principled approach to black-box optimization than established evolutionary algorithms.

Evolutionary Algorithms

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.