Search Results for author: Matthew J. A. Smith

Found 3 papers, 0 papers with code

Why Target Networks Stabilise Temporal Difference Methods

no code implementations • 24 Feb 2023 • Mattie Fellows, Matthew J. A. Smith, Shimon Whiteson

Integral to recent successes in deep reinforcement learning has been a class of temporal difference methods that use infrequently updated target values for policy evaluation in a Markov Decision Process.
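To make the mechanism described above concrete, here is a minimal sketch (not taken from the paper) of TD(0) policy evaluation with an infrequently updated target network, using a linear value function; the names `features` and `target_update_period` are illustrative assumptions.

```python
# Minimal sketch of TD(0) policy evaluation with a frozen target network.
# Assumes `features(state)` returns an n_features-dimensional NumPy vector.
import numpy as np

def td0_with_target_network(transitions, n_features, features,
                            alpha=0.01, gamma=0.99, target_update_period=100):
    """Estimate value-function weights from (state, reward, next_state) tuples."""
    w = np.zeros(n_features)   # online value-function weights
    w_target = w.copy()        # infrequently updated target weights

    for step, (s, r, s_next) in enumerate(transitions):
        phi, phi_next = features(s), features(s_next)
        # Bootstrap from the *target* weights rather than the online ones.
        td_error = r + gamma * w_target @ phi_next - w @ phi
        w += alpha * td_error * phi
        # Copy the online weights into the target only every few steps.
        if (step + 1) % target_update_period == 0:
            w_target = w.copy()
    return w
```

Freezing the bootstrap target between copies is the design choice the paper analyses: the update no longer chases a moving target at every step.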

Stability and Generalisation in Batch Reinforcement Learning

no code implementations • 29 Sep 2021 • Matthew J. A. Smith, Shimon Whiteson

Overfitting has recently been acknowledged as a key limiting factor in the capabilities of reinforcement learning algorithms, yet it has received little theoretical characterisation.

Reinforcement Learning (RL)

An inference-based policy gradient method for learning options

no code implementations • ICLR 2018 • Matthew J. A. Smith, Herke van Hoof, Joelle Pineau

In this work we develop a novel policy gradient method for the automatic learning of policies with options.
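For context, below is a minimal sketch of the standard call-and-return options structure that such a method learns, not the paper's inference-based policy gradient itself; the environment interface (`reset()`, `step()` returning `(state, reward, done)`) and all class and field names are illustrative assumptions.

```python
# Sketch of executing a policy with options: a policy over options selects an
# option, its intra-option policy picks actions until the option terminates.
import random

class Option:
    def __init__(self, intra_policy, termination_prob):
        self.intra_policy = intra_policy          # state -> action
        self.termination_prob = termination_prob  # state -> probability of stopping

def run_episode(env, policy_over_options, options, max_steps=200):
    state, active = env.reset(), None
    for _ in range(max_steps):
        if active is None:
            active = policy_over_options(state)   # choose which option to run
        action = options[active].intra_policy(state)
        state, reward, done = env.step(action)
        if done:
            break
        if random.random() < options[active].termination_prob(state):
            active = None                         # option terminates; re-select next step
```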
