no code implementations • 24 Feb 2023 • Mattie Fellows, Matthew J. A. Smith, Shimon Whiteson
Integral to recent successes in deep reinforcement learning has been a class of temporal difference methods that use infrequently updated target values for policy evaluation in a Markov Decision Process.
no code implementations • 29 Sep 2021 • Matthew J. A. Smith, Shimon Whiteson
Overfitting has been recently acknowledged as a key limiting factor in the capabilities of reinforcement learning algorithms, despite little theoretical characterisation.
no code implementations • ICLR 2018 • Matthew J. A. Smith, Herke van Hoof, Joelle Pineau
In this work we develop a novel policy gradient method for the automatic learning of policies with options.