no code implementations • NeurIPS 2021 • Matthew Fellows, Kristian Hartikainen, Shimon Whiteson
We introduce a novel perspective on Bayesian reinforcement learning (RL); whereas existing approaches infer a posterior over the transition distribution or Q-function, we characterise the uncertainty in the Bellman operator.
1 code implementation • NeurIPS 2019 • Matthew Fellows, Anuj Mahajan, Tim G. J. Rudner, Shimon Whiteson
This gives VIREL a mode-seeking form of KL divergence, the ability to learn deterministic optimal policies naturally from inference and the ability to optimise value functions and policies in separate, iterative steps.
no code implementations • ICML 2018 • Matthew Fellows, Kamil Ciosek, Shimon Whiteson
We propose a new way of deriving policy gradient updates for reinforcement learning.