Properties of the Least Squares Temporal Difference learning algorithm

22 Jan 2013 Kamil Ciosek

This paper presents four different ways of looking at the well-known Least Squares Temporal Differences (LSTD) algorithm for computing the value function of a Markov Reward Process, each of them leading to different insights: the operator-theory approach via the Galerkin method, the statistical approach via instrumental variables, the linear dynamical system view as well as the limit of the TD iteration. We also give a geometric view of the algorithm as an oblique projection... (read more)

PDF Abstract
No code implementations yet. Submit your code now

Tasks


Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods used in the Paper


METHOD TYPE
🤖 No Methods Found Help the community by adding them if they're not listed; e.g. Deep Residual Learning for Image Recognition uses ResNet