Exploring TD error as a heuristic for $\sigma$ selection in Q($\sigma$, $\lambda$)

21 Dec 2019  ·  Abhishek Nan

Among TD algorithms, Q($\sigma$, $\lambda$) can perform multistep backups online while unifying sampling with taking the expectation across all actions for a state. The parameter $\sigma \in [0, 1]$ indicates the extent to which sampling is used. Rather than being held constant or following a time-based schedule, $\sigma$ can be selected from characteristics of the current state. This report explores the viability of one such scheme, in which $\sigma$ is chosen based on the TD error.
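To make the role of $\sigma$ concrete, here is a minimal sketch of the one-step Q($\sigma$) backup target, which mixes the sampled next-action value with the expectation under the policy. Eligibility traces (the $\lambda$ part) are omitted for brevity. The function `sigma_from_td_error` and its `scale` parameter are hypothetical and only illustrate what a TD-error-based choice of $\sigma$ could look like; the abstract does not specify the report's actual mapping.

```python
import numpy as np


def q_sigma_target(reward, gamma, q_next, pi_next, a_next, sigma):
    """One-step Q(sigma) backup target.

    q_next  : action values Q(s', .) for the next state
    pi_next : policy probabilities pi(. | s') for the next state
    a_next  : index of the action actually sampled in s'
    sigma   : 1.0 -> pure sampling (Sarsa), 0.0 -> pure expectation
    """
    sampled = q_next[a_next]
    expected = float(np.dot(pi_next, q_next))
    return reward + gamma * (sigma * sampled + (1.0 - sigma) * expected)


def sigma_from_td_error(td_error, scale=1.0):
    """HYPOTHETICAL heuristic, not the report's rule.

    Squashes |TD error| into [0, 1) so that larger errors lean
    toward sampling. The direction and the squashing function are
    assumptions made purely for illustration.
    """
    return float(np.tanh(scale * abs(td_error)))


# Example: two actions in the next state, uniform policy.
q_next = np.array([1.0, 3.0])
pi_next = np.array([0.5, 0.5])
target_sample = q_sigma_target(1.0, 0.5, q_next, pi_next, a_next=1, sigma=1.0)
target_expect = q_sigma_target(1.0, 0.5, q_next, pi_next, a_next=1, sigma=0.0)
```

Intermediate values of `sigma` interpolate linearly between the two targets, which is what allows a per-state choice of $\sigma$ to trade sampling against expectation.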
