Exploring TD error as a heuristic for $σ$ selection in Q($σ$, $λ$)
In the landscape of TD algorithms, the Q($\sigma$, $\lambda$) algorithm is an algorithm with the ability to perform a multistep backup in an online manner while also successfully unifying the concepts of sampling with using the expectation across all actions for a state. $\sigma \in [0, 1]$ indicates the extent to which sampling is used. Selecting the value of {\sigma} can be based on characteristics of the current state rather than having a constant value or being time based. This report explores the viability of such a TD-error based scheme.
PDF AbstractCode
Tasks
Datasets
Add Datasets
introduced or used in this paper
Results from the Paper
Submit
results from this paper
to get state-of-the-art GitHub badges and help the
community compare results to other papers.
Methods
No methods listed for this paper. Add
relevant methods here