True Online $TD\left(\lambda\right)$ approximates the ideal online $\lambda$-return algorithm. It inverts this forward-view algorithm to produce an efficient backward-view algorithm based on eligibility traces, and it uses dutch traces rather than accumulating traces.
Source: Sutton and Seijen
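The backward-view update can be illustrated with a minimal sketch. It assumes linear value-function approximation, $v(s) = w^\top x(s)$, and follows the update rules as commonly presented for this algorithm. The function name `true_online_td_lambda_episode`, the helper `dot`, and the `(x, reward, x_next)` transition format are hypothetical choices made for illustration, not part of the source.

```python
def dot(a, b):
    """Inner product of two equal-length lists of floats."""
    return sum(ai * bi for ai, bi in zip(a, b))


def true_online_td_lambda_episode(w, transitions, alpha, gamma, lam):
    """Run one episode of True Online TD(lambda) with a dutch trace.

    w           -- initial weight vector (list of floats); not modified in place
    transitions -- list of (x, reward, x_next) tuples, where x and x_next are
                   feature vectors; use an all-zero x_next for a terminal state
                   (assumed format, chosen for this sketch)
    alpha       -- step size
    gamma       -- discount factor
    lam         -- trace-decay parameter lambda
    """
    w = list(w)
    z = [0.0] * len(w)   # dutch eligibility trace, reset at episode start
    v_old = 0.0          # value estimate of the current state from the previous step

    for x, reward, x_next in transitions:
        v = dot(w, x)
        v_next = dot(w, x_next)
        delta = reward + gamma * v_next - v   # ordinary TD error

        # Dutch trace: decay, then add x scaled by a correction term.
        zx = dot(z, x)
        z = [gamma * lam * zi + (1.0 - alpha * gamma * lam * zx) * xi
             for zi, xi in zip(z, x)]

        # Weight update with the extra (v - v_old) correction terms.
        w = [wi + alpha * (delta + v - v_old) * zi - alpha * (v - v_old) * xi
             for wi, zi, xi in zip(w, z, x)]

        v_old = v_next

    return w


# Two-state chain with one-hot features; reward 1 on the final transition.
episode = [
    ([1.0, 0.0], 0.0, [0.0, 1.0]),
    ([0.0, 1.0], 1.0, [0.0, 0.0]),  # zero vector marks the terminal state
]
weights = true_online_td_lambda_episode([0.0, 0.0], episode,
                                        alpha=0.5, gamma=1.0, lam=1.0)
print(weights)
```

With `lam=0` the trace equals the current feature vector and the correction terms cancel, so the sketch reduces to the ordinary one-step TD(0) update.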
Component | Type | |
---|---|---|
Dutch Eligibility Trace | Eligibility Traces | (optional) |
Replacing Eligibility Trace | Eligibility Traces | (optional) |