Eligibility Traces

Eligibility Trace

An eligibility trace is a short-term memory vector $\textbf{z}_{t} \in \mathbb{R}^{d}$ that parallels the long-term weight vector $\textbf{w}_{t} \in \mathbb{R}^{d}$. When a component of $\textbf{w}_{t}$ participates in producing an estimated value, the corresponding component of $\textbf{z}_{t}$ is bumped up and then begins to fade away. Learning then occurs in that component of $\textbf{w}_{t}$ if a nonzero TD error occurs before the trace falls back to zero. The trace-decay parameter $\lambda \in \left[0, 1\right]$ determines the rate at which the trace falls.
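
The bump-and-fade mechanism can be sketched as linear semi-gradient TD(λ), following the update described in Sutton and Barto: $\textbf{z}_{t} = \gamma\lambda\textbf{z}_{t-1} + \nabla\hat{v}(S_{t}, \textbf{w}_{t})$, $\delta_{t} = R_{t+1} + \gamma\hat{v}(S_{t+1}, \textbf{w}_{t}) - \hat{v}(S_{t}, \textbf{w}_{t})$, $\textbf{w}_{t+1} = \textbf{w}_{t} + \alpha\delta_{t}\textbf{z}_{t}$. This is a minimal sketch under stated assumptions, not a reference implementation: it assumes a linear value function $\hat{v}(s, \textbf{w}) = \textbf{w}^{\top}\textbf{x}(s)$ (so the gradient is just the feature vector), a precomputed episode of feature vectors whose final row is an all-zero terminal state, a step size $\alpha$, and a trace reset to zero at the start of each episode. The function name `td_lambda_episode` is hypothetical.

```python
import numpy as np


def td_lambda_episode(features, rewards, alpha, gamma, lam, w):
    """Run one episode of linear semi-gradient TD(lambda) and return updated weights.

    features: array of shape (T + 1, d); row t is the feature vector x(S_t).
              The last row is the terminal state, assumed to be all zeros,
              so its estimated value is 0.
    rewards:  array of shape (T,); rewards[t] is R_{t+1}.
    """
    w = np.asarray(w, dtype=float).copy()
    z = np.zeros_like(w)  # eligibility trace z, reset to 0 at the start of the episode
    for t in range(len(rewards)):
        x, x_next = features[t], features[t + 1]
        # Fade the trace, then bump the components that produced v_hat(S_t);
        # for a linear v_hat(s, w) = w @ x(s), the gradient is simply x(s).
        z = gamma * lam * z + x
        # One-step TD error computed with the current weights
        delta = rewards[t] + gamma * (w @ x_next) - (w @ x)
        # Every component whose trace has not yet faded to 0 learns from delta
        w += alpha * delta * z
    return w


# Usage: three states visited in order, reward 1 only on the final transition.
features = np.vstack([np.eye(3), np.zeros(3)])  # one-hot states + terminal row
rewards = np.array([0.0, 0.0, 1.0])
for lam in (0.0, 0.5, 1.0):
    print(lam, td_lambda_episode(features, rewards, 0.5, 1.0, lam, np.zeros(3)))
```

In this toy episode the only nonzero TD error arrives on the last step. With $\lambda = 0$ only the last state's weight moves (to 0.5); with $\lambda = 0.5$ the weights become 0.125, 0.25 and 0.5, because earlier traces have partly faded; with $\lambda = 1$ all three weights move to 0.5.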

Intuitively, eligibility traces address the credit assignment problem by combining two heuristics: a frequency heuristic (states that are visited more often deserve more credit) and a recency heuristic (states that are visited more recently deserve more credit).

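In the tabular case, each state $s$ keeps its own accumulating trace $E_{t}\left(s\right)$ (the component of $\textbf{z}_{t}$ for $s$ under one-hot features), where $\gamma$ is the discount factor and $\textbf{1}\left(\cdot\right)$ is the indicator function:

$$E_{0}\left(s\right) = 0$$

$$E_{t}\left(s\right) = \gamma\lambda E_{t-1}\left(s\right) + \textbf{1}\left(S_{t} = s\right)$$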
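
The accumulating-trace recurrence can be traced on a short trajectory to see the frequency and recency heuristics interact. This is a minimal sketch assuming a tabular setting: it stores $E_{t}$ as a Python dictionary, omits never-visited states (whose trace stays 0), and treats the input list as $S_{1}, \dots, S_{T}$ since $E_{0}\left(s\right) = 0$. The function name `accumulating_traces` is hypothetical.

```python
from collections import defaultdict


def accumulating_traces(states, gamma, lam):
    """Return E_t as a dict after each step, for the visited states S_1, ..., S_T.

    States never visited keep E_t(s) = 0 and are omitted from the dicts.
    """
    E = defaultdict(float)  # E_0(s) = 0 for every state s
    history = []
    for s_t in states:
        for s in E:
            E[s] *= gamma * lam  # decay term: gamma * lambda * E_{t-1}(s)
        E[s_t] += 1.0  # indicator term 1(S_t = s)
        history.append(dict(E))
    return history


# Frequency vs recency: A is visited three times, then B once.
print(accumulating_traces(["A", "A", "A", "B"], gamma=1.0, lam=0.9)[-1])
print(accumulating_traces(["A", "A", "A", "B"], gamma=1.0, lam=0.5)[-1])
```

With $\gamma\lambda = 0.9$ the frequently visited state A ends with a larger trace (about 2.439) than the just-visited B (1.0), so frequency dominates. With $\gamma\lambda = 0.5$, A's trace decays to 0.875 and recency wins. The product $\gamma\lambda$ therefore sets how the two heuristics trade off.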

Source: Sutton and Barto, Reinforcement Learning, 2nd Edition

Tasks


Task Papers Share
Meta-Learning 3 75.00%
Atari Games 1 25.00%