META-Learning State-based Eligibility Traces for More Sample-Efficient Policy Evaluation

25 Apr 2019 · Mingde Zhao, Sitao Luan, Ian Porada, Xiao-Wen Chang, Doina Precup

Temporal-Difference (TD) learning is a standard and very successful reinforcement learning approach, at the core of both algorithms that learn the value of a given policy and algorithms that learn how to improve policies. TD learning with eligibility traces provides a way to boost sample efficiency through temporal credit assignment, i.e., deciding which portion of a reward should be assigned to predecessor states that occurred at different previous times, controlled by a parameter $\lambda$...
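To make the role of $\lambda$ in credit assignment concrete, here is a minimal sketch of standard tabular TD($\lambda$) policy evaluation with accumulating traces and a single scalar $\lambda$. It is the textbook baseline, not the state-based, meta-learned method this paper proposes. The function name `td_lambda`, the transition tuple layout `(state, reward, next_state, done)`, and the default hyperparameters are assumptions made for illustration.

```python
import numpy as np


def td_lambda(episodes, n_states, alpha=0.1, gamma=1.0, lam=0.9):
    """Tabular TD(lambda) policy evaluation with accumulating traces.

    episodes: iterable of episodes, each a list of
              (state, reward, next_state, done) transitions (assumed layout).
    Returns the estimated state-value vector.
    """
    v = np.zeros(n_states)
    for episode in episodes:
        e = np.zeros(n_states)  # eligibility trace, reset at episode start
        for s, r, s_next, done in episode:
            # One-step TD error; no bootstrapping from a terminal state.
            target = r if done else r + gamma * v[s_next]
            delta = target - v[s]
            # Decay all traces, then mark the current state as eligible.
            e *= gamma * lam
            e[s] += 1.0
            # Every state is updated in proportion to its eligibility.
            v += alpha * delta * e
    return v


# Hypothetical two-step chain: 0 -> 1 -> terminal (index 2), reward 1 at the end.
chain = [[(0, 0.0, 1, False), (1, 1.0, 2, True)]]
v_full = td_lambda(chain, n_states=3, alpha=0.5, lam=1.0)
v_none = td_lambda(chain, n_states=3, alpha=0.5, lam=0.0)
```

After this single episode, `lam=1.0` passes the final reward back to state 0 (its estimate becomes 0.5), whereas `lam=0.0` updates only state 1 and leaves state 0 at 0. Intermediate values of $\lambda$ interpolate between these two extremes.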
