HR-TD: A Regularized TD Method to Avoid Over-Generalization

ICLR 2019 · Ishan Durugkar, Bo Liu, Peter Stone

Temporal difference (TD) learning with function approximation has been widely used in recent years and has produced several successful results. However, compared with the original tabular methods, TD learning with neural networks and other function approximators has a major drawback: it tends to over-generalize across temporally successive states, which slows convergence and can even cause instability. In this work, we propose a novel TD learning method, Hadamard product Regularized TD (HR-TD), that reduces over-generalization and thus leads to faster convergence. The approach applies readily to both linear and nonlinear function approximators. We evaluate HR-TD on several linear and nonlinear benchmark domains and show improvements in learning behavior and performance.
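
The over-generalization described in the abstract can be illustrated with an ordinary semi-gradient TD(0) update on a linear value function. The sketch below is not HR-TD itself, since the abstract does not spell out the regularizer. It only shows that an update intended for V(s_t) also shifts V(s_{t+1}) by alpha * delta * (phi(s_t) . phi(s_{t+1})) whenever the two feature vectors overlap. The feature vectors, step size, discount, and function names are illustrative assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def td0_update(theta, phi_s, phi_next, reward, alpha=0.1, gamma=0.9):
    """One semi-gradient TD(0) step for a linear V(s) = theta . phi(s).

    Returns the updated weights and the TD error delta.
    """
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi_s)
    new_theta = [w + alpha * delta * x for w, x in zip(theta, phi_s)]
    return new_theta, delta


# Hypothetical features: successive states share the middle component.
phi_s = [1.0, 1.0, 0.0]
phi_next = [0.0, 1.0, 1.0]
theta = [0.0, 0.0, 0.0]

new_theta, delta = td0_update(theta, phi_s, phi_next, reward=1.0)

# Change in the value of the state being updated.
change_in_v_s = dot(new_theta, phi_s) - dot(theta, phi_s)
# Side effect on the successor state, caused only by feature overlap.
change_in_v_next = dot(new_theta, phi_next) - dot(theta, phi_next)

print(change_in_v_s, change_in_v_next)
```

With orthogonal features (as in the tabular case) the second quantity would be zero. The larger the overlap between successive feature vectors, the more the bootstrapping target moves along with the estimate, which is the effect the abstract says HR-TD is designed to reduce.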
