TD Learning with Constrained Gradients

ICLR 2018 · Ishan Durugkar, Peter Stone

Temporal Difference Learning with function approximation is known to be unstable. Previous work like \citet{sutton2009fast} and \citet{sutton2009convergent} has presented alternative objectives that are stable to minimize...

