no code implementations • 13 Sep 2019 • Michael Lutter, Boris Belousov, Kim Listmann, Debora Clever, Jan Peters
The corresponding optimal value function is learned end-to-end by embedding a deep differential network in the Hamilton-Jacobi-Bellmann differential equation and minimizing the error of this equality while simultaneously decreasing the discounting from short- to far-sighted to enable the learning.