19 May 2016 • Prasenjit Karmakar, Rajkumar Maity, Shalabh Bhatnagar
In this paper we provide a rigorous convergence analysis of an off-policy temporal difference learning algorithm with linear function approximation and per-time-step linear computational complexity in an online learning environment.