28 Jul 2015 • Prashanth L. A., H. L. Prasad, Shalabh Bhatnagar, Prakash Chandra
We propose a novel actor-critic algorithm with guaranteed convergence to an optimal policy for a discounted-reward Markov decision process.