no code implementations • 28 Jul 2015 • Prashanth L. A., H. L. Prasad, Shalabh Bhatnagar, Prakash Chandra
We propose a novel actor-critic algorithm with guaranteed convergence to an optimal policy for a discounted reward Markov decision process.
no code implementations • 1 Jul 2015 • H. L. Prasad, Shalabh Bhatnagar
However, the optimization problem there has a non-linear objective and non-linear constraints with special structure.
no code implementations • 8 Jan 2014 • H. L. Prasad, L. A. Prashanth, Shalabh Bhatnagar
We then characterize the solution points of these sub-problems that correspond to Nash equilibria of the underlying game; for this purpose, we derive a set of necessary and sufficient SG-SP (Stochastic Game - Sub-Problem) conditions.