1 code implementation • 10 Oct 2022 • Soumyajit Guin, Shalabh Bhatnagar
Finite-horizon control problems are of interest in many situations, and for such problems the optimal policies are, in general, time-varying.
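To illustrate why finite-horizon optimal policies generally depend on time, here is a minimal backward-induction sketch on a toy deterministic MDP. This is not the method of the listed paper; the function name `backward_induction`, the tables `NEXT_STATE` and `REWARD`, and the two-state problem itself are assumptions made up for illustration.

```python
def backward_induction(next_state, reward, horizon):
    """Finite-horizon dynamic programming on a deterministic tabular MDP.

    next_state[s][a] is the successor of state s under action a and
    reward[s][a] is the one-step reward (both hypothetical inputs).
    Returns V[t][s] and a time-indexed greedy policy[t][s].
    """
    n_states = len(reward)
    n_actions = len(reward[0])
    # V[horizon] is the terminal value, taken to be zero here.
    V = [[0.0] * n_states for _ in range(horizon + 1)]
    policy = [[0] * n_states for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):
        for s in range(n_states):
            # One-step lookahead using the value at the next time index.
            q = [reward[s][a] + V[t + 1][next_state[s][a]]
                 for a in range(n_actions)]
            best = max(range(n_actions), key=lambda a: q[a])
            policy[t][s] = best
            V[t][s] = q[best]
    return V, policy


# Toy problem (an assumption): in state 0, action 0 pays 1 and stays,
# action 1 pays 0 and moves to state 1; state 1 pays 3 and is absorbing.
NEXT_STATE = [[0, 1], [1, 1]]
REWARD = [[1.0, 0.0], [3.0, 3.0]]

V, policy = backward_induction(NEXT_STATE, REWARD, horizon=3)
print([policy[t][0] for t in range(3)])
```

In this toy problem the greedy action in state 0 switches at the last step: moving to state 1 pays off only when at least one step remains to collect its reward, so a single stationary policy cannot be optimal at every time index.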
no code implementations • 10 Oct 2022 • Shalabh Bhatnagar, Vivek S. Borkar, Soumyajit Guin
We revisit the standard formulation of the tabular actor-critic algorithm as a two time-scale stochastic approximation, with the value function computed on a faster time-scale and the policy computed on a slower time-scale.
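The standard two time-scale structure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a discounted, deterministic toy MDP, a softmax policy, and the step sizes 1/n^0.6 for the critic and 1/n for the actor, chosen only so that the ratio of actor to critic step size tends to zero. All names (`actor_critic`, `critic_step`, `actor_step`, `NEXT_STATE`, `REWARD`) are hypothetical.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def critic_step(n):
    # Faster time-scale: decays more slowly (assumed schedule).
    return 1.0 / n ** 0.6


def actor_step(n):
    # Slower time-scale: actor_step(n) / critic_step(n) -> 0 as n grows.
    return 1.0 / n


def actor_critic(next_state, reward, gamma=0.9, n_steps=5000, seed=0):
    """Tabular actor-critic with a TD(0) critic and a softmax actor."""
    rng = random.Random(seed)
    n_states, n_actions = len(reward), len(reward[0])
    V = [0.0] * n_states                                  # critic table
    theta = [[0.0] * n_actions for _ in range(n_states)]  # actor table
    s = 0
    for n in range(1, n_steps + 1):
        pi = softmax(theta[s])
        a = rng.choices(range(n_actions), weights=pi)[0]
        s_next = next_state[s][a]
        # Temporal-difference error drives both updates.
        delta = reward[s][a] + gamma * V[s_next] - V[s]
        V[s] += critic_step(n) * delta
        for b in range(n_actions):
            # Gradient of log softmax w.r.t. the preference of action b.
            grad = (1.0 if b == a else 0.0) - pi[b]
            theta[s][b] += actor_step(n) * delta * grad
        s = s_next
    return V, theta


# Toy MDP (an assumption): action a always leads to state a,
# and action 1 pays reward 1 while action 0 pays nothing.
NEXT_STATE = [[0, 1], [0, 1]]
REWARD = [[0.0, 1.0], [0.0, 1.0]]

V, theta = actor_critic(NEXT_STATE, REWARD)
print([softmax(row) for row in theta])
```

For brevity the sketch indexes both step sizes by a single global counter; per-state visit counts are a common alternative. Because the critic's step size dominates, the value estimates move faster than the policy, which is the time-scale separation the formulation relies on.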