no code implementations • 17 Oct 2023 • Gugan Thoppe, L. A. Prashanth, Sanjay Bhat
To the best of our knowledge, our work is the first to provide lower and upper bounds for estimating any risk measure beyond the mean within a Markovian setting.
no code implementations • 20 Dec 2022 • Soumen Pachal, Shalabh Bhatnagar, L. A. Prashanth
We first present unbalanced generalized simultaneous perturbation stochastic approximation (GSPSA) estimators in detail, and then present their balanced versions (B-GSPSA).
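For readers new to simultaneous perturbation, the sketch below shows the classical two-sided SPSA gradient estimator with Rademacher (±1) perturbations, the basic estimator that GSPSA generalizes. It is not the GSPSA or B-GSPSA construction from this work; the function name `spsa_gradient` and the perturbation size `delta_scale` are illustrative assumptions.

```python
import numpy as np

def spsa_gradient(f, theta, delta_scale=0.01, rng=None):
    """Classical two-sided SPSA estimate of grad f at theta (illustrative sketch).

    A single random Rademacher (+/-1) direction perturbs all coordinates at once,
    so each estimate needs only two evaluations of f, whatever the dimension.
    """
    rng = np.random.default_rng() if rng is None else rng
    delta = rng.choice([-1.0, 1.0], size=theta.shape)
    f_plus = f(theta + delta_scale * delta)
    f_minus = f(theta - delta_scale * delta)
    # Element-wise division by the perturbation vector.
    return (f_plus - f_minus) / (2.0 * delta_scale * delta)

# Usage: average many noisy estimates for f(x) = ||x||^2, whose gradient is 2x.
rng = np.random.default_rng(0)
theta = np.array([1.0, -2.0, 0.5])
f = lambda x: float(np.sum(x ** 2))
avg_grad = np.mean([spsa_gradient(f, theta, rng=rng) for _ in range(20000)], axis=0)
```

Individual estimates are noisy (each coordinate picks up cross terms from the other coordinates), which is why the usage averages them; the average approaches the true gradient `2 * theta`.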
no code implementations • 31 Mar 2022 • Dipayan Sen, L. A. Prashanth, Aditya Gopalan
We consider the problem of sequentially learning to estimate, in the mean squared error (MSE) sense, a Gaussian $K$-vector of unknown covariance by observing only $m < K$ of its entries in each round.
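As background for the MSE criterion, the sketch below shows a standard fact: for a zero-mean Gaussian vector with known covariance $\Sigma$, the MSE-optimal estimate of the unobserved entries $x_u$ given the observed entries $x_o$ is the conditional mean $\Sigma_{uo}\Sigma_{oo}^{-1}x_o$. In the problem above the covariance is unknown and must be learned, so this is only an oracle baseline, not the proposed method; the zero-mean assumption and the name `conditional_mean_estimate` are illustrative.

```python
import numpy as np

def conditional_mean_estimate(x_obs, obs_idx, cov):
    """Oracle MSE-optimal estimate of a zero-mean Gaussian K-vector (illustrative).

    Assumes the covariance `cov` is KNOWN. Observed entries are copied as-is;
    unobserved entries are set to E[x_u | x_o] = cov_uo @ inv(cov_oo) @ x_o.
    """
    K = cov.shape[0]
    obs_idx = np.asarray(obs_idx)
    unobs_idx = np.setdiff1d(np.arange(K), obs_idx)
    cov_oo = cov[np.ix_(obs_idx, obs_idx)]
    cov_uo = cov[np.ix_(unobs_idx, obs_idx)]
    x_hat = np.empty(K)
    x_hat[obs_idx] = x_obs
    # Solve a linear system rather than inverting cov_oo explicitly.
    x_hat[unobs_idx] = cov_uo @ np.linalg.solve(cov_oo, x_obs)
    return x_hat

# Usage: K = 3, observe only entry 0 (m = 1).
cov = np.array([[1.0, 0.5, 0.0],
                [0.5, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
x_hat = conditional_mean_estimate(np.array([2.0]), [0], cov)
```

Here entry 1 is correlated with the observed entry and is estimated as $0.5 \cdot 2 / 1 = 1$, while the independent entry 2 falls back to its prior mean of 0.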
1 code implementation • 16 Nov 2021 • Vishwajit Hegde, Arvind S. Menon, L. A. Prashanth, Krishna Jagannathan
We derive non-asymptotic bounds on the estimation error in terms of the number of samples.
no code implementations • 12 Nov 2014 • Nathaniel Korda, L. A. Prashanth
Furthermore, we propose a variant of TD(0) with linear approximators that incorporates a centering sequence, and establish that it exhibits an exponential rate of convergence in expectation.
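For reference, the sketch below shows plain TD(0) with linear function approximation, $V(s) \approx \phi(s)^\top\theta$, on a small hypothetical Markov chain. It does not include the centering sequence of the proposed variant; the chain, rewards, features, and constant step size are assumptions chosen for illustration.

```python
import numpy as np

def td0_linear(phi, P, r, gamma, alpha, n_steps, rng, s0=0):
    """Plain TD(0) with linear function approximation V(s) ~ phi[s] @ theta.

    phi: (n_states, d) features; P: (n_states, n_states) transition matrix;
    r: (n_states,) reward received on leaving each state; alpha: constant step size.
    """
    theta = np.zeros(phi.shape[1])
    s = s0
    for _ in range(n_steps):
        s_next = rng.choice(len(r), p=P[s])
        # Temporal-difference error for the observed transition s -> s_next.
        td_error = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
        theta += alpha * td_error * phi[s]
        s = s_next
    return theta

# Usage on a hypothetical two-state chain with one-hot (tabular) features.
rng = np.random.default_rng(1)
P = np.array([[0.5, 0.5], [0.5, 0.5]])
r = np.array([1.0, 0.0])
theta = td0_linear(np.eye(2), P, r, gamma=0.9, alpha=0.01, n_steps=50000, rng=rng)
```

With one-hot features, linear TD(0) reduces to tabular TD(0), so `theta` should hover near the true values $V = (I - \gamma P)^{-1} r = [5.5, 4.5]$; a constant step size leaves some residual noise around that point.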
no code implementations • 18 Mar 2014 • Raphael Fonteneau, L. A. Prashanth
We propose novel policy search algorithms in the context of off-policy, batch mode reinforcement learning (RL) with continuous state and action spaces.
no code implementations • 8 Jan 2014 • H. L. Prasad, L. A. Prashanth, Shalabh Bhatnagar
We then characterize the solution points of these sub-problems that correspond to Nash equilibria of the underlying game; to this end, we derive a set of necessary and sufficient SG-SP (Stochastic Game - Sub-Problem) conditions.
no code implementations • 11 Jun 2013 • L. A. Prashanth, Nathaniel Korda, Rémi Munos
We propose a stochastic approximation (SA) based method with randomization of samples for policy evaluation using the least squares temporal difference (LSTD) algorithm.
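To illustrate the general idea of replacing the LSTD matrix solve with stochastic approximation, the sketch below compares an exact LSTD solve on a fixed batch of $T$ transitions $(s_i, r_i, s'_i)$, i.e. $\theta = A^{-1}b$ with $A = \frac{1}{T}\sum_i \phi(s_i)(\phi(s_i) - \gamma\phi(s'_i))^\top$ and $b = \frac{1}{T}\sum_i r_i\phi(s_i)$, against an iterative scheme that repeatedly samples one stored transition uniformly at random and applies a TD(0)-style update. The expected update is $b - A\theta$, so the iterates track the LSTD solution without forming or inverting $A$. This is a minimal sketch under stated assumptions, not necessarily the exact algorithm or step-size schedule of this work; the batch, features, and step sizes $1/(n+1)^{0.75}$ are illustrative.

```python
import numpy as np

def lstd_solution(phi_s, phi_next, rewards, gamma):
    """Exact LSTD on a fixed batch: solve A theta = b."""
    T = len(rewards)
    A = phi_s.T @ (phi_s - gamma * phi_next) / T
    b = phi_s.T @ rewards / T
    return np.linalg.solve(A, b)

def lstd_sa(phi_s, phi_next, rewards, gamma, n_iters, rng, step0=1.0):
    """SA approximation of LSTD via randomly resampled batch transitions (sketch).

    Each iteration draws one stored transition uniformly at random; the expected
    update direction is b - A theta. Step sizes step0 / (n + 1)**0.75 are assumed.
    """
    T, d = phi_s.shape
    theta = np.zeros(d)
    for n in range(n_iters):
        i = rng.integers(T)  # uniform random index into the batch
        td_error = rewards[i] + gamma * phi_next[i] @ theta - phi_s[i] @ theta
        theta += step0 / (n + 1) ** 0.75 * td_error * phi_s[i]
    return theta

# Usage: a hypothetical batch from a two-state chain with uniform transitions.
rng = np.random.default_rng(2)
T = 1000
states = rng.integers(2, size=T)
next_states = rng.integers(2, size=T)
phi = np.eye(2)  # one-hot (tabular) features
phi_s, phi_next = phi[states], phi[next_states]
rewards = np.where(states == 0, 1.0, 0.0)
theta_exact = lstd_solution(phi_s, phi_next, rewards, gamma=0.5)
theta_sa = lstd_sa(phi_s, phi_next, rewards, gamma=0.5, n_iters=20000, rng=rng)
```

The design trade-off is per-iteration cost: the exact solve costs $O(d^3)$ after building $A$, while each SA step touches a single transition at $O(d)$ cost, at the price of approximation error that shrinks with the number of iterations.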