Search Results for author: L. A. Prashanth

Found 8 papers, 1 paper with code

Risk Estimation in a Markov Cost Process: Lower and Upper Bounds

no code implementations • 17 Oct 2023 • Gugan Thoppe, L. A. Prashanth, Sanjay Bhat

To the best of our knowledge, our work is the first to provide lower and upper bounds for estimating any risk measure beyond the mean within a Markovian setting.
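To make the estimation problem concrete (the sketch below is illustrative only and is not the paper's estimator or its bounds), one can simulate a small Markov cost process and form a plug-in estimate of a risk measure beyond the mean, such as CVaR; the chain, costs, and confidence level here are all invented:

```python
import numpy as np

def simulate_markov_costs(P, cost, n_steps, rng, s0=0):
    """Roll out a finite-state Markov chain and record the per-step costs."""
    s, costs = s0, []
    for _ in range(n_steps):
        costs.append(cost[s])
        s = rng.choice(len(P), p=P[s])
    return np.array(costs)

def cvar_plugin(samples, alpha=0.95):
    """Plug-in CVaR estimate: average of the samples at or above the
    empirical alpha-quantile (Value-at-Risk)."""
    var = np.quantile(samples, alpha)
    return samples[samples >= var].mean()

rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1],       # illustrative 2-state transition matrix
              [0.5, 0.5]])
cost = np.array([1.0, 5.0])     # illustrative per-state costs
costs = simulate_markov_costs(P, cost, n_steps=10_000, rng=rng)
print("mean cost:", costs.mean(), " CVaR_0.95:", cvar_plugin(costs))
```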

Generalized Simultaneous Perturbation-based Gradient Search with Reduced Estimator Bias

no code implementations • 20 Dec 2022 • Soumen Pachal, Shalabh Bhatnagar, L. A. Prashanth

We first present the unbalanced generalized simultaneous perturbation stochastic approximation (GSPSA) estimators in detail, and then introduce their balanced versions (B-GSPSA).
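As background, the classic two-sided SPSA gradient estimate, which GSPSA generalizes, can be sketched as follows; the quadratic objective, step sizes, and perturbation constants are illustrative, and the unbalanced/balanced GSPSA estimators of the paper are not reproduced here:

```python
import numpy as np

def spsa_gradient(f, theta, delta, rng):
    """Classic two-sided SPSA gradient estimate: one Rademacher perturbation
    vector and two noisy function evaluations give an estimate of the full
    gradient, regardless of the dimension of theta."""
    d = rng.choice([-1.0, 1.0], size=theta.shape)   # Rademacher perturbation
    y_plus = f(theta + delta * d)
    y_minus = f(theta - delta * d)
    return (y_plus - y_minus) / (2.0 * delta * d)

# Illustrative use: minimize a noisy quadratic with SPSA-driven gradient descent.
rng = np.random.default_rng(1)
f = lambda x: np.sum((x - 2.0) ** 2) + 0.01 * rng.normal()   # noisy objective
theta = np.zeros(5)
for k in range(1, 2001):
    g_hat = spsa_gradient(f, theta, delta=0.1 / k ** 0.101, rng=rng)
    theta -= (0.1 / k ** 0.602) * g_hat
print(theta)   # should be roughly the all-2s vector
```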

Adaptive Estimation of Random Vectors with Bandit Feedback: A mean-squared error viewpoint

no code implementations • 31 Mar 2022 • Dipayan Sen, L. A. Prashanth, Aditya Gopalan

We consider the problem of sequentially learning to estimate, in the mean squared error (MSE) sense, a Gaussian $K$-vector of unknown covariance by observing only $m < K$ of its entries in each round.
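For intuition about the setting only (not the paper's algorithm or its MSE analysis, and with independent coordinates where the paper allows an unknown covariance), here is a toy sequential scheme that observes $m$ of $K$ Gaussian coordinates per round and allocates observations with a simple variance-based heuristic; all distributional parameters are made up:

```python
import numpy as np

K, m, T = 10, 3, 5000
rng = np.random.default_rng(2)
mu_true = rng.normal(size=K)              # unknown means (illustrative)
sigma_true = np.linspace(0.1, 2.0, K)     # unknown per-coordinate stds (illustrative)

mean_hat, m2 = np.zeros(K), np.zeros(K)
var_hat = np.ones(K)                      # optimistic initial variance estimates
counts = np.zeros(K)

for t in range(T):
    # Heuristic allocation: observe the m coordinates whose estimated
    # variance-per-observation is largest (plus a small exploration bonus).
    scores = (var_hat + 1e-3) / np.maximum(counts, 1.0)
    idx = np.argsort(scores)[-m:]
    x = mu_true[idx] + sigma_true[idx] * rng.normal(size=m)   # bandit feedback
    counts[idx] += 1
    delta = x - mean_hat[idx]
    mean_hat[idx] += delta / counts[idx]
    m2[idx] += delta * (x - mean_hat[idx])                    # Welford update
    var_hat[idx] = m2[idx] / counts[idx]

print("MSE of the mean estimate:", np.mean((mean_hat - mu_true) ** 2))
```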

Online Estimation and Optimization of Utility-Based Shortfall Risk

1 code implementation • 16 Nov 2021 • Vishwajit Hegde, Arvind S. Menon, L. A. Prashanth, Krishna Jagannathan

We derive non-asymptotic bounds on the estimation error as a function of the number of samples.
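As a rough illustration of online utility-based shortfall risk (UBSR) estimation, here is a one-dimensional stochastic-approximation root-finding sketch; the loss function, threshold, step sizes, and sign conventions are illustrative and may differ from the paper's:

```python
import numpy as np

def ubsr_sa(sample, loss, lam, n_iter, t0=0.0):
    """Stochastic approximation for UBSR, viewed as the root t* of
    g(t) = E[loss(-X - t)] - lam, which is decreasing in t for an
    increasing loss.  Each step uses a single fresh sample of X."""
    t = t0
    for k in range(1, n_iter + 1):
        x = sample()
        t += (1.0 / k) * (loss(-x - t) - lam)   # move toward the root of g
    return t

rng = np.random.default_rng(3)
sample = lambda: rng.normal(loc=1.0, scale=2.0)   # X: illustrative gains
loss = lambda u: np.maximum(u, 0.0)               # piecewise-linear loss
print("UBSR estimate:", ubsr_sa(sample, loss, lam=0.5, n_iter=50_000))
```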

On TD(0) with function approximation: Concentration bounds and a centered variant with exponential convergence

no code implementations • 12 Nov 2014 • Nathaniel Korda, L. A. Prashanth

Furthermore, we propose a variant of TD(0) with linear approximators that incorporates a centering sequence, and establish that it exhibits an exponential rate of convergence in expectation.
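For reference, a minimal sketch of plain TD(0) with linear function approximation on a toy Markov reward process is given below (tabular features, invented transitions and rewards); the centered variant with an exponential convergence rate proposed in the paper adds a centering sequence that is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy 3-state Markov reward process (all quantities illustrative).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.0, 0.8]])
R = np.array([0.0, 1.0, 2.0])      # expected reward per state
gamma = 0.9
phi = np.eye(3)                    # tabular features: phi(s) = e_s

theta = np.zeros(3)
s = 0
for k in range(1, 100_001):
    s_next = rng.choice(3, p=P[s])
    r = R[s] + 0.1 * rng.normal()  # noisy reward
    # TD(0) with linear function approximation: V(s) ~ theta . phi(s)
    td_error = r + gamma * theta @ phi[s_next] - theta @ phi[s]
    theta += (0.5 / k ** 0.75) * td_error * phi[s]
    s = s_next

# Exact values for comparison: V = (I - gamma * P)^{-1} R
print("TD(0):", theta)
print("exact:", np.linalg.solve(np.eye(3) - gamma * P, R))
```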

Simultaneous Perturbation Algorithms for Batch Off-Policy Search

no code implementations • 18 Mar 2014 • Raphael Fonteneau, L. A. Prashanth

We propose novel policy search algorithms in the context of off-policy, batch mode reinforcement learning (RL) with continuous state and action spaces.

Reinforcement Learning (RL)

Actor-Critic Algorithms for Learning Nash Equilibria in N-player General-Sum Games

no code implementations • 8 Jan 2014 • H. L. Prasad, L. A. Prashanth, Shalabh Bhatnagar

We then characterize the solution points of these sub-problems that correspond to Nash equilibria of the underlying game; for this purpose, we derive a set of necessary and sufficient SG-SP (Stochastic Game - Sub-Problem) conditions.

Concentration bounds for temporal difference learning with linear function approximation: The case of batch data and uniform sampling

no code implementations • 11 Jun 2013 • L. A. Prashanth, Nathaniel Korda, Rémi Munos

We propose a stochastic approximation (SA) based method with randomization of samples for policy evaluation using the least squares temporal difference (LSTD) algorithm.
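To fix ideas (synthetic data, and not the exact randomized scheme or step sizes analysed in the paper), the sketch below first computes the batch LSTD solution and then runs a TD-style stochastic-approximation iteration that samples transitions uniformly at random from the same batch:

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic batch of transitions (phi(s), r, phi(s')) standing in for data
# collected under a fixed policy; everything here is illustrative.
d, n, gamma = 4, 2000, 0.9
Phi = rng.normal(size=(n, d))
Phi_next = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
R = Phi @ theta_true - gamma * (Phi_next @ theta_true) + 0.1 * rng.normal(size=n)

# Batch LSTD: solve A theta = b with
#   A = (1/n) sum_i phi_i (phi_i - gamma phi'_i)^T,   b = (1/n) sum_i r_i phi_i.
A = Phi.T @ (Phi - gamma * Phi_next) / n
b = Phi.T @ R / n
theta_lstd = np.linalg.solve(A, b)

# SA iteration with uniform sampling from the batch: each step draws one
# stored transition at random and applies a TD(0)-style update.
theta = np.zeros(d)
for k in range(1, 100_001):
    i = rng.integers(n)
    td_error = R[i] + gamma * theta @ Phi_next[i] - theta @ Phi[i]
    theta += (1.0 / k ** 0.75) * td_error * Phi[i]

print("SA iterate:", theta)
print("LSTD:      ", theta_lstd)
```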

Multi-Armed Bandits, News Recommendation +1
