Search Results for author: L. A. Prashanth

Found 8 papers, 1 paper with code

Risk Estimation in a Markov Cost Process: Lower and Upper Bounds

no code implementations • 17 Oct 2023 • Gugan Thoppe, L. A. Prashanth, Sanjay Bhat

To the best of our knowledge, our work is the first to provide lower and upper bounds for estimating any risk measure beyond the mean within a Markovian setting.

Generalized Simultaneous Perturbation-based Gradient Search with Reduced Estimator Bias

no code implementations • 20 Dec 2022 • Soumen Pachal, Shalabh Bhatnagar, L. A. Prashanth

We first present in detail unbalanced generalized simultaneous perturbation stochastic approximation (GSPSA) estimators and later present the balanced versions (B-GSPSA) of these.

Adaptive Estimation of Random Vectors with Bandit Feedback: A mean-squared error viewpoint

no code implementations • 31 Mar 2022 • Dipayan Sen, L. A. Prashanth, Aditya Gopalan

We consider the problem of sequentially learning to estimate, in the mean squared error (MSE) sense, a Gaussian $K$-vector of unknown covariance by observing only $m < K$ of its entries in each round.

Online Estimation and Optimization of Utility-Based Shortfall Risk

1 code implementation • 16 Nov 2021 • Vishwajit Hegde, Arvind S. Menon, L. A. Prashanth, Krishna Jagannathan

We derive non-asymptotic bounds on the estimation error in the number of samples.

On TD(0) with function approximation: Concentration bounds and a centered variant with exponential convergence

no code implementations • 12 Nov 2014 • Nathaniel Korda, L. A. Prashanth

Furthermore, we propose a variant of TD(0) with linear approximators that incorporates a centering sequence, and establish that it exhibits an exponential rate of convergence in expectation.

Simultaneous Perturbation Algorithms for Batch Off-Policy Search

no code implementations • 18 Mar 2014 • Raphael Fonteneau, L. A. Prashanth

We propose novel policy search algorithms in the context of off-policy, batch mode reinforcement learning (RL) with continuous state and action spaces.

Reinforcement Learning (RL)

Actor-Critic Algorithms for Learning Nash Equilibria in N-player General-Sum Games

no code implementations • 8 Jan 2014 • H. L. Prasad, L. A. Prashanth, Shalabh Bhatnagar

We then provide a characterization of solution points of these sub-problems that correspond to Nash equilibria of the underlying game and for this purpose, we derive a set of necessary and sufficient SG-SP (Stochastic Game - Sub-Problem) conditions.

Concentration bounds for temporal difference learning with linear function approximation: The case of batch data and uniform sampling

no code implementations • 11 Jun 2013 • L. A. Prashanth, Nathaniel Korda, Rémi Munos

We propose a stochastic approximation (SA) based method with randomization of samples for policy evaluation using the least squares temporal difference (LSTD) algorithm.

Multi-Armed Bandits • News Recommendation
