no code implementations • 9 Sep 2024 • Shubhada Agrawal, Prashanth L. A., Siva Theja Maguluri
We consider the problem of estimating the asymptotic variance of a function defined on a Markov chain, an important step for statistical inference of the stationary mean.
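For context, the quantity being estimated here is the limit of n times the variance of the empirical mean of the function along the chain. Below is a minimal sketch of the classical batch-means estimator of that quantity, not the estimator proposed in the paper; the function name and the sqrt(n) batch size are illustrative choices.

```python
import numpy as np

def batch_means_asymptotic_variance(values, batch_size=None):
    """Classical batch-means estimate of the asymptotic variance of a
    Markov-chain average, i.e. the limit of n * Var(mean of f(X_1..X_n)).

    Splits the trajectory of f-values into non-overlapping batches and
    returns batch_size times the sample variance of the batch means.
    """
    y = np.asarray(values, dtype=float)
    n = len(y)
    b = batch_size or int(np.sqrt(n))   # common default: sqrt(n)-sized batches
    m = n // b                          # number of complete batches
    batch_means = y[: m * b].reshape(m, b).mean(axis=1)
    return b * batch_means.var(ddof=1)
```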
no code implementations • 28 Oct 2023 • Sumedh Gupte, Prashanth L. A., Sanjay P. Bhat
In the context of utility-based shortfall risk (UBSR) estimation, we derive a non-asymptotic bound on the mean-squared error of the classical sample average approximation (SAA) of UBSR.
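A minimal sketch of the SAA estimator being analysed, assuming the convention SR(X) = inf{t : E[l(-X - t)] <= lambda} for an increasing loss l: the SAA replaces the expectation by a sample mean, and the resulting root can be found by bisection. The function name, bisection bracket, and the exponential-loss example are illustrative, not taken from the paper.

```python
import numpy as np

def ubsr_saa(samples, loss, lam, lo=-100.0, hi=100.0, tol=1e-6):
    """Sample average approximation (SAA) of utility-based shortfall risk.

    Estimates SR(X) = inf{t : E[loss(-X - t)] <= lam} by replacing the
    expectation with a sample mean and bisecting over t (the sample-average
    map is non-increasing in t for an increasing loss).
    """
    x = np.asarray(samples, dtype=float)

    def g(t):
        return np.mean(loss(-x - t)) - lam

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:      # constraint violated: t must be larger
            lo = mid
        else:               # constraint satisfied: try a smaller t
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative use with an exponential loss; for standard normal samples
# this reduces to the entropic risk, whose true value is 0.5.
x = np.random.default_rng(0).normal(size=10_000)
print(ubsr_saa(x, loss=lambda z: np.exp(z), lam=1.0))
```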
no code implementations • 21 Apr 2023 • Mizhaan Prajit Maniyar, Akash Mondal, Prashanth L. A., Shalabh Bhatnagar
We consider the problem of control in the setting of reinforcement learning (RL), where model information is not available.
no code implementations • 12 Oct 2022 • Gandharv Patil, Prashanth L. A., Dheeraj Nagaraj, Doina Precup
We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm when combined with tail-averaging.
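A minimal sketch of tail-averaged TD(0) with linear function approximation, to make the object of study concrete: run the usual TD(0) recursion and report the average of the iterates after discarding an initial burn-in fraction. The function name, constant step size, and burn-in fraction are illustrative assumptions, not the paper's choices.

```python
import numpy as np

def tail_averaged_td0(transitions, features, gamma=0.99, step=0.05, burn_in_frac=0.5):
    """TD(0) with linear function approximation and tail averaging.

    `transitions` is a list of (s, r, s') tuples and `features[s]` is the
    feature vector of state s. The tail average discards the first
    `burn_in_frac` fraction of iterates and averages the rest.
    """
    d = len(next(iter(features.values())))
    theta = np.zeros(d)
    iterates = []
    for s, r, s_next in transitions:
        phi, phi_next = features[s], features[s_next]
        td_error = r + gamma * phi_next @ theta - phi @ theta
        theta = theta + step * td_error * phi
        iterates.append(theta.copy())
    k = int(burn_in_frac * len(iterates))
    return np.mean(iterates[k:], axis=0)   # tail-averaged iterate
```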
no code implementations • 30 Jul 2022 • Akash Mondal, Prashanth L. A., Shalabh Bhatnagar
In this paper, we present a stochastic gradient algorithm for minimizing a smooth objective function that is an expectation over noisy cost samples; only these cost samples are observed for any given parameter.
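To illustrate the setting, here is a hedged sketch of one standard way to build a gradient estimate from noisy cost samples alone: a two-point simultaneous-perturbation estimate plugged into plain SGD. The function names, perturbation size, and step size are illustrative; the paper's estimator and step-size schedule may differ.

```python
import numpy as np

def two_point_gradient(cost, theta, delta=1e-2, rng=None):
    """Gradient estimate from two noisy cost evaluations (simultaneous
    perturbation): perturb all coordinates at once with a random +/-1 vector."""
    rng = rng or np.random.default_rng()
    d = rng.choice([-1.0, 1.0], size=theta.shape)   # Rademacher perturbation
    return (cost(theta + delta * d) - cost(theta - delta * d)) / (2 * delta) * d

def zeroth_order_sgd(cost, theta0, n_iters=1000, step=0.01):
    """SGD where only noisy cost samples, not gradients, are available."""
    theta = np.array(theta0, dtype=float)
    rng = np.random.default_rng(0)
    for _ in range(n_iters):
        theta -= step * two_point_gradient(cost, theta, rng=rng)
    return theta

# Illustrative use: noisy quadratic with minimum at (1, -2).
noisy_cost = lambda th: np.sum((th - np.array([1.0, -2.0]))**2) + 0.01 * np.random.randn()
print(zeroth_order_sgd(noisy_cost, theta0=[0.0, 0.0]))
```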
no code implementations • 12 May 2022 • Vincent Y. F. Tan, Prashanth L. A., Krishna Jagannathan
In several applications such as clinical trials and financial portfolio optimization, the expected value (or the average reward) does not satisfactorily capture the merits of a drug or a portfolio.
no code implementations • 22 Dec 2019 • Ajay Kumar Pandey, Prashanth L. A., Sanjay P. Bhat
We consider the problem of estimating a spectral risk measure (SRM) from i.i.d. samples.
no code implementations • NeurIPS 2019 • Prashanth L. A., Sanjay P. Bhat
Previous concentration bounds are available only for specific risk measures such as CVaR and CPT-value.
no code implementations • ICML 2020 • Prashanth L. A., Krishna Jagannathan, Ravi Kumar Kolla
We derive concentration bounds for CVaR estimates, considering separately the cases of light-tailed and heavy-tailed distributions.
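For reference, a minimal plug-in CVaR estimator of the kind such bounds concern: sort the loss samples and average the worst (1 - alpha) fraction. The heavy-tailed case in the paper involves modified (e.g. truncation-based) estimators not shown here; the names and the alpha value below are illustrative.

```python
import numpy as np

def empirical_cvar(loss_samples, alpha=0.95):
    """Plug-in CVaR estimate at level alpha from i.i.d. loss samples:
    the average of the worst (1 - alpha) fraction of losses."""
    x = np.sort(np.asarray(loss_samples, dtype=float))
    n = len(x)
    k = int(np.ceil(alpha * n))   # index of the empirical VaR
    return x[k - 1:].mean()       # average of the upper tail

# For standard normal losses, CVaR at level 0.95 is roughly 2.06.
samples = np.random.default_rng(0).normal(size=200_000)
print(empirical_cvar(samples, alpha=0.95))
```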
no code implementations • 22 Oct 2018 • Prashanth L. A., Michael Fu
In this book, we consider risk-sensitive RL in two settings: one where the goal is to find a policy that optimizes the usual expected value objective while ensuring that a risk constraint is satisfied, and the other where the risk measure is the objective.
no code implementations • 6 Aug 2018 • Ravi Kumar Kolla, Prashanth L. A., Sanjay P. Bhat, Krishna Jagannathan
In several real-world applications involving decision making under uncertainty, the traditional expected value objective may not be suitable, as it may be necessary to control losses in the case of a rare but extreme event.
no code implementations • 30 Nov 2016 • Ravi Kumar Kolla, Prashanth L. A., Aditya Gopalan, Krishna Jagannathan, Michael Fu, Steve Marcus
For the $K$-armed bandit setting, we derive an upper bound on the expected regret for our proposed algorithm, and then we prove a matching lower bound to establish the order-optimality of our algorithm.
no code implementations • 22 Sep 2016 • Xiaowei Hu, Prashanth L. A., András György, Csaba Szepesvári
Algorithms for bandit convex optimization and online learning often rely on constructing noisy gradient estimates, which are then used in appropriately adjusted first-order algorithms, replacing actual gradients.
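A minimal sketch of the classical one-point gradient estimate such bandit algorithms rely on: query the function at a single randomly perturbed point and scale the observed value along the perturbation direction, which in expectation gives the gradient of a smoothed surrogate of the objective. Function names and constants are illustrative, not the paper's.

```python
import numpy as np

def one_point_gradient(f_value, u, delta, dim):
    """One-point gradient estimate used in bandit convex optimization:
    (dim / delta) * f(x + delta * u) * u, with u uniform on the unit sphere.
    In expectation this is the gradient of a smoothed version of f."""
    return (dim / delta) * f_value * u

def bandit_gradient_step(f, x, step=0.01, delta=0.1, rng=None):
    """One round: query f at a single perturbed point, form the noisy
    gradient estimate, and take a plain gradient step with it."""
    rng = rng or np.random.default_rng()
    u = rng.normal(size=x.shape)
    u /= np.linalg.norm(u)          # uniform direction on the unit sphere
    y = f(x + delta * u)            # the only feedback observed
    g = one_point_gradient(y, u, delta, dim=x.size)
    return x - step * g
```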
no code implementations • 28 Jul 2015 • Prashanth L. A., H. L. Prasad, Shalabh Bhatnagar, Prakash Chandra
We propose a novel actor-critic algorithm with guaranteed convergence to an optimal policy for a discounted reward Markov decision process.
no code implementations • 8 Jun 2015 • Prashanth L. A., Cheng Jie, Michael Fu, Steve Marcus, Csaba Szepesvári
Cumulative prospect theory (CPT) is known to model human decisions well, with substantial empirical evidence supporting this claim.
1 code implementation • 19 Feb 2015 • Prashanth L. A., Shalabh Bhatnagar, Michael Fu, Steve Marcus
We prove the unbiasedness of both gradient and Hessian estimates and asymptotic (strong) convergence for both first-order and second-order schemes.
no code implementations • 25 Mar 2014 • Prashanth L. A., Mohammad Ghavamzadeh
For each formulation, we first define a measure of variability for a policy, which in turn gives us a set of risk-sensitive criteria to optimize.
no code implementations • 27 Dec 2013 • Prashanth L. A., Abhranil Chatterjee, Shalabh Bhatnagar
For each criterion, we propose a convergent on-policy Q-learning algorithm that operates on two timescales, while employing function approximation to handle the curse of dimensionality associated with the underlying POMDP.
no code implementations • NeurIPS 2013 • Prashanth L. A., Mohammad Ghavamzadeh
For each formulation, we first define a measure of variability for a policy, which in turn gives us a set of risk-sensitive criteria to optimize.
no code implementations • 11 Jul 2013 • Nathaniel Korda, Prashanth L. A., Rémi Munos
When strong convexity of the regression problem is guaranteed, we provide bounds on the error both in expectation and with high probability (the latter is often needed to provide theoretical guarantees for higher-level algorithms), despite the drifting least-squares solution.
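A hedged sketch of the setting: SGD applied to a streaming least-squares problem, where the exact least-squares solution on the data seen so far drifts as new samples arrive. The step-size schedule and names below are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def streaming_lsq_sgd(data_stream, dim, step=lambda t: 1.0 / (t + 1)):
    """SGD for a streaming least-squares problem.

    Each round reveals a feature/label pair (x_t, y_t); the SGD iterate chases
    the exact least-squares solution on the data seen so far, which itself
    drifts as new samples arrive.
    """
    theta = np.zeros(dim)
    for t, (x, y) in enumerate(data_stream):
        residual = x @ theta - y
        theta -= step(t) * residual * x   # stochastic gradient of 0.5*(x'theta - y)^2
    return theta
```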