Search Results for author: Prashanth L. A.

Found 19 papers, 1 paper with code

Optimization of utility-based shortfall risk: A non-asymptotic viewpoint

no code implementations 28 Oct 2023 Sumedh Gupte, Prashanth L. A., Sanjay P. Bhat

In the context of UBSR estimation, we derive a non-asymptotic bound on the mean-squared error of the classical sample average approximation (SAA) of UBSR.
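
A minimal numerical sketch of the SAA estimator referred to above, assuming the standard definition UBSR_λ(X) = inf{ t : E[ℓ(−X − t)] ≤ λ } with a convex, non-decreasing loss ℓ. The function names, the default bracket, and the exponential-loss example are illustrative and not taken from the paper.

```python
import numpy as np

def ubsr_saa(samples, loss, lam, lo=-100.0, hi=100.0, tol=1e-6):
    """Sample average approximation (SAA) of utility-based shortfall risk.

    UBSR(X) = inf { t : E[ loss(-X - t) ] <= lam }.
    g(t) = (1/n) * sum_i loss(-x_i - t) - lam is non-increasing in t, so the
    SAA solution is found by bisection (root assumed to lie in [lo, hi])."""
    x = np.asarray(samples, dtype=float)

    def g(t):
        return np.mean(loss(-x - t)) - lam

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid          # shortfall still above the threshold; move right
        else:
            hi = mid
    return 0.5 * (lo + hi)

# illustrative usage: exponential loss, Gaussian samples (true UBSR is 0 here)
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.5, scale=1.0, size=10_000)
print(ubsr_saa(samples, loss=lambda y: np.exp(y), lam=1.0))
```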

Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation

no code implementations 12 Oct 2022 Gandharv Patil, Prashanth L. A., Dheeraj Nagaraj, Doina Precup

We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm when combined with tail-averaging.
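
A minimal sketch of TD(0) with linear function approximation, tail averaging, and an optional ridge-style regularisation term, under stated assumptions: the exact form of the regularised update in the paper may differ, and the step size, tail fraction, and toy chain below are illustrative.

```python
import numpy as np

def td0_tail_averaged(transitions, phi, dim, gamma=0.95, alpha=0.05,
                      reg=0.0, tail_fraction=0.5):
    """TD(0) with linear function approximation, followed by tail averaging.

    transitions: iterable of (state, reward, next_state) tuples
    phi:         feature map, state -> np.ndarray of length `dim`
    reg:         optional ridge-style regularisation coefficient
    Returns the average of the last `tail_fraction` of the iterates, the
    quantity the finite-time analysis is about, rather than the last iterate.
    """
    theta = np.zeros(dim)
    iterates = []
    for s, r, s_next in transitions:
        feat, feat_next = phi(s), phi(s_next)
        td_error = r + gamma * feat_next.dot(theta) - feat.dot(theta)
        theta = theta + alpha * (td_error * feat - reg * theta)
        iterates.append(theta.copy())

    start = int((1.0 - tail_fraction) * len(iterates))
    return np.mean(iterates[start:], axis=0)

# toy usage on a 2-state uniformly random chain with one-hot features
rng = np.random.default_rng(0)
states = rng.integers(0, 2, size=5_001)
transitions = [(states[t], 1.0 if states[t] == 0 else 0.0, states[t + 1])
               for t in range(5_000)]
print(td0_tail_averaged(transitions, phi=lambda s: np.eye(2)[s], dim=2))
```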

A Gradient Smoothed Functional Algorithm with Truncated Cauchy Random Perturbations for Stochastic Optimization

no code implementations 30 Jul 2022 Akash Mondal, Prashanth L. A., Shalabh Bhatnagar

In this paper, we present a stochastic gradient algorithm for minimizing a smooth objective function that is an expectation over noisy cost samples, and only the latter are observed for any given parameter.

Stochastic Optimization
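
A sketch of a smoothed-functional gradient scheme of this kind, using two noisy cost samples per step along a truncated Cauchy direction. The estimate below is only proportional (up to the covariance of the perturbation) to a smoothed gradient, with the constant absorbed into the step size; the paper's exact estimator, normalisation, and truncation scheme may differ.

```python
import numpy as np

def truncated_cauchy(dim, bound, rng):
    """Perturbation vector with i.i.d. standard Cauchy components,
    truncated by rejection to [-bound, bound]."""
    out = np.empty(dim)
    for i in range(dim):
        z = rng.standard_cauchy()
        while abs(z) > bound:
            z = rng.standard_cauchy()
        out[i] = z
    return out

def smoothed_functional_sgd(noisy_cost, x0, steps=3_000, delta=0.1,
                            alpha=0.005, bound=5.0, seed=0):
    """Gradient descent driven by a smoothed-functional estimate: each step
    draws a truncated-Cauchy direction d and forms
        d * (F(x + delta*d) - F(x - delta*d)) / (2*delta)
    from two noisy cost samples only (no gradient information is used)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        d = truncated_cauchy(x.size, bound, rng)
        grad_est = d * (noisy_cost(x + delta * d)
                        - noisy_cost(x - delta * d)) / (2.0 * delta)
        x = x - alpha * grad_est
    return x

# toy usage: quadratic objective observed only through noisy cost samples
rng = np.random.default_rng(1)
noisy_cost = lambda x: np.sum((x - 1.0) ** 2) + 0.1 * rng.normal()
print(smoothed_functional_sgd(noisy_cost, x0=np.zeros(3)))
```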

A Survey of Risk-Aware Multi-Armed Bandits

no code implementations 12 May 2022 Vincent Y. F. Tan, Prashanth L. A., Krishna Jagannathan

In several applications such as clinical trials and financial portfolio optimization, the expected value (or the average reward) does not satisfactorily capture the merits of a drug or a portfolio.

Multi-Armed Bandits · Portfolio Optimization

Estimation of Spectral Risk Measures

no code implementations 22 Dec 2019 Ajay Kumar Pandey, Prashanth L. A., Sanjay P. Bhat

We consider the problem of estimating a spectral risk measure (SRM) from i.i.d. samples.

Numerical Integration
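
For reference, an SRM weights the quantiles of the loss distribution by a risk spectrum, SRM_φ(X) = ∫₀¹ φ(p) q_X(p) dp. The sketch below is a standard order-statistics estimator with the spectrum integrated numerically over each sub-interval; the sub-grid averaging rule and the exponential spectrum in the usage are illustrative choices, not necessarily the paper's.

```python
import numpy as np

def empirical_srm(samples, spectrum, sub_points=50):
    """Empirical estimate of the spectral risk measure
        SRM_phi(X) = integral_0^1 phi(p) * q_X(p) dp,
    where q_X is the quantile function and `spectrum` is the risk spectrum
    phi.  The i-th order statistic is weighted by (approximately) the
    integral of phi over ((i-1)/n, i/n], computed here by averaging phi on
    a small grid over each sub-interval."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    weights = np.empty(n)
    for i in range(n):
        p = np.linspace(i / n, (i + 1) / n, sub_points)
        weights[i] = np.mean(spectrum(p)) / n
    return np.dot(weights, x)

# illustrative usage: exponential risk spectrum, Gaussian samples
k = 5.0
exp_spectrum = lambda p: k * np.exp(k * (p - 1.0)) / (1.0 - np.exp(-k))
rng = np.random.default_rng(0)
print(empirical_srm(rng.normal(size=10_000), exp_spectrum))
```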

A Wasserstein distance approach for concentration of empirical risk estimates

no code implementations NeurIPS 2019 Prashanth L. A., Sanjay P. Bhat

Previous concentration bounds are available only for specific risk measures such as CVaR and CPT-value.

Risk-Sensitive Reinforcement Learning via Policy Gradient Search

no code implementations 22 Oct 2018 Prashanth L. A., Michael Fu

In this book, we consider risk-sensitive RL in two settings: one where the goal is to find a policy that optimizes the usual expected value objective while ensuring that a risk constraint is satisfied, and the other where the risk measure is the objective.

Policy Gradient Methods · reinforcement-learning · +1

Concentration bounds for empirical conditional value-at-risk: The unbounded case

no code implementations 6 Aug 2018 Ravi Kumar Kolla, Prashanth L. A., Sanjay P. Bhat, Krishna Jagannathan

In several real-world applications involving decision making under uncertainty, the traditional expected value objective may not be suitable, as it may be necessary to control losses in the case of a rare but extreme event.

Decision Making · Decision Making Under Uncertainty
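
A standard empirical CVaR estimator of the kind such concentration results are stated for, written via the Rockafellar-Uryasev representation. The heavy-tailed example simply illustrates the unbounded-support setting; it is not data from the paper.

```python
import numpy as np

def empirical_cvar(samples, alpha=0.95):
    """Empirical CVaR of a loss distribution at level alpha, via the
    Rockafellar-Uryasev form
        CVaR_alpha(X) = min_c  c + E[(X - c)^+] / (1 - alpha),
    with the empirical alpha-quantile plugged in as the minimiser."""
    x = np.asarray(samples, dtype=float)
    var = np.quantile(x, alpha)                    # empirical value-at-risk
    return var + np.mean(np.maximum(x - var, 0.0)) / (1.0 - alpha)

# heavy-tailed (unbounded) losses, e.g. Student-t, illustrate the setting
rng = np.random.default_rng(0)
print(empirical_cvar(rng.standard_t(df=4, size=100_000), alpha=0.95))
```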

Bandit algorithms to emulate human decision making using probabilistic distortions

no code implementations 30 Nov 2016 Ravi Kumar Kolla, Prashanth L. A., Aditya Gopalan, Krishna Jagannathan, Michael Fu, Steve Marcus

For the $K$-armed bandit setting, we derive an upper bound on the expected regret for our proposed algorithm, and then we prove a matching lower bound to establish the order-optimality of our algorithm.

Decision Making · Multi-Armed Bandits
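
The basic ingredient in bandit algorithms with probabilistic distortions is a distorted value estimate of each arm, computed from the order statistics of that arm's samples. The sketch below assumes non-negative rewards; the Prelec-style weighting function and Beta rewards are illustrative, not the paper's specific choices.

```python
import numpy as np

def distorted_value_estimate(samples, weight):
    """Empirical estimate of the distorted expectation
        integral_0^inf weight( P(X > t) ) dt   (X assumed non-negative),
    using the order statistics x_(1) <= ... <= x_(n):
        sum_i x_(i) * [ weight((n-i+1)/n) - weight((n-i)/n) ]."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    coeffs = weight((n - i + 1) / n) - weight((n - i) / n)
    return np.dot(coeffs, x)

def prelec_weight(p, gamma=0.65):
    """Prelec probability weighting w(p) = exp(-(-ln p)^gamma), an
    illustrative distortion; p is clipped away from 0 for stability."""
    p = np.clip(p, 1e-12, 1.0)
    return np.exp(-(-np.log(p)) ** gamma)

# illustrative usage: distorted value of one arm with rewards in [0, 1]
rng = np.random.default_rng(0)
arm_samples = rng.beta(2.0, 5.0, size=2_000)
print(distorted_value_estimate(arm_samples, prelec_weight))
```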

(Bandit) Convex Optimization with Biased Noisy Gradient Oracles

no code implementations 22 Sep 2016 Xiaowei Hu, Prashanth L. A., András György, Csaba Szepesvári

Algorithms for bandit convex optimization and online learning often rely on constructing noisy gradient estimates, which are then used in appropriately adjusted first-order algorithms, replacing actual gradients.
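
One standard construction of such a biased noisy gradient oracle is the two-point estimate along a random unit direction, sketched below; the smoothing radius, step size, and quadratic test function are illustrative assumptions rather than the paper's setup.

```python
import numpy as np

def two_point_gradient_oracle(noisy_f, x, delta, rng):
    """Two noisy function values along a random unit direction u give
        g = (dim / (2*delta)) * (F(x + delta*u) - F(x - delta*u)) * u,
    an estimate of the gradient of a smoothed version of f, with a bias
    that grows with delta and a variance that grows as delta shrinks."""
    u = rng.normal(size=x.size)
    u /= np.linalg.norm(u)                         # uniform on the unit sphere
    scale = x.size / (2.0 * delta)
    return scale * (noisy_f(x + delta * u) - noisy_f(x - delta * u)) * u

# illustrative usage inside plain gradient descent on a noisy quadratic
rng = np.random.default_rng(0)
noisy_f = lambda x: np.sum(x ** 2) + 0.01 * rng.normal()
x = np.ones(5)
for _ in range(2_000):
    x = x - 0.02 * two_point_gradient_oracle(noisy_f, x, delta=0.05, rng=rng)
print(x)
```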

A constrained optimization perspective on actor critic algorithms and application to network routing

no code implementations 28 Jul 2015 Prashanth L. A., H. L. Prasad, Shalabh Bhatnagar, Prakash Chandra

We propose a novel actor-critic algorithm with guaranteed convergence to an optimal policy for a discounted reward Markov decision process.

Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control

no code implementations 8 Jun 2015 Prashanth L. A., Cheng Jie, Michael Fu, Steve Marcus, Csaba Szepesvári

Cumulative prospect theory (CPT) is known to model human decisions well, with substantial empirical evidence supporting this claim.

reinforcement-learning · Reinforcement Learning (RL)

Adaptive system optimization using random directions stochastic approximation

1 code implementation 19 Feb 2015 Prashanth L. A., Shalabh Bhatnagar, Michael Fu, Steve Marcus

We prove the unbiasedness of both gradient and Hessian estimates and asymptotic (strong) convergence for both first-order and second-order schemes.
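
A sketch of random-directions gradient and Hessian estimates built from three noisy function evaluations along one perturbation direction. The scaling constants below are worked out for i.i.d. Unif[-1, 1] perturbation components and may differ from the paper's exact scheme; the quadratic usage is illustrative.

```python
import numpy as np

def rdsa_unif_estimates(noisy_f, x, delta, rng):
    """Gradient and Hessian estimates from three noisy function evaluations
    along one random direction d with i.i.d. Unif[-1, 1] components:
        g_hat = 3 * d * (y_plus - y_minus) / (2*delta)
        H_hat = M(d) * (y_plus + y_minus - 2*y) / delta**2,
    where M(d) has diagonal 11.25*(d_i**2 - 1/3) and off-diagonal
    4.5*d_i*d_j.  These constants make the single-sample estimates unbiased
    (to leading order in delta) for this perturbation law; in a full
    second-order scheme such estimates are averaged across iterations."""
    d = rng.uniform(-1.0, 1.0, size=x.size)
    y = noisy_f(x)
    y_plus, y_minus = noisy_f(x + delta * d), noisy_f(x - delta * d)

    g_hat = 3.0 * d * (y_plus - y_minus) / (2.0 * delta)

    M = 4.5 * np.outer(d, d)
    np.fill_diagonal(M, 11.25 * (d ** 2 - 1.0 / 3.0))
    H_hat = M * (y_plus + y_minus - 2.0 * y) / delta ** 2
    return g_hat, H_hat

# illustrative usage on a noisy quadratic (true gradient 2x, true Hessian 2I)
rng = np.random.default_rng(0)
noisy_f = lambda x: np.sum(x ** 2) + 0.01 * rng.normal()
g, H = rdsa_unif_estimates(noisy_f, np.ones(3), delta=0.05, rng=rng)
print(g, H, sep="\n")
```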

Variance-Constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs

no code implementations 25 Mar 2014 Prashanth L. A., Mohammad Ghavamzadeh

For each formulation, we first define a measure of variability for a policy, which in turn gives us a set of risk-sensitive criteria to optimize.

Decision Making
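
One concrete instance of such a risk-sensitive criterion is the Lagrangian relaxation of "maximise the mean return subject to a variance constraint", estimated here from Monte Carlo returns of a fixed policy; the names, the multiplier, and the use of plain return variance as the variability measure are illustrative assumptions.

```python
import numpy as np

def mean_variance_lagrangian(returns, lam, variance_limit):
    """Lagrangian of the variance-constrained criterion
        maximise E[R]  subject to  Var(R) <= variance_limit,
    i.e. mean(R) - lam * (var(R) - variance_limit), estimated from sampled
    returns of one policy."""
    r = np.asarray(returns, dtype=float)
    return r.mean() - lam * (r.var() - variance_limit)

# illustrative usage: returns collected by rolling out one policy
rng = np.random.default_rng(0)
print(mean_variance_lagrangian(rng.normal(5.0, 2.0, size=1_000),
                               lam=0.5, variance_limit=1.0))
```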

Two Timescale Convergent Q-learning for Sleep-Scheduling in Wireless Sensor Networks

no code implementations 27 Dec 2013 Prashanth L. A., Abhranil Chatterjee, Shalabh Bhatnagar

For each criterion, we propose a convergent on-policy Q-learning algorithm that operates on two timescales, while employing function approximation to handle the curse of dimensionality associated with the underlying POMDP.

feature selection · Intrusion Detection · +2
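
A generic two-timescale stochastic-approximation step of the kind such a convergent scheme is built on: an on-policy TD/SARSA-style update of the value parameters on the fast timescale, with a second parameter updated more slowly from the same sample. The names and the particular slow recursion (which here simply tracks an average of the fast iterate) are illustrative, not the paper's exact updates.

```python
import numpy as np

def two_timescale_step(theta, w, transition, phi, gamma, fast_step, slow_step):
    """One sample, two updates on separated timescales.

    theta: Q-value parameters under linear features `phi` (fast timescale)
    w:     slowly adapted parameters (slow timescale, fast_step >> slow_step)
    transition: (state, action, reward, next_state, next_action)
    """
    s, a, r, s_next, a_next = transition
    feat = phi(s, a)
    td_error = r + gamma * phi(s_next, a_next).dot(theta) - feat.dot(theta)
    theta = theta + fast_step * td_error * feat      # fast: value estimate
    w = w + slow_step * (theta - w)                  # slow: tracks the fast iterate
    return theta, w
```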

Actor-Critic Algorithms for Risk-Sensitive MDPs

no code implementations NeurIPS 2013 Prashanth L. A., Mohammad Ghavamzadeh

For each formulation, we first define a measure of variability for a policy, which in turn gives us a set of risk-sensitive criteria to optimize.

Decision Making

Fast gradient descent for drifting least squares regression, with application to bandits

no code implementations 11 Jul 2013 Nathaniel Korda, Prashanth L. A., Rémi Munos

In the case when strong convexity in the regression problem is guaranteed, we provide bounds on the error both in expectation and high probability (the latter is often needed to provide theoretical guarantees for higher level algorithms), despite the drifting least squares solution.

News Recommendation · regression
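
A sketch of the kind of streaming gradient iteration analysed here: one cheap gradient step per incoming (feature, target) pair, tracking a drifting regularised least-squares solution instead of re-solving the normal equations each round. Parameter names, step size, and the drift model are illustrative; the `reg > 0` term supplies the strong convexity the bounds assume.

```python
import numpy as np

def streaming_sgd_least_squares(stream, dim, step=0.1, reg=0.01):
    """One SGD pass over a stream of (feature, target) pairs for the
    regularised least-squares objective 0.5*(a.theta - y)^2 + 0.5*reg*|theta|^2,
    whose exact minimiser may drift as the stream evolves."""
    theta = np.zeros(dim)
    for a, y in stream:
        grad = (a.dot(theta) - y) * a + reg * theta
        theta -= step * grad
    return theta

# illustrative usage: targets generated by a slowly drifting parameter vector
rng = np.random.default_rng(0)
true_theta, stream = np.ones(5), []
for t in range(5_000):
    true_theta += 0.0005 * rng.normal(size=5)      # the least-squares solution drifts
    a = rng.normal(size=5)
    stream.append((a, a.dot(true_theta) + 0.1 * rng.normal()))
print(streaming_sgd_least_squares(stream, dim=5))
```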
