Search Results for author: Prashanth L. A.

Found 19 papers, 1 paper with code

Optimization of utility-based shortfall risk: A non-asymptotic viewpoint

no code implementations 28 Oct 2023 Sumedh Gupte, Prashanth L. A., Sanjay P. Bhat

In the context of UBSR estimation, we derive a non-asymptotic bound on the mean-squared error of the classical sample average approximation (SAA) of UBSR.
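
A minimal numerical sketch of the SAA estimator referred to above, assuming the standard definition UBSR_λ(X) = inf{ t : E[ℓ(−X − t)] ≤ λ } with a convex, non-decreasing loss ℓ. The function names, the default bracket, and the exponential-loss example are illustrative and not taken from the paper.

```python
import numpy as np

def ubsr_saa(samples, loss, lam, lo=-100.0, hi=100.0, tol=1e-6):
    """Sample average approximation (SAA) of utility-based shortfall risk.

    UBSR(X) = inf { t : E[ loss(-X - t) ] <= lam }.
    g(t) = (1/n) * sum_i loss(-x_i - t) - lam is non-increasing in t, so the
    SAA solution is found by bisection (root assumed to lie in [lo, hi])."""
    x = np.asarray(samples, dtype=float)

    def g(t):
        return np.mean(loss(-x - t)) - lam

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid          # shortfall still above the threshold; move right
        else:
            hi = mid
    return 0.5 * (lo + hi)

# illustrative usage: exponential loss, Gaussian samples (true UBSR is 0 here)
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.5, scale=1.0, size=10_000)
print(ubsr_saa(samples, loss=lambda y: np.exp(y), lam=1.0))
```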

Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation

no code implementations 12 Oct 2022 Gandharv Patil, Prashanth L. A., Dheeraj Nagaraj, Doina Precup

We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm when combined with tail-averaging.
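
A minimal sketch of TD(0) with linear function approximation, tail averaging, and an optional ridge-style regularisation term, under stated assumptions: the exact form of the regularised update in the paper may differ, and the step size, tail fraction, and toy chain below are illustrative.

```python
import numpy as np

def td0_tail_averaged(transitions, phi, dim, gamma=0.95, alpha=0.05,
                      reg=0.0, tail_fraction=0.5):
    """TD(0) with linear function approximation, followed by tail averaging.

    transitions: iterable of (state, reward, next_state) tuples
    phi:         feature map, state -> np.ndarray of length `dim`
    reg:         optional ridge-style regularisation coefficient
    Returns the average of the last `tail_fraction` of the iterates, the
    quantity the finite-time analysis is about, rather than the last iterate.
    """
    theta = np.zeros(dim)
    iterates = []
    for s, r, s_next in transitions:
        feat, feat_next = phi(s), phi(s_next)
        td_error = r + gamma * feat_next.dot(theta) - feat.dot(theta)
        theta = theta + alpha * (td_error * feat - reg * theta)
        iterates.append(theta.copy())

    start = int((1.0 - tail_fraction) * len(iterates))
    return np.mean(iterates[start:], axis=0)

# toy usage on a 2-state uniformly random chain with one-hot features
rng = np.random.default_rng(0)
states = rng.integers(0, 2, size=5_001)
transitions = [(states[t], 1.0 if states[t] == 0 else 0.0, states[t + 1])
               for t in range(5_000)]
print(td0_tail_averaged(transitions, phi=lambda s: np.eye(2)[s], dim=2))
```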

A Gradient Smoothed Functional Algorithm with Truncated Cauchy Random Perturbations for Stochastic Optimization

no code implementations 30 Jul 2022 Akash Mondal, Prashanth L. A., Shalabh Bhatnagar

In this paper, we present a stochastic gradient algorithm for minimizing a smooth objective function that is an expectation over noisy cost samples, and only the latter are observed for any given parameter.

Stochastic Optimization
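
A sketch of a smoothed-functional gradient scheme of this kind, using two noisy cost samples per step along a truncated Cauchy direction. The estimate below is only proportional (up to the covariance of the perturbation) to a smoothed gradient, with the constant absorbed into the step size; the paper's exact estimator, normalisation, and truncation scheme may differ.

```python
import numpy as np

def truncated_cauchy(dim, bound, rng):
    """Perturbation vector with i.i.d. standard Cauchy components,
    truncated by rejection to [-bound, bound]."""
    out = np.empty(dim)
    for i in range(dim):
        z = rng.standard_cauchy()
        while abs(z) > bound:
            z = rng.standard_cauchy()
        out[i] = z
    return out

def smoothed_functional_sgd(noisy_cost, x0, steps=3_000, delta=0.1,
                            alpha=0.005, bound=5.0, seed=0):
    """Gradient descent driven by a smoothed-functional estimate: each step
    draws a truncated-Cauchy direction d and forms
        d * (F(x + delta*d) - F(x - delta*d)) / (2*delta)
    from two noisy cost samples only (no gradient information is used)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        d = truncated_cauchy(x.size, bound, rng)
        grad_est = d * (noisy_cost(x + delta * d)
                        - noisy_cost(x - delta * d)) / (2.0 * delta)
        x = x - alpha * grad_est
    return x

# toy usage: quadratic objective observed only through noisy cost samples
rng = np.random.default_rng(1)
noisy_cost = lambda x: np.sum((x - 1.0) ** 2) + 0.1 * rng.normal()
print(smoothed_functional_sgd(noisy_cost, x0=np.zeros(3)))
```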

A Survey of Risk-Aware Multi-Armed Bandits

no code implementations 12 May 2022 Vincent Y. F. Tan, Prashanth L. A., Krishna Jagannathan

In several applications such as clinical trials and financial portfolio optimization, the expected value (or the average reward) does not satisfactorily capture the merits of a drug or a portfolio.

Multi-Armed Bandits · Portfolio Optimization

Estimation of Spectral Risk Measures

no code implementations 22 Dec 2019 Ajay Kumar Pandey, Prashanth L. A., Sanjay P. Bhat

We consider the problem of estimating a spectral risk measure (SRM) from i.i.d. samples.

Numerical Integration
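
For reference, an SRM weights the quantiles of the loss distribution by a risk spectrum, SRM_φ(X) = ∫₀¹ φ(p) q_X(p) dp. The sketch below is a standard order-statistics estimator with the spectrum integrated numerically over each sub-interval; the sub-grid averaging rule and the exponential spectrum in the usage are illustrative choices, not necessarily the paper's.

```python
import numpy as np

def empirical_srm(samples, spectrum, sub_points=50):
    """Empirical estimate of the spectral risk measure
        SRM_phi(X) = integral_0^1 phi(p) * q_X(p) dp,
    where q_X is the quantile function and `spectrum` is the risk spectrum
    phi.  The i-th order statistic is weighted by (approximately) the
    integral of phi over ((i-1)/n, i/n], computed here by averaging phi on
    a small grid over each sub-interval."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    weights = np.empty(n)
    for i in range(n):
        p = np.linspace(i / n, (i + 1) / n, sub_points)
        weights[i] = np.mean(spectrum(p)) / n
    return np.dot(weights, x)

# illustrative usage: exponential risk spectrum, Gaussian samples
k = 5.0
exp_spectrum = lambda p: k * np.exp(k * (p - 1.0)) / (1.0 - np.exp(-k))
rng = np.random.default_rng(0)
print(empirical_srm(rng.normal(size=10_000), exp_spectrum))
```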

A Wasserstein distance approach for concentration of empirical risk estimates

no code implementations NeurIPS 2019 Prashanth L. A., Sanjay P. Bhat

Previous concentration bounds are available only for specific risk measures such as CVaR and CPT-value.

Risk-Sensitive Reinforcement Learning via Policy Gradient Search

no code implementations 22 Oct 2018 Prashanth L. A., Michael Fu

In this book, we consider risk-sensitive RL in two settings: one where the goal is to find a policy that optimizes the usual expected value objective while ensuring that a risk constraint is satisfied, and the other where the risk measure is the objective.

Policy Gradient Methods · reinforcement-learning · +1

Concentration bounds for empirical conditional value-at-risk: The unbounded case

no code implementations 6 Aug 2018 Ravi Kumar Kolla, Prashanth L. A., Sanjay P. Bhat, Krishna Jagannathan

In several real-world applications involving decision making under uncertainty, the traditional expected value objective may not be suitable, as it may be necessary to control losses in the case of a rare but extreme event.

Decision Making · Decision Making Under Uncertainty
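
A standard empirical CVaR estimator of the kind such concentration results are stated for, written via the Rockafellar-Uryasev representation. The heavy-tailed example simply illustrates the unbounded-support setting; it is not data from the paper.

```python
import numpy as np

def empirical_cvar(samples, alpha=0.95):
    """Empirical CVaR of a loss distribution at level alpha, via the
    Rockafellar-Uryasev form
        CVaR_alpha(X) = min_c  c + E[(X - c)^+] / (1 - alpha),
    with the empirical alpha-quantile plugged in as the minimiser."""
    x = np.asarray(samples, dtype=float)
    var = np.quantile(x, alpha)                    # empirical value-at-risk
    return var + np.mean(np.maximum(x - var, 0.0)) / (1.0 - alpha)

# heavy-tailed (unbounded) losses, e.g. Student-t, illustrate the setting
rng = np.random.default_rng(0)
print(empirical_cvar(rng.standard_t(df=4, size=100_000), alpha=0.95))
```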

Bandit algorithms to emulate human decision making using probabilistic distortions

no code implementations 30 Nov 2016 Ravi Kumar Kolla, Prashanth L. A., Aditya Gopalan, Krishna Jagannathan, Michael Fu, Steve Marcus

For the $K$-armed bandit setting, we derive an upper bound on the expected regret for our proposed algorithm, and then we prove a matching lower bound to establish the order-optimality of our algorithm.

Decision Making · Multi-Armed Bandits
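
The basic ingredient in bandit algorithms with probabilistic distortions is a distorted value estimate of each arm, computed from the order statistics of that arm's samples. The sketch below assumes non-negative rewards; the Prelec-style weighting function and Beta rewards are illustrative, not the paper's specific choices.

```python
import numpy as np

def distorted_value_estimate(samples, weight):
    """Empirical estimate of the distorted expectation
        integral_0^inf weight( P(X > t) ) dt   (X assumed non-negative),
    using the order statistics x_(1) <= ... <= x_(n):
        sum_i x_(i) * [ weight((n-i+1)/n) - weight((n-i)/n) ]."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    coeffs = weight((n - i + 1) / n) - weight((n - i) / n)
    return np.dot(coeffs, x)

def prelec_weight(p, gamma=0.65):
    """Prelec probability weighting w(p) = exp(-(-ln p)^gamma), an
    illustrative distortion; p is clipped away from 0 for stability."""
    p = np.clip(p, 1e-12, 1.0)
    return np.exp(-(-np.log(p)) ** gamma)

# illustrative usage: distorted value of one arm with rewards in [0, 1]
rng = np.random.default_rng(0)
arm_samples = rng.beta(2.0, 5.0, size=2_000)
print(distorted_value_estimate(arm_samples, prelec_weight))
```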

(Bandit) Convex Optimization with Biased Noisy Gradient Oracles

no code implementations 22 Sep 2016 Xiaowei Hu, Prashanth L. A., András György, Csaba Szepesvári

Algorithms for bandit convex optimization and online learning often rely on constructing noisy gradient estimates, which are then used in appropriately adjusted first-order algorithms, replacing actual gradients.
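
One standard construction of such a biased noisy gradient oracle is the two-point estimate along a random unit direction, sketched below; the smoothing radius, step size, and quadratic test function are illustrative assumptions rather than the paper's setup.

```python
import numpy as np

def two_point_gradient_oracle(noisy_f, x, delta, rng):
    """Two noisy function values along a random unit direction u give
        g = (dim / (2*delta)) * (F(x + delta*u) - F(x - delta*u)) * u,
    an estimate of the gradient of a smoothed version of f, with a bias
    that grows with delta and a variance that grows as delta shrinks."""
    u = rng.normal(size=x.size)
    u /= np.linalg.norm(u)                         # uniform on the unit sphere
    scale = x.size / (2.0 * delta)
    return scale * (noisy_f(x + delta * u) - noisy_f(x - delta * u)) * u

# illustrative usage inside plain gradient descent on a noisy quadratic
rng = np.random.default_rng(0)
noisy_f = lambda x: np.sum(x ** 2) + 0.01 * rng.normal()
x = np.ones(5)
for _ in range(2_000):
    x = x - 0.02 * two_point_gradient_oracle(noisy_f, x, delta=0.05, rng=rng)
print(x)
```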

A constrained optimization perspective on actor critic algorithms and application to network routing

no code implementations 28 Jul 2015 Prashanth L. A., H. L. Prasad, Shalabh Bhatnagar, Prakash Chandra

We propose a novel actor-critic algorithm with guaranteed convergence to an optimal policy for a discounted reward Markov decision process.

Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control

no code implementations 8 Jun 2015 Prashanth L. A., Cheng Jie, Michael Fu, Steve Marcus, Csaba Szepesvári

Cumulative prospect theory (CPT) is known to model human decisions well, with substantial empirical evidence supporting this claim.

reinforcement-learning · Reinforcement Learning (RL)

Adaptive system optimization using random directions stochastic approximation

1 code implementation 19 Feb 2015 Prashanth L. A., Shalabh Bhatnagar, Michael Fu, Steve Marcus

We prove the unbiasedness of both gradient and Hessian estimates and asymptotic (strong) convergence for both first-order and second-order schemes.
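
A sketch of random-directions gradient and Hessian estimates built from three noisy function evaluations along one perturbation direction. The scaling constants below are worked out for i.i.d. Unif[-1, 1] perturbation components and may differ from the paper's exact scheme; the quadratic usage is illustrative.

```python
import numpy as np

def rdsa_unif_estimates(noisy_f, x, delta, rng):
    """Gradient and Hessian estimates from three noisy function evaluations
    along one random direction d with i.i.d. Unif[-1, 1] components:
        g_hat = 3 * d * (y_plus - y_minus) / (2*delta)
        H_hat = M(d) * (y_plus + y_minus - 2*y) / delta**2,
    where M(d) has diagonal 11.25*(d_i**2 - 1/3) and off-diagonal
    4.5*d_i*d_j.  These constants make the single-sample estimates unbiased
    (to leading order in delta) for this perturbation law; in a full
    second-order scheme such estimates are averaged across iterations."""
    d = rng.uniform(-1.0, 1.0, size=x.size)
    y = noisy_f(x)
    y_plus, y_minus = noisy_f(x + delta * d), noisy_f(x - delta * d)

    g_hat = 3.0 * d * (y_plus - y_minus) / (2.0 * delta)

    M = 4.5 * np.outer(d, d)
    np.fill_diagonal(M, 11.25 * (d ** 2 - 1.0 / 3.0))
    H_hat = M * (y_plus + y_minus - 2.0 * y) / delta ** 2
    return g_hat, H_hat

# illustrative usage on a noisy quadratic (true gradient 2x, true Hessian 2I)
rng = np.random.default_rng(0)
noisy_f = lambda x: np.sum(x ** 2) + 0.01 * rng.normal()
g, H = rdsa_unif_estimates(noisy_f, np.ones(3), delta=0.05, rng=rng)
print(g, H, sep="\n")
```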

Variance-Constrained Actor-Critic Algorithms for Discounted and Average Reward MDPs

no code implementations 25 Mar 2014 Prashanth L. A., Mohammad Ghavamzadeh

For each formulation, we first define a measure of variability for a policy, which in turn gives us a set of risk-sensitive criteria to optimize.

Decision Making
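
One concrete instance of such a risk-sensitive criterion is the Lagrangian relaxation of "maximise the mean return subject to a variance constraint", estimated here from Monte Carlo returns of a fixed policy; the names, the multiplier, and the use of plain return variance as the variability measure are illustrative assumptions.

```python
import numpy as np

def mean_variance_lagrangian(returns, lam, variance_limit):
    """Lagrangian of the variance-constrained criterion
        maximise E[R]  subject to  Var(R) <= variance_limit,
    i.e. mean(R) - lam * (var(R) - variance_limit), estimated from sampled
    returns of one policy."""
    r = np.asarray(returns, dtype=float)
    return r.mean() - lam * (r.var() - variance_limit)

# illustrative usage: returns collected by rolling out one policy
rng = np.random.default_rng(0)
print(mean_variance_lagrangian(rng.normal(5.0, 2.0, size=1_000),
                               lam=0.5, variance_limit=1.0))
```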

Two Timescale Convergent Q-learning for Sleep-Scheduling in Wireless Sensor Networks

no code implementations 27 Dec 2013 Prashanth L. A., Abhranil Chatterjee, Shalabh Bhatnagar

For each criterion, we propose a convergent on-policy Q-learning algorithm that operates on two timescales, while employing function approximation to handle the curse of dimensionality associated with the underlying POMDP.

feature selection · Intrusion Detection · +2
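
A generic two-timescale stochastic-approximation step of the kind such a convergent scheme is built on: an on-policy TD/SARSA-style update of the value parameters on the fast timescale, with a second parameter updated more slowly from the same sample. The names and the particular slow recursion (which here simply tracks an average of the fast iterate) are illustrative, not the paper's exact updates.

```python
import numpy as np

def two_timescale_step(theta, w, transition, phi, gamma, fast_step, slow_step):
    """One sample, two updates on separated timescales.

    theta: Q-value parameters under linear features `phi` (fast timescale)
    w:     slowly adapted parameters (slow timescale, fast_step >> slow_step)
    transition: (state, action, reward, next_state, next_action)
    """
    s, a, r, s_next, a_next = transition
    feat = phi(s, a)
    td_error = r + gamma * phi(s_next, a_next).dot(theta) - feat.dot(theta)
    theta = theta + fast_step * td_error * feat      # fast: value estimate
    w = w + slow_step * (theta - w)                  # slow: tracks the fast iterate
    return theta, w
```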

Actor-Critic Algorithms for Risk-Sensitive MDPs

no code implementations NeurIPS 2013 Prashanth L. A., Mohammad Ghavamzadeh

For each formulation, we first define a measure of variability for a policy, which in turn gives us a set of risk-sensitive criteria to optimize.

Decision Making

Fast gradient descent for drifting least squares regression, with application to bandits

no code implementations 11 Jul 2013 Nathaniel Korda, Prashanth L. A., Rémi Munos

In the case when strong convexity in the regression problem is guaranteed, we provide bounds on the error both in expectation and high probability (the latter is often needed to provide theoretical guarantees for higher level algorithms), despite the drifting least squares solution.

News Recommendation · regression
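
A sketch of the kind of streaming gradient iteration analysed here: one cheap gradient step per incoming (feature, target) pair, tracking a drifting regularised least-squares solution instead of re-solving the normal equations each round. Parameter names, step size, and the drift model are illustrative; the `reg > 0` term supplies the strong convexity the bounds assume.

```python
import numpy as np

def streaming_sgd_least_squares(stream, dim, step=0.1, reg=0.01):
    """One SGD pass over a stream of (feature, target) pairs for the
    regularised least-squares objective 0.5*(a.theta - y)^2 + 0.5*reg*|theta|^2,
    whose exact minimiser may drift as the stream evolves."""
    theta = np.zeros(dim)
    for a, y in stream:
        grad = (a.dot(theta) - y) * a + reg * theta
        theta -= step * grad
    return theta

# illustrative usage: targets generated by a slowly drifting parameter vector
rng = np.random.default_rng(0)
true_theta, stream = np.ones(5), []
for t in range(5_000):
    true_theta += 0.0005 * rng.normal(size=5)      # the least-squares solution drifts
    a = rng.normal(size=5)
    stream.append((a, a.dot(true_theta) + 0.1 * rng.normal()))
print(streaming_sgd_least_squares(stream, dim=5))
```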
