Search Results for author: Shalabh Bhatnagar

Found 64 papers, 12 papers with code

Critic-Actor for Average Reward MDPs with Function Approximation: A Finite-Time Analysis

no code implementations · 2 Feb 2024 · Prashansa Panda, Shalabh Bhatnagar

In recent years, there has been a lot of research activity focused on carrying out asymptotic and non-asymptotic convergence analyses of two-timescale actor-critic algorithms, where the actor updates are performed on a timescale that is slower than that of the critic.

Approximate Linear Programming and Decentralized Policy Improvement in Cooperative Multi-agent Markov Decision Processes

no code implementations · 20 Nov 2023 · Lakshmi Mandal, Chandrashekar Lakshminarayanan, Shalabh Bhatnagar

In this work, we consider a `cooperative' multi-agent Markov decision process (MDP) involving m > 1 agents, where all agents are aware of the system model.

Finite Time Analysis of Constrained Actor Critic and Constrained Natural Actor Critic Algorithms

no code implementations · 25 Oct 2023 · Prashansa Panda, Shalabh Bhatnagar

Actor Critic methods have found immense applications on a wide range of Reinforcement Learning tasks especially when the state-action space is large.

Energy Management in a Cooperative Energy Harvesting Wireless Sensor Network

no code implementations · 9 Oct 2023 · Arghyadeep Barat, Prabuchandran K. J, Shalabh Bhatnagar

In this paper, we consider the problem of finding an optimal energy management policy for a network of sensor nodes capable of harvesting their own energy and sharing it with other nodes in the network.

The Reinforce Policy Gradient Algorithm Revisited

no code implementations · 8 Oct 2023 · Shalabh Bhatnagar

This has advantages in the case of systems with infinite state and action spaces, as it relaxes some of the regularity requirements that would otherwise be needed for proving convergence of the Reinforce algorithm.

A Framework for Provably Stable and Consistent Training of Deep Feedforward Networks

no code implementations · 20 May 2023 · Arunselvan Ramaswamy, Shalabh Bhatnagar, Naman Saxena

We show, in theory and through experiments, that our algorithm updates have low variance, and the training loss reduces in a smooth manner.

Off-Policy Average Reward Actor-Critic with Deterministic Policy Search

no code implementations · 20 May 2023 · Naman Saxena, Subhojyoti Khastigir, Shishir Kolathaya, Shalabh Bhatnagar

In this work, we present both on-policy and off-policy deterministic policy gradient theorems for the average reward performance criterion.

n-Step Temporal Difference Learning with Optimal n

1 code implementation · 13 Mar 2023 · Lakshmi Mandal, Shalabh Bhatnagar

We consider the problem of finding the optimal value of n in the n-step temporal difference (TD) learning algorithm.
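The n-step return at the heart of this problem can be sketched in a few lines; a minimal tabular example, with the episode format, state indexing, and step size all assumed for illustration (this is generic n-step TD, not the paper's method for optimizing n):

```python
import numpy as np

def n_step_td(episodes, n, num_states, alpha=0.1, gamma=0.99):
    """Tabular n-step TD prediction (illustrative sketch).

    Each episode is a list of (state, reward) pairs; the n-step return
    sums up to n discounted rewards and bootstraps from the value
    estimate n steps ahead when the episode has not yet ended.
    """
    V = np.zeros(num_states)
    for episode in episodes:
        T = len(episode)
        for t in range(T):
            tau = min(t + n, T)
            # Discounted sum of up to n rewards...
            G = sum(gamma ** (k - t) * episode[k][1] for k in range(t, tau))
            # ...plus a bootstrapped tail if the episode continues
            if tau < T:
                G += gamma ** n * V[episode[tau][0]]
            s = episode[t][0]
            V[s] += alpha * (G - V[s])
    return V
```

Small n gives low-variance but biased updates, large n the reverse; the paper's question is which n trades these off best.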

Generalized Simultaneous Perturbation-based Gradient Search with Reduced Estimator Bias

no code implementations · 20 Dec 2022 · Soumen Pachal, Shalabh Bhatnagar, L. A. Prashanth

We first present in detail unbalanced generalized simultaneous perturbation stochastic approximation (GSPSA) estimators and later present the balanced versions (B-GSPSA) of these.
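The simultaneous perturbation idea behind these estimators can be illustrated with the classical two-measurement SPSA gradient estimate (a generic sketch, not the unbalanced or balanced GSPSA forms introduced in the paper):

```python
import numpy as np

def spsa_gradient(f, x, c=1e-3, rng=None):
    """Classical two-measurement SPSA gradient estimate.

    All coordinates are perturbed simultaneously by a Rademacher vector,
    so only two function evaluations are needed regardless of dimension.
    """
    rng = rng or np.random.default_rng()
    delta = rng.choice([-1.0, 1.0], size=x.shape)  # Rademacher perturbation
    # For +/-1 entries, 1/delta_i equals delta_i, so multiply elementwise
    return (f(x + c * delta) - f(x - c * delta)) / (2 * c) * delta
```

Averaged over perturbations, the estimate is (nearly) unbiased; the paper's generalized estimators refine exactly this construction to reduce the estimator bias.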

Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm

1 code implementation · 14 Oct 2022 · Ashish Kumar Jayant, Shalabh Bhatnagar

We compare our approach with relevant model-free and model-based approaches in Constrained RL using the challenging Safe Reinforcement Learning benchmark, the OpenAI Safety Gym.

Actor-Critic or Critic-Actor? A Tale of Two Time Scales

no code implementations · 10 Oct 2022 · Shalabh Bhatnagar, Vivek S. Borkar, Soumyajit Guin

We revisit the standard formulation of tabular actor-critic algorithm as a two time-scale stochastic approximation with value function computed on a faster time-scale and policy computed on a slower time-scale.
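The timescale separation rests on step-size schedules whose ratio vanishes; a hypothetical pair satisfying the usual two-timescale conditions (the exponents here are chosen purely for illustration):

```python
def fast_step(n):
    """Faster-timescale step size (the critic in standard actor-critic)."""
    return 1.0 / (n + 1) ** 0.6

def slow_step(n):
    """Slower-timescale step size; slow_step(n) / fast_step(n) -> 0,
    so the slow iterate sees the fast one as nearly equilibrated."""
    return 1.0 / (n + 1)
```

Swapping which update uses which schedule is precisely what turns actor-critic into the critic-actor scheme studied in the paper.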

A policy gradient approach for Finite Horizon Constrained Markov Decision Processes

1 code implementation · 10 Oct 2022 · Soumyajit Guin, Shalabh Bhatnagar

In many situations, finite horizon control problems are of interest and for such problems, the optimal policies are time-varying in general.

A Gradient Smoothed Functional Algorithm with Truncated Cauchy Random Perturbations for Stochastic Optimization

no code implementations · 30 Jul 2022 · Akash Mondal, Prashanth L. A., Shalabh Bhatnagar

In this paper, we present a stochastic gradient algorithm for minimizing a smooth objective function that is an expectation over noisy cost samples, and only the latter are observed for any given parameter.

Reinforcement Learning for Task Specifications with Action-Constraints

no code implementations · 2 Jan 2022 · Arun Raman, Keerthan Shagrithaya, Shalabh Bhatnagar

We assume that the set of action sequences deemed unsafe and/or safe is given in terms of a finite-state automaton, and we propose a supervisor that disables a subset of actions at every state of the MDP so that the constraints on action sequences are satisfied.

$N$-Timescale Stochastic Approximation: Stability and Convergence

no code implementations · 7 Dec 2021 · Rohan Deb, Shalabh Bhatnagar

This paper presents the first sufficient conditions that guarantee the stability and almost sure convergence of $N$-timescale stochastic approximation (SA) iterates for any $N\geq1$.

Schedule Based Temporal Difference Algorithms

no code implementations · 23 Nov 2021 · Rohan Deb, Meet Gandhi, Shalabh Bhatnagar

However, the weights assigned to different $n$-step returns in TD($\lambda$), controlled by the parameter $\lambda$, decrease exponentially with increasing $n$.
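Concretely, TD($\lambda$) assigns the $n$-step return weight $(1-\lambda)\lambda^{n-1}$; a small sketch showing the exponential decay the paper's schedule-based algorithms are designed to replace:

```python
def td_lambda_weights(lam, max_n):
    """Weights (1 - lam) * lam**(n - 1) on the n-step returns, n = 1..max_n.

    The weights decay exponentially in n and sum to 1 - lam**max_n,
    approaching 1 as max_n grows.
    """
    return [(1 - lam) * lam ** (n - 1) for n in range(1, max_n + 1)]
```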

Gradient Temporal Difference with Momentum: Stability and Convergence

no code implementations · 22 Nov 2021 · Rohan Deb, Shalabh Bhatnagar

Here, we consider Gradient TD algorithms with an additional heavy ball momentum term and provide choice of step size and momentum parameter that ensures almost sure convergence of these algorithms asymptotically.

Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm

no code implementations · 19 Oct 2021 · Raghuram Bharadwaj Diddigi, Prateek Jain, Prabuchandran K. J., Shalabh Bhatnagar

Learning optimal behavior from existing data is one of the most important problems in Reinforcement Learning (RL).

Attention Actor-Critic algorithm for Multi-Agent Constrained Co-operative Reinforcement Learning

2 code implementations · 7 Jan 2021 · P. Parnika, Raghuram Bharadwaj Diddigi, Sai Koti Reddy Danda, Shalabh Bhatnagar

In this work, we consider the problem of computing optimal actions for Reinforcement Learning (RL) agents in a co-operative setting, where the objective is to optimize a common goal.

Robust Quadrupedal Locomotion on Sloped Terrains: A Linear Policy Approach

1 code implementation · 30 Oct 2020 · Kartik Paigwar, Lokesh Krishna, Sashank Tirumala, Naman Khetan, Aditya Sagi, Ashish Joglekar, Shalabh Bhatnagar, Ashitava Ghosal, Bharadwaj Amrutur, Shishir Kolathaya

In particular, the parameters of the end-foot trajectories are shaped via a linear feedback policy that takes the torso orientation and the terrain slope as inputs.

Hindsight Experience Replay with Kronecker Product Approximate Curvature

no code implementations · 9 Oct 2020 · Dhuruva Priyan G M, Abhik Singla, Shalabh Bhatnagar

Natural gradients address these challenges by yielding better convergence of the model parameters.

A reinforcement learning approach to hybrid control design

no code implementations · 2 Sep 2020 · Meet Gandhi, Atreyee Kundu, Shalabh Bhatnagar

Second, we model a set of benchmark examples of hybrid control design problem in the proposed MDP framework.

Learning Stable Manoeuvres in Quadruped Robots from Expert Demonstrations

no code implementations · 28 Jul 2020 · Sashank Tirumala, Sagar Gubbi, Kartik Paigwar, Aditya Sagi, Ashish Joglekar, Shalabh Bhatnagar, Ashitava Ghosal, Bharadwaj Amrutur, Shishir Kolathaya

First, multiple simpler policies are trained to generate trajectories for a discrete set of target velocities and turning radii.

Hierarchical Average Reward Policy Gradient Algorithms

no code implementations · 20 Nov 2019 · Akshay Dharmavaram, Matthew Riemer, Shalabh Bhatnagar

Option-critic learning is a general-purpose reinforcement learning (RL) framework that aims to address the issue of long term credit assignment by leveraging temporal abstractions.

A Convergent Off-Policy Temporal Difference Algorithm

1 code implementation · 13 Nov 2019 · Raghuram Bharadwaj Diddigi, Chandramouli Kamanchi, Shalabh Bhatnagar

In this work, we propose a convergent on-line off-policy TD algorithm under linear function approximation.

Generalized Speedy Q-learning

1 code implementation · 1 Nov 2019 · Indu John, Chandramouli Kamanchi, Shalabh Bhatnagar

In most RL algorithms such as Q-learning, the Bellman equation and the Bellman operator play an important role.
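The Bellman optimality operator's role shows up directly in the standard tabular Q-learning update, sketched here on an assumed array layout for Q:

```python
import numpy as np

def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning update: move Q(s, a) toward the sampled
    Bellman optimality backup r + gamma * max_a' Q(s', a')."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```

Generalized Speedy Q-learning modifies precisely this backup; the plain update above is only the baseline it builds on.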

Learning Active Spine Behaviors for Dynamic and Efficient Locomotion in Quadruped Robots

no code implementations · 15 May 2019 · Shounak Bhattacharya, Abhik Singla, Abhimanyu, Dhaivat Dholakiya, Shalabh Bhatnagar, Bharadwaj Amrutur, Ashitava Ghosal, Shishir Kolathaya

In this work, we provide a simulation framework to perform systematic studies on the effects of spinal joint compliance and actuation on bounding performance of a 16-DOF quadruped spined robot Stoch 2.

Reinforcement Learning in Non-Stationary Environments

no code implementations · 10 May 2019 · Sindhu Padakandla, Prabuchandran K. J, Shalabh Bhatnagar

In this paper, we thus consider the problem of developing RL methods that obtain optimal decisions in a non-stationary environment.

Generalized Second Order Value Iteration in Markov Decision Processes

2 code implementations · 10 May 2019 · Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi, Shalabh Bhatnagar

In this work, we propose a second order value iteration procedure that is obtained by applying the Newton-Raphson method to the successive relaxation value iteration scheme.
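The successive relaxation scheme underlying this averages the current iterate with the Bellman backup; a generic sketch on an assumed transition/reward layout (w = 1 recovers standard value iteration, and this does not include the paper's Newton-Raphson step):

```python
import numpy as np

def relaxed_value_iteration(P, R, gamma=0.9, w=1.0, iters=500):
    """Successive relaxation value iteration: V <- (1 - w) V + w T(V).

    P: transition probabilities with shape (A, S, S');
    R: rewards with shape (S, A).
    """
    _, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        # Q[s, a] = R[s, a] + gamma * sum_{s'} P[a, s, s'] V[s']
        Q = R + gamma * np.einsum('asp,p->sa', P, V)
        V = (1 - w) * V + w * Q.max(axis=1)
    return V
```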

Successive Over Relaxation Q-Learning

no code implementations · 9 Mar 2019 · Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi, Shalabh Bhatnagar

We first derive a modified fixed point iteration for SOR Q-values and utilize stochastic approximation to derive a learning algorithm to compute the optimal value function and an optimal policy.

An Online Sample Based Method for Mode Estimation using ODE Analysis of Stochastic Approximation Algorithms

no code implementations · 11 Feb 2019 · Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi, Prabuchandran K. J., Shalabh Bhatnagar

In many practical applications, the analytical form of the density is not known and only samples from the distribution are available.

Memory-based Deep Reinforcement Learning for Obstacle Avoidance in UAV with Limited Environment Knowledge

2 code implementations · 8 Nov 2018 · Abhik Singla, Sindhu Padakandla, Shalabh Bhatnagar

When compared to obstacle avoidance in ground vehicular robots, UAV navigation brings in additional challenges because the UAV motion is no longer constrained to a well-defined indoor ground or street environment.

Realizing Learned Quadruped Locomotion Behaviors through Kinematic Motion Primitives

no code implementations · 9 Oct 2018 · Abhik Singla, Shounak Bhattacharya, Dhaivat Dholakiya, Shalabh Bhatnagar, Ashitava Ghosal, Bharadwaj Amrutur, Shishir Kolathaya

Leveraging this underlying structure, we then realize walking in Stoch by a straightforward reconstruction of joint trajectories from kMPs.

Random directions stochastic approximation with deterministic perturbations

1 code implementation · 8 Aug 2018 · Prashanth L. A, Shalabh Bhatnagar, Nirav Bhavsar, Michael Fu, Steven I. Marcus

We introduce deterministic perturbation schemes for the recently proposed random directions stochastic approximation (RDSA) [17], and propose new first-order and second-order algorithms.

An Online Prediction Algorithm for Reinforcement Learning with Linear Function Approximation using Cross Entropy Method

no code implementations · 15 Jun 2018 · Ajin George Joseph, Shalabh Bhatnagar

In this paper, we provide two new stable online algorithms for the problem of prediction in reinforcement learning, i.e., estimating the value function of a model-free Markov reward process using the linear function approximation architecture, with memory and computation costs scaling quadratically in the size of the feature set.

A Cross Entropy based Optimization Algorithm with Global Convergence Guarantees

no code implementations · 31 Jan 2018 · Ajin George Joseph, Shalabh Bhatnagar

The cross entropy (CE) method is a model based search method to solve optimization problems where the objective function has minimal structure.

An Incremental Off-policy Search in a Model-free Markov Decision Process Using a Single Sample Path

no code implementations · 31 Jan 2018 · Ajin George Joseph, Shalabh Bhatnagar

In this paper, we consider a modified version of the control problem in a model free Markov decision process (MDP) setting with large state and action spaces.

Analyzing Approximate Value Iteration Algorithms

no code implementations · 14 Sep 2017 · Arunselvan Ramaswamy, Shalabh Bhatnagar

In this paper, we consider the stochastic iterative counterpart of the value iteration scheme wherein only noisy and possibly biased approximations of the Bellman operator are available.

Multi-Agent Q-Learning for Minimizing Demand-Supply Power Deficit in Microgrids

no code implementations · 25 Aug 2017 · Raghuram Bharadwaj Diddigi, D. Sai Koti Reddy, Shalabh Bhatnagar

Finally, we also consider a variant of this problem where the cost of power production at the main site is taken into consideration.

On the function approximation error for risk-sensitive reinforcement learning

no code implementations · 22 Dec 2016 · Prasenjit Karmakar, Shalabh Bhatnagar

The novelty of our approach is that we use the irreducibility of the Markov chain to obtain the new bounds, whereas the earlier work by Basu et al. used a spectral variation bound that holds for any matrix.

Performance Tuning of Hadoop MapReduce: A Noisy Gradient Approach

no code implementations · 30 Nov 2016 · Sandeep Kumar, Sindhu Padakandla, Chandrashekar L, Priyank Parihar, K Gopinath, Shalabh Bhatnagar

Our method, when tested on a 25-node Hadoop cluster, shows a 66% decrease in the execution time of Hadoop jobs on average, compared to the default configuration.

On a convergent off-policy temporal difference learning algorithm in on-line learning environment

no code implementations · 19 May 2016 · Prasenjit Karmakar, Rajkumar Maity, Shalabh Bhatnagar

In this paper we provide a rigorous convergence analysis of an "off"-policy temporal difference learning algorithm with linear function approximation and per-time-step linear computational complexity in an "online" learning environment.

Analysis of gradient descent methods with non-diminishing, bounded errors

no code implementations · 1 Apr 2016 · Arunselvan Ramaswamy, Shalabh Bhatnagar

The main aim of this paper is to provide an analysis of gradient descent (GD) algorithms with gradient errors that do not necessarily vanish, asymptotically.

Shaping Proto-Value Functions via Rewards

no code implementations · 27 Nov 2015 · Chandrashekar Lakshmi Narayanan, Raj Kumar Maity, Shalabh Bhatnagar

In this paper, we combine task-dependent reward shaping and task-independent proto-value functions to obtain reward dependent proto-value functions (RPVFs).

A constrained optimization perspective on actor critic algorithms and application to network routing

no code implementations · 28 Jul 2015 · Prashanth L. A., H. L. Prasad, Shalabh Bhatnagar, Prakash Chandra

We propose a novel actor-critic algorithm with guaranteed convergence to an optimal policy for a discounted reward Markov decision process.

A Study of Gradient Descent Schemes for General-Sum Stochastic Games

no code implementations · 1 Jul 2015 · H. L. Prasad, Shalabh Bhatnagar

However, the optimization problem there has a non-linear objective and non-linear constraints with special structure.

Stability of Stochastic Approximations with `Controlled Markov' Noise and Temporal Difference Learning

no code implementations · 23 Apr 2015 · Arunselvan Ramaswamy, Shalabh Bhatnagar

Analyzing this class of algorithms is important, since many reinforcement learning (RL) algorithms can be cast as SAs driven by a `controlled Markov' process.

Two Timescale Stochastic Approximation with Controlled Markov noise and Off-policy temporal difference learning

no code implementations · 31 Mar 2015 · Prasenjit Karmakar, Shalabh Bhatnagar

We present for the first time an asymptotic convergence analysis of two time-scale stochastic approximation driven by `controlled' Markov noise.

Energy Sharing for Multiple Sensor Nodes with Finite Buffers

no code implementations · 17 Mar 2015 · Sindhu Padakandla, Prabuchandran K. J, Shalabh Bhatnagar

We also develop a cross entropy based method that incorporates policy parameterization in order to find near optimal energy sharing policies.

Adaptive system optimization using random directions stochastic approximation

1 code implementation · 19 Feb 2015 · Prashanth L. A., Shalabh Bhatnagar, Michael Fu, Steve Marcus

We prove the unbiasedness of both gradient and Hessian estimates and asymptotic (strong) convergence for both first-order and second-order schemes.

A Generalization of the Borkar-Meyn Theorem for Stochastic Recursive Inclusions

no code implementations · 6 Feb 2015 · Arunselvan Ramaswamy, Shalabh Bhatnagar

In this paper the stability theorem of Borkar and Meyn is extended to include the case when the mean field is a differential inclusion.

Stochastic recursive inclusion in two timescales with an application to the Lagrangian dual problem

no code implementations · 6 Feb 2015 · Arunselvan Ramaswamy, Shalabh Bhatnagar

In this paper we present a framework to analyze the asymptotic behavior of two timescale stochastic approximation algorithms including those with set-valued mean fields.

Universal Option Models

no code implementations · NeurIPS 2014 · Hengshuai Yao, Csaba Szepesvari, Richard S. Sutton, Joseph Modayil, Shalabh Bhatnagar

We prove that the UOM of an option can construct a traditional option model given a reward function, and the option-conditional return is computed directly by a single dot-product of the UOM with the reward function.
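The dot-product claim can be illustrated with made-up numbers: if a matrix U holds an option's expected discounted state occupancies (a hypothetical 2-state example, invented purely for illustration), then the option-conditional returns for any reward vector r are just U @ r:

```python
import numpy as np

# Hypothetical UOM: row s gives the expected discounted state occupancies
# when the option is run from state s (numbers invented for illustration).
U = np.array([[1.0, 0.8],
              [0.3, 1.5]])

def option_return(U, r):
    """Option-conditional returns via a single dot product with the rewards."""
    return U @ r

r = np.array([0.0, 1.0])  # reward only in the second state
returns = option_return(U, r)  # one return per start state
```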

Actor-Critic Algorithms for Learning Nash Equilibria in N-player General-Sum Games

no code implementations · 8 Jan 2014 · H. L. Prasad, L. A. Prashanth, Shalabh Bhatnagar

We then provide a characterization of solution points of these sub-problems that correspond to Nash equilibria of the underlying game and for this purpose, we derive a set of necessary and sufficient SG-SP (Stochastic Game - Sub-Problem) conditions.

Two Timescale Convergent Q-learning for Sleep--Scheduling in Wireless Sensor Networks

no code implementations · 27 Dec 2013 · Prashanth L. A., Abhranil Chatterjee, Shalabh Bhatnagar

For each criterion, we propose a convergent on-policy Q-learning algorithm that operates on two timescales, while employing function approximation to handle the curse of dimensionality associated with the underlying POMDP.

Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation

no code implementations · NeurIPS 2009 · Shalabh Bhatnagar, Doina Precup, David Silver, Richard S. Sutton, Hamid R. Maei, Csaba Szepesvári

We introduce the first temporal-difference learning algorithms that converge with smooth value function approximators, such as neural networks.
