no code implementations • 2 Feb 2024 • Prashansa Panda, Shalabh Bhatnagar
In recent years, there has been considerable research activity on asymptotic and non-asymptotic convergence analyses of two-timescale actor-critic algorithms, where the actor updates are performed on a slower timescale than the critic updates.
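The two-timescale structure is enforced through the step-size schedules: both the actor and critic gains satisfy the usual stochastic-approximation conditions, but the actor's steps become negligible relative to the critic's. A minimal illustration (the specific schedules below are common textbook choices, not taken from the paper):

```python
import numpy as np

# Illustrative two-timescale step-size schedules: both satisfy
# sum(a_k) = inf and sum(a_k^2) < inf, and the actor's steps beta_k
# are asymptotically negligible relative to the critic's alpha_k,
# i.e. beta_k / alpha_k -> 0.
k = np.arange(1, 100001, dtype=float)
alpha = 1.0 / k ** 0.6          # critic: faster timescale
beta = 1.0 / k                  # actor: slower timescale
ratio = beta / alpha            # equals k ** -0.4, which tends to 0
```

The ratio tending to zero is what lets the analysis treat the critic as having essentially converged between consecutive actor updates.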
no code implementations • 20 Nov 2023 • Lakshmi Mandal, Chandrashekar Lakshminarayanan, Shalabh Bhatnagar
In this work, we consider a `cooperative' multi-agent Markov decision process (MDP) involving m > 1 agents, where all agents are aware of the system model.
no code implementations • 25 Oct 2023 • Prashansa Panda, Shalabh Bhatnagar
Actor-critic methods have found immense application across a wide range of Reinforcement Learning tasks, especially when the state-action space is large.
no code implementations • 9 Oct 2023 • Arghyadeep Barat, Prabuchandran K. J., Shalabh Bhatnagar
In this paper, we consider the problem of finding an optimal energy management policy for a network of sensor nodes capable of harvesting their own energy and sharing it with other nodes in the network.
no code implementations • 8 Oct 2023 • Shalabh Bhatnagar
This has advantages in the case of systems with infinite state and action spaces, as it relaxes some of the regularity requirements that would otherwise be needed to prove convergence of the Reinforce algorithm.
no code implementations • 20 May 2023 • Arunselvan Ramaswamy, Shalabh Bhatnagar, Naman Saxena
We show, in theory and through experiments, that our algorithm updates have low variance, and the training loss reduces in a smooth manner.
no code implementations • 20 May 2023 • Naman Saxena, Subhojyoti Khastigir, Shishir Kolathaya, Shalabh Bhatnagar
In this work, we present both on-policy and off-policy deterministic policy gradient theorems for the average reward performance criterion.
no code implementations • 21 Apr 2023 • Mizhaan Prajit Maniyar, Akash Mondal, Prashanth L. A., Shalabh Bhatnagar
We consider the problem of control in the setting of reinforcement learning (RL), where model information is not available.
1 code implementation • 13 Mar 2023 • Lakshmi Mandal, Shalabh Bhatnagar
We consider the problem of finding the optimal value of n in the n-step temporal difference (TD) learning algorithm.
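The optimal n trades off bias against variance and is problem-dependent, which is what makes searching for it worthwhile. As background, a minimal sketch of the standard tabular n-step TD prediction update itself (the toy random-walk environment and all names here are illustrative, not from the paper):

```python
import numpy as np

def n_step_td(n, episodes=3000, alpha=0.1, gamma=1.0, seed=0):
    # Tabular n-step TD prediction on a 7-state random walk:
    # states 1..5 are non-terminal, 0 and 6 are terminal, and the
    # agent gets reward +1 only on reaching state 6.  True values of
    # states 1..5 are 1/6 .. 5/6.  Illustrative environment only.
    rng = np.random.default_rng(seed)
    V = np.zeros(7)
    for _ in range(episodes):
        states = [3]                 # S_0: start in the middle
        rewards = [0.0]              # placeholder so rewards[t] = R_t
        T, t = np.inf, 0
        while True:
            if t < T:
                s2 = states[t] + rng.choice([-1, 1])
                states.append(s2)
                rewards.append(1.0 if s2 == 6 else 0.0)
                if s2 in (0, 6):
                    T = t + 1        # episode terminates at time T
            tau = t - n + 1          # time whose estimate is updated
            if tau >= 0:
                # n-step return: discounted rewards plus a bootstrap
                # from V at the state n steps ahead (if non-terminal).
                G = sum(gamma ** (i - tau - 1) * rewards[i]
                        for i in range(tau + 1, int(min(tau + n, T)) + 1))
                if tau + n < T:
                    G += gamma ** n * V[states[tau + n]]
                V[states[tau]] += alpha * (G - V[states[tau]])
            if tau == T - 1:
                break
            t += 1
    return V
```

With small n the update bootstraps heavily (low variance, more bias); with large n it approaches Monte Carlo returns.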
no code implementations • 20 Dec 2022 • Soumen Pachal, Shalabh Bhatnagar, L. A. Prashanth
We first present in detail unbalanced generalized simultaneous perturbation stochastic approximation (GSPSA) estimators and later present the balanced versions (B-GSPSA) of these.
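For context, the classic two-sided SPSA estimator that these schemes generalize perturbs all coordinates simultaneously with a single random direction, so a gradient estimate costs two function evaluations regardless of dimension. A minimal sketch (not the paper's GSPSA estimators; the gain sequences and names are illustrative assumptions):

```python
import numpy as np

def spsa_gradient(f, x, delta, rng):
    # Classic two-sided SPSA gradient estimate with a Rademacher
    # perturbation direction d:
    #   g_i = (f(x + delta*d) - f(x - delta*d)) / (2 * delta * d_i)
    d = rng.choice([-1.0, 1.0], size=x.shape)
    return (f(x + delta * d) - f(x - delta * d)) / (2.0 * delta * d)

def spsa_minimize(f, x0, iters=2000, seed=0):
    # Plain stochastic gradient descent driven by SPSA estimates,
    # with decaying step sizes and perturbation widths.
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    for k in range(iters):
        a_k = 0.5 / (k + 10)             # step size
        c_k = 0.1 / (k + 1) ** 0.25      # perturbation width
        x -= a_k * spsa_gradient(f, x, c_k, rng)
    return x
```

Only two evaluations of f per iteration are needed, which is the key practical advantage over coordinate-wise finite differences.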
1 code implementation • 14 Oct 2022 • Ashish Kumar Jayant, Shalabh Bhatnagar
We compare our approach with relevant model-free and model-based approaches in Constrained RL using the challenging Safe Reinforcement Learning benchmark, OpenAI Safety Gym.
no code implementations • 10 Oct 2022 • Shalabh Bhatnagar, Vivek S. Borkar, Soumyajit Guin
We revisit the standard formulation of the tabular actor-critic algorithm as a two-time-scale stochastic approximation, with the value function computed on a faster time-scale and the policy on a slower time-scale.
1 code implementation • 10 Oct 2022 • Soumyajit Guin, Shalabh Bhatnagar
In many situations, finite horizon control problems are of interest and for such problems, the optimal policies are time-varying in general.
no code implementations • 30 Jul 2022 • Akash Mondal, Prashanth L. A., Shalabh Bhatnagar
In this paper, we present a stochastic gradient algorithm for minimizing a smooth objective function that is an expectation over noisy cost samples, and only the latter are observed for any given parameter.
no code implementations • 2 Jan 2022 • Arun Raman, Keerthan Shagrithaya, Shalabh Bhatnagar
We assume that the set of action sequences deemed unsafe and/or safe is given in terms of a finite-state automaton, and propose a supervisor that disables a subset of actions at every state of the MDP so that the constraints on action sequences are satisfied.
no code implementations • 7 Dec 2021 • Rohan Deb, Shalabh Bhatnagar
This paper presents the first sufficient conditions that guarantee the stability and almost sure convergence of $N$-timescale stochastic approximation (SA) iterates for any $N\geq1$.
no code implementations • 23 Nov 2021 • Rohan Deb, Meet Gandhi, Shalabh Bhatnagar
However, the weights assigned to different $n$-step returns in TD($\lambda$), controlled by the parameter $\lambda$, decrease exponentially with increasing $n$.
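The weight pattern is easy to see numerically: TD($\lambda$) places weight $(1-\lambda)\lambda^{n-1}$ on the $n$-step return, a geometric series summing to one. A short sketch of just that weighting:

```python
import numpy as np

# Weight placed by TD(lambda) on the n-step return: (1 - lam) * lam**(n-1).
lam = 0.9
n = np.arange(1, 200)
w = (1 - lam) * lam ** (n - 1)
# The weights decay geometrically in n and sum to (almost) 1, so long
# n-step returns contribute very little for moderate values of lam.
```

Even at $\lambda = 0.9$, the 50-step return receives under 1% of the weight of the 1-step return, which is the exponential decay the sentence above refers to.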
no code implementations • 22 Nov 2021 • Rohan Deb, Shalabh Bhatnagar
Here, we consider Gradient TD algorithms with an additional heavy-ball momentum term and provide choices of the step size and momentum parameter that ensure almost sure asymptotic convergence of these algorithms.
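The heavy-ball term adds a fraction of the previous displacement to each update. A minimal sketch of the iteration on a deterministic gradient (the paper attaches this momentum term to stochastic Gradient TD updates; the quadratic test function and parameter values below are illustrative assumptions):

```python
import numpy as np

def heavy_ball(grad, x0, step=0.05, momentum=0.9, iters=500):
    # Heavy-ball (Polyak momentum) iteration:
    #   x_{k+1} = x_k - step * grad(x_k) + momentum * (x_k - x_{k-1})
    # The momentum term damps oscillations along ill-conditioned
    # directions and can accelerate convergence.
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(iters):
        x_next = x - step * grad(x) + momentum * (x - x_prev)
        x_prev, x = x, x_next
    return x
```

In the stochastic setting analyzed in the paper, the step size and momentum parameter must additionally be chosen to control the noise, which is where the convergence conditions come in.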
no code implementations • 19 Oct 2021 • Raghuram Bharadwaj Diddigi, Prateek Jain, Prabuchandran K. J., Shalabh Bhatnagar
Learning optimal behavior from existing data is one of the most important problems in Reinforcement Learning (RL).
2 code implementations • 7 Jan 2021 • P. Parnika, Raghuram Bharadwaj Diddigi, Sai Koti Reddy Danda, Shalabh Bhatnagar
In this work, we consider the problem of computing optimal actions for Reinforcement Learning (RL) agents in a co-operative setting, where the objective is to optimize a common goal.
1 code implementation • 30 Oct 2020 • Kartik Paigwar, Lokesh Krishna, Sashank Tirumala, Naman Khetan, Aditya Sagi, Ashish Joglekar, Shalabh Bhatnagar, Ashitava Ghosal, Bharadwaj Amrutur, Shishir Kolathaya
In particular, the parameters of the end-foot trajectories are shaped via a linear feedback policy that takes the torso orientation and the terrain slope as inputs.
no code implementations • 9 Oct 2020 • Dhuruva Priyan G M, Abhik Singla, Shalabh Bhatnagar
Natural gradients address these challenges by improving the convergence of the model parameters.
no code implementations • 2 Sep 2020 • Meet Gandhi, Atreyee Kundu, Shalabh Bhatnagar
Second, we model a set of benchmark examples of hybrid control design problem in the proposed MDP framework.
no code implementations • 28 Jul 2020 • Sashank Tirumala, Sagar Gubbi, Kartik Paigwar, Aditya Sagi, Ashish Joglekar, Shalabh Bhatnagar, Ashitava Ghosal, Bharadwaj Amrutur, Shishir Kolathaya
First, multiple simpler policies are trained to generate trajectories for a discrete set of target velocities and turning radii.
1 code implementation • 6 Feb 2020 • Shravan Nayak, Chanakya Ajit Ekbote, Annanya Pratap Singh Chauhan, Raghuram Bharadwaj Diddigi, Prishita Ray, Abhinava Sikdar, Sai Koti Reddy Danda, Shalabh Bhatnagar
A microgrid is capable of generating a limited amount of energy from a renewable resource and is responsible for handling the demands of its dedicated customers.
no code implementations • 20 Nov 2019 • Akshay Dharmavaram, Matthew Riemer, Shalabh Bhatnagar
Option-critic learning is a general-purpose reinforcement learning (RL) framework that aims to address the issue of long-term credit assignment by leveraging temporal abstractions.
1 code implementation • 13 Nov 2019 • Raghuram Bharadwaj Diddigi, Chandramouli Kamanchi, Shalabh Bhatnagar
In this work, we propose a convergent on-line off-policy TD algorithm under linear function approximation.
1 code implementation • 1 Nov 2019 • Indu John, Chandramouli Kamanchi, Shalabh Bhatnagar
In most RL algorithms such as Q-learning, the Bellman equation and the Bellman operator play an important role.
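The role of the Bellman operator is most visible in the tabular Q-learning update, which is a stochastic-approximation version of the Bellman optimality operator applied to sampled transitions. A minimal sketch (state/action indices and parameter values are illustrative):

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # One tabular Q-learning step:
    #   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    # The bracketed term is the sampled Bellman optimality backup minus
    # the current estimate (the temporal-difference error).
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q
```

Fixed points of the expected update are exactly the solutions of the Bellman optimality equation, which is why the operator's properties drive the algorithm's analysis.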
no code implementations • 16 Jun 2019 • Raghuram Bharadwaj Diddigi, Chandramouli Kamanchi, Shalabh Bhatnagar
This problem is formulated as a min-max Markov game in the literature.
no code implementations • 15 May 2019 • Shounak Bhattacharya, Abhik Singla, Abhimanyu, Dhaivat Dholakiya, Shalabh Bhatnagar, Bharadwaj Amrutur, Ashitava Ghosal, Shishir Kolathaya
In this work, we provide a simulation framework to perform systematic studies on the effects of spinal joint compliance and actuation on bounding performance of a 16-DOF quadruped spined robot Stoch 2.
no code implementations • 10 May 2019 • Sindhu Padakandla, Prabuchandran K. J, Shalabh Bhatnagar
In this paper, we thus consider the problem of developing RL methods that obtain optimal decisions in a non-stationary environment.
2 code implementations • 10 May 2019 • Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi, Shalabh Bhatnagar
In this work, we propose a second order value iteration procedure that is obtained by applying the Newton-Raphson method to the successive relaxation value iteration scheme.
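As background, the successive relaxation value iteration scheme blends the current value estimate with the Bellman backup through a relaxation parameter. A minimal sketch of that first-order scheme only (the Newton-Raphson acceleration that is the paper's contribution is not shown; the tiny MDP encoding below is an assumption):

```python
import numpy as np

def relaxed_value_iteration(P, R, gamma, w=1.0, iters=500):
    # Successive relaxation value iteration:
    #   V <- (1 - w) * V + w * max_a [ R(s,a) + gamma * P(s,a,.) @ V ]
    # w = 1 recovers standard value iteration.
    # P has shape (S, A, S): transition probabilities.
    # R has shape (S, A): expected one-step rewards.
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        Q = R + gamma * (P @ V)          # shape (S, A)
        V = (1.0 - w) * V + w * Q.max(axis=1)
    return V
```

A suitable choice of w can improve the contraction factor over plain value iteration; applying Newton-Raphson on top of this fixed-point scheme is what yields the second-order procedure.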
no code implementations • 9 Mar 2019 • Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi, Shalabh Bhatnagar
We first derive a modified fixed point iteration for SOR Q-values and utilize stochastic approximation to derive a learning algorithm to compute the optimal value function and an optimal policy.
no code implementations • 11 Feb 2019 • Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi, Prabuchandran K. J., Shalabh Bhatnagar
In many of the practical applications, the analytical form of the density is not known and only the samples from the distribution are available.
2 code implementations • 8 Nov 2018 • Abhik Singla, Sindhu Padakandla, Shalabh Bhatnagar
When compared to obstacle avoidance in ground vehicular robots, UAV navigation brings in additional challenges because the UAV motion is no longer constrained to a well-defined indoor ground or street environment.
no code implementations • 9 Oct 2018 • Abhik Singla, Shounak Bhattacharya, Dhaivat Dholakiya, Shalabh Bhatnagar, Ashitava Ghosal, Bharadwaj Amrutur, Shishir Kolathaya
Leveraging on this underlying structure, we then realize walking in Stoch by a straightforward reconstruction of joint trajectories from kMPs.
1 code implementation • 8 Aug 2018 • Prashanth L. A, Shalabh Bhatnagar, Nirav Bhavsar, Michael Fu, Steven I. Marcus
We introduce deterministic perturbation schemes for the recently proposed random directions stochastic approximation (RDSA) [17], and propose new first-order and second-order algorithms.
no code implementations • 15 Jun 2018 • Ajin George Joseph, Shalabh Bhatnagar
In this paper, we provide two new stable online algorithms for the problem of prediction in reinforcement learning, i.e., estimating the value function of a model-free Markov reward process using the linear function approximation architecture, with memory and computation costs scaling quadratically in the size of the feature set.
no code implementations • 22 Feb 2018 • Arunselvan Ramaswamy, Shalabh Bhatnagar, Daniel E. Quevedo
In this paper, we present verifiable sufficient conditions for stability and convergence of asynchronous SAs with biased approximation errors.
no code implementations • 31 Jan 2018 • Ajin George Joseph, Shalabh Bhatnagar
The cross entropy (CE) method is a model based search method to solve optimization problems where the objective function has minimal structure.
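A minimal sketch of the CE idea with a Gaussian sampling model: draw a population, keep the best fraction, refit the model to those elites, and repeat. Only function evaluations are used, matching the "minimal structure" setting (population sizes and the test function are illustrative assumptions, not from the paper):

```python
import numpy as np

def cross_entropy_min(f, dim, iters=50, pop=100, n_elite=10, seed=0):
    # Cross-entropy method for minimization with an independent
    # Gaussian sampling model.  Each round: sample pop candidates,
    # keep the n_elite with the lowest f-values, and refit the
    # Gaussian's mean/std to those elites.
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), 2.0 * np.ones(dim)
    for _ in range(iters):
        X = rng.normal(mu, sigma, size=(pop, dim))
        elite = X[np.argsort([f(x) for x in X])[:n_elite]]
        mu = elite.mean(axis=0)
        sigma = elite.std(axis=0) + 1e-6   # floor keeps sampling alive
    return mu
```

Because the sampling distribution concentrates around the elites, the method needs no gradients or smoothness from f, only the ability to evaluate it.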
no code implementations • 31 Jan 2018 • Ajin George Joseph, Shalabh Bhatnagar
In this paper, we consider a modified version of the control problem in a model free Markov decision process (MDP) setting with large state and action spaces.
no code implementations • 14 Nov 2017 • Diddigi Raghuram Bharadwaj, Sai Koti Reddy Danda, Krishnasuri Narayanam, Shalabh Bhatnagar
This paper considers two important problems, one on the supply side and one on the demand side, and studies both in a unified framework.
no code implementations • 14 Sep 2017 • Arunselvan Ramaswamy, Shalabh Bhatnagar
In this paper, we consider the stochastic iterative counterpart of the value iteration scheme wherein only noisy and possibly biased approximations of the Bellman operator are available.
no code implementations • 27 Aug 2017 • Raghuram Bharadwaj Diddigi, Prabuchandran K. J., Shalabh Bhatnagar
We consider the problem of tracking an intruder using a network of wireless sensors.
no code implementations • 25 Aug 2017 • Raghuram Bharadwaj Diddigi, D. Sai Koti Reddy, Shalabh Bhatnagar
Finally, we also consider a variant of this problem where the cost of power production at the main site is taken into consideration.
no code implementations • 22 Dec 2016 • Prasenjit Karmakar, Shalabh Bhatnagar
The novelty of our approach is that we use the irreducibility of the Markov chain to obtain the new bounds, whereas the earlier work by Basu et al. used a spectral variation bound that holds for any matrix.
no code implementations • 30 Nov 2016 • Sandeep Kumar, Sindhu Padakandla, Chandrashekar L, Priyank Parihar, K Gopinath, Shalabh Bhatnagar
Our method, when tested on a 25-node Hadoop cluster, shows a 66% decrease in the execution time of Hadoop jobs on average, compared to the default configuration.
no code implementations • 19 May 2016 • Prasenjit Karmakar, Rajkumar Maity, Shalabh Bhatnagar
In this paper we provide a rigorous convergence analysis of an "off"-policy temporal difference learning algorithm with linear function approximation and per-time-step linear computational complexity in an "online" learning environment.
no code implementations • 1 Apr 2016 • Arunselvan Ramaswamy, Shalabh Bhatnagar
The main aim of this paper is to provide an analysis of gradient descent (GD) algorithms with gradient errors that do not necessarily vanish asymptotically.
no code implementations • 27 Nov 2015 • Chandrashekar Lakshmi Narayanan, Raj Kumar Maity, Shalabh Bhatnagar
In this paper, we combine task-dependent reward shaping and task-independent proto-value functions to obtain reward dependent proto-value functions (RPVFs).
no code implementations • 28 Jul 2015 • Prashanth L. A., H. L. Prasad, Shalabh Bhatnagar, Prakash Chandra
We propose a novel actor-critic algorithm with guaranteed convergence to an optimal policy for a discounted reward Markov decision process.
no code implementations • 1 Jul 2015 • H. L. Prasad, Shalabh Bhatnagar
However, the optimization problem there has a non-linear objective and non-linear constraints with special structure.
no code implementations • 23 Apr 2015 • Arunselvan Ramaswamy, Shalabh Bhatnagar
Analyzing this class of algorithms is important, since many reinforcement learning (RL) algorithms can be cast as SAs driven by a `controlled Markov' process.
no code implementations • 31 Mar 2015 • Prasenjit Karmakar, Shalabh Bhatnagar
We present for the first time an asymptotic convergence analysis of two time-scale stochastic approximation driven by `controlled' Markov noise.
no code implementations • 17 Mar 2015 • Sindhu Padakandla, Prabuchandran K. J, Shalabh Bhatnagar
We also develop a cross entropy based method that incorporates policy parameterization in order to find near optimal energy sharing policies.
1 code implementation • 19 Feb 2015 • Prashanth L. A., Shalabh Bhatnagar, Michael Fu, Steve Marcus
We prove the unbiasedness of both gradient and Hessian estimates and asymptotic (strong) convergence for both first-order and second-order schemes.
no code implementations • 6 Feb 2015 • Arunselvan Ramaswamy, Shalabh Bhatnagar
In this paper the stability theorem of Borkar and Meyn is extended to include the case when the mean field is a differential inclusion.
no code implementations • 6 Feb 2015 • Arunselvan Ramaswamy, Shalabh Bhatnagar
In this paper we present a framework to analyze the asymptotic behavior of two timescale stochastic approximation algorithms including those with set-valued mean fields.
no code implementations • NeurIPS 2014 • Hengshuai Yao, Csaba Szepesvari, Richard S. Sutton, Joseph Modayil, Shalabh Bhatnagar
We prove that the UOM of an option can construct a traditional option model given a reward function, and the option-conditional return is computed directly by a single dot-product of the UOM with the reward function.
no code implementations • 8 Jan 2014 • H. L. Prasad, L. A. Prashanth, Shalabh Bhatnagar
We then provide a characterization of solution points of these sub-problems that correspond to Nash equilibria of the underlying game and for this purpose, we derive a set of necessary and sufficient SG-SP (Stochastic Game - Sub-Problem) conditions.
no code implementations • 27 Dec 2013 • Prashanth L. A., Abhranil Chatterjee, Shalabh Bhatnagar
For each criterion, we propose a convergent on-policy Q-learning algorithm that operates on two timescales, while employing function approximation to handle the curse of dimensionality associated with the underlying POMDP.
no code implementations • 21 Jun 2012 • Debarghya Ghoshdastidar, Ambedkar Dukkipati, Shalabh Bhatnagar
This motivates us to study SF schemes for gradient estimation using the q-Gaussian distribution.
no code implementations • NeurIPS 2009 • Shalabh Bhatnagar, Doina Precup, David Silver, Richard S. Sutton, Hamid R. Maei, Csaba Szepesvári
We introduce the first temporal-difference learning algorithms that converge with smooth value function approximators, such as neural networks.
no code implementations • NeurIPS 2009 • Hengshuai Yao, Shalabh Bhatnagar, Dongcui Diao, Richard S. Sutton, Csaba Szepesvári
We extend the Dyna planning architecture for policy evaluation and control in two significant respects.