Search Results for author: Vivek S. Borkar

Found 15 papers, 0 papers with code

A Concentration Bound for TD(0) with Function Approximation

no code implementations16 Dec 2023 Siddharth Chandak, Vivek S. Borkar

We derive a concentration bound of the form "for all $n \geq n_0$ for some $n_0$" for TD(0) with linear function approximation.
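
For orientation, a minimal sketch of the algorithm the bound concerns, run on a toy two-state chain. The chain, one-hot features, and step-size schedule here are illustrative choices, not taken from the paper.

```python
import numpy as np

def td0_linear(phi, P, r, gamma=0.9, steps=10000, seed=0):
    """TD(0) with linear function approximation: V(s) ~ theta . phi[s].
    The schedule a_n = 10/(n+50) is an illustrative Robbins-Monro choice."""
    rng = np.random.default_rng(seed)
    n_states, dim = phi.shape
    theta = np.zeros(dim)
    s = 0
    for n in range(steps):
        s_next = rng.choice(n_states, p=P[s])
        # temporal-difference error for the observed transition
        delta = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
        theta += 10.0 / (n + 50) * delta * phi[s]
        s = s_next
    return theta

# toy 2-state chain with one-hot features, so theta estimates V directly
P = np.array([[0.5, 0.5], [0.2, 0.8]])
r = np.array([1.0, 0.0])
theta = td0_linear(np.eye(2), P, r)
```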

Approximation of Convex Envelope Using Reinforcement Learning

no code implementations24 Nov 2023 Vivek S. Borkar, Adit Akarsh

Oberman gave a stochastic control formulation of the problem of estimating the convex envelope of a non-convex function.

Q-Learning reinforcement-learning

Decentralised Q-Learning for Multi-Agent Markov Decision Processes with a Satisfiability Criterion

no code implementations21 Nov 2023 Keshav P. Keval, Vivek S. Borkar

In this paper, we propose a reinforcement learning algorithm to solve a multi-agent Markov decision process (MMDP).

Q-Learning

Actor-Critic or Critic-Actor? A Tale of Two Time Scales

no code implementations10 Oct 2022 Shalabh Bhatnagar, Vivek S. Borkar, Soumyajit Guin

We revisit the standard formulation of tabular actor-critic algorithm as a two time-scale stochastic approximation with value function computed on a faster time-scale and policy computed on a slower time-scale.
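
As a concrete illustration of the two time-scale structure, a hypothetical tabular sketch: the value function moves on the faster schedule, softmax policy preferences on the slower one. The toy MDP and step-size schedules are illustrative; the paper's precise updates may differ.

```python
import numpy as np

def actor_critic(P, r, gamma=0.9, steps=20000, seed=1):
    """Two time-scale tabular actor-critic sketch: the critic uses the
    faster schedule a_n = n^-0.6, the actor the slower schedule b_n = 1/n."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = r.shape
    V = np.zeros(n_states)                   # critic: value estimates
    pref = np.zeros((n_states, n_actions))   # actor: softmax preferences
    s = 0
    for n in range(1, steps + 1):
        probs = np.exp(pref[s] - pref[s].max())
        probs /= probs.sum()
        a = rng.choice(n_actions, p=probs)
        s_next = rng.choice(n_states, p=P[a][s])
        delta = r[s, a] + gamma * V[s_next] - V[s]   # TD error
        V[s] += delta / n ** 0.6                     # fast critic update
        grad = -probs                                # softmax score function
        grad[a] += 1.0
        pref[s] += delta * grad / n                  # slow actor update
        s = s_next
    return V, pref

# toy MDP: in state 0, action 1 moves to state 1, which pays reward 1
# and returns to state 0 under either action
P = np.array([[[1.0, 0.0], [1.0, 0.0]],   # action 0
              [[0.0, 1.0], [1.0, 0.0]]])  # action 1
r = np.array([[0.0, 0.0], [1.0, 1.0]])
V, pref = actor_critic(P, r)
```

Swapping which of the two schedules is faster gives the "critic-actor" variant the title alludes to.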

A Concentration Bound for LSPE($\lambda$)

no code implementations4 Nov 2021 Siddharth Chandak, Vivek S. Borkar, Harsh Dolhare

The popular LSPE($\lambda$) algorithm for policy evaluation is revisited to derive a concentration bound that gives high probability performance guarantees from some time on.

Concentration of Contractive Stochastic Approximation and Reinforcement Learning

no code implementations27 Jun 2021 Siddharth Chandak, Vivek S. Borkar, Parth Dodhia

Using a martingale concentration inequality, concentration bounds "from time $n_0$ on" are derived for stochastic approximation algorithms with contractive maps and both martingale difference and Markov noises.

Q-Learning reinforcement-learning +1

Prospect-theoretic Q-learning

no code implementations12 Apr 2021 Vivek S. Borkar, Siddharth Chandak

We consider a prospect-theoretic version of the classical Q-learning algorithm for discounted-reward Markov decision processes, wherein the controller perceives a distorted and noisy future reward, modeled by a nonlinearity that accentuates gains and underrepresents losses relative to a reference point.
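
A sketch of one natural reading of that description: the distortion `u` below is a hypothetical piecewise-linear choice, and applying it to the one-step target is an assumption, not necessarily the paper's exact model. Note that gain * gamma < 1 keeps the distorted Bellman map a contraction.

```python
import numpy as np

def u(x, ref=0.0, gain=1.1, loss=0.7):
    """Hypothetical prospect-theoretic distortion: gains relative to the
    reference point are accentuated, losses are underrepresented."""
    d = np.asarray(x) - ref
    return ref + np.where(d >= 0, gain * d, loss * d)

def prospect_q(P, r, gamma=0.8, steps=20000, eps=0.2, seed=2):
    """Q-learning with the one-step target passed through u."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = r.shape
    Q = np.zeros((n_states, n_actions))
    s = 0
    for n in range(1, steps + 1):
        # epsilon-greedy exploration
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next = rng.choice(n_states, p=P[a][s])
        target = u(r[s, a] + gamma * Q[s_next].max())   # distorted target
        Q[s, a] += (target - Q[s, a]) / n ** 0.6
        s = s_next
    return Q

# toy 2-state MDP: state 1 pays reward 1 and returns to state 0
P = np.array([[[1.0, 0.0], [1.0, 0.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
r = np.array([[0.0, 0.0], [1.0, 1.0]])
Q = prospect_q(P, r)
```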

Q-Learning

Dynamic social learning under graph constraints

no code implementations8 Jul 2020 Konstantin Avrachenkov, Vivek S. Borkar, Sharayu Moharir, Suhail M. Shah

We introduce a model of graph-constrained dynamic choice with reinforcement modeled by positively $\alpha$-homogeneous rewards.

Whittle index based Q-learning for restless bandits with average reward

no code implementations29 Apr 2020 Konstantin E. Avrachenkov, Vivek S. Borkar

A novel reinforcement learning algorithm is introduced for multiarmed restless bandits with average reward, using the paradigms of Q-learning and the Whittle index.

Q-Learning reinforcement-learning +1

Scheduling in Wireless Networks with Spatial Reuse of Spectrum as Restless Bandits

no code implementations10 Oct 2019 Vivek S. Borkar, Shantanu Choudhary, Vaibhav Kumar Gupta, Gaurav S. Kasbekar

We study the problem of scheduling packet transmissions with the aim of minimizing the energy consumption and data transmission delay of users in a wireless network in which spatial reuse of spectrum is employed.

Scheduling

Randomized Kaczmarz for Rank Aggregation from Pairwise Comparisons

no code implementations9 May 2016 Vivek S. Borkar, Nikhil Karamchandani, Sharad Mirani

We revisit the problem of inferring the overall ranking among entities in the framework of Bradley-Terry-Luce (BTL) model, based on available empirical data on pairwise preferences.
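
For reference, the generic randomized Kaczmarz iteration (row sampled proportionally to its squared norm, then an orthogonal projection onto that row's hyperplane); the paper applies this kind of iteration to a linear system arising from the BTL model, whereas the system below is just an illustrative consistent one.

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=2000, seed=3):
    """Randomized Kaczmarz for a consistent system Ax = b: sample row i
    with probability ~ ||a_i||^2, then project the iterate onto the
    hyperplane a_i . x = b_i."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms = (A ** 2).sum(axis=1)
    probs = row_norms / row_norms.sum()
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        x += (b[i] - A[i] @ x) / row_norms[i] * A[i]
    return x

# consistent toy system with exact solution x* = [1, -2, 3]
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 2.0],
              [1.0, 1.0, 1.0]])
x_star = np.array([1.0, -2.0, 3.0])
b = A @ x_star
x_hat = randomized_kaczmarz(A, b)
```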

Gradient Estimation with Simultaneous Perturbation and Compressive Sensing

no code implementations27 Nov 2015 Vivek S. Borkar, Vikranth R. Dwaracherla, Neeraja Sahasrabudhe

This paper aims at achieving a "good" estimator for the gradient of a function on a high-dimensional space.
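
The simultaneous-perturbation idea in isolation: a single Rademacher perturbation yields an unbiased estimate of the full gradient from just two function evaluations, whatever the dimension. This is the classical SPSA estimator, shown here on a quadratic for illustration; the paper combines such estimates with compressive sensing, which this sketch does not attempt.

```python
import numpy as np

def spsa_gradient(f, x, c=1e-3, n_samples=1, seed=4):
    """SPSA gradient estimator: each sample costs two evaluations of f.
    Averaging over n_samples reduces the estimator's variance."""
    rng = np.random.default_rng(seed)
    g = np.zeros_like(x)
    for _ in range(n_samples):
        delta = rng.choice([-1.0, 1.0], size=x.shape)  # Rademacher perturbation
        g += (f(x + c * delta) - f(x - c * delta)) / (2 * c) / delta
    return g / n_samples

# quadratic with known gradient 2x; averaging drives the estimate toward it
f = lambda x: float(x @ x)
x = np.array([1.0, -2.0, 0.5])
g_hat = spsa_gradient(f, x, n_samples=5000)
```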

Compressive Sensing

Empirical Q-Value Iteration

no code implementations30 Nov 2014 Dileep Kalathil, Vivek S. Borkar, Rahul Jain

We propose a new simple and natural algorithm for learning the optimal Q-value function of a discounted-cost Markov Decision Process (MDP) when the transition kernels are unknown.
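
A sketch of the idea under stated assumptions: at every sweep, the expectation in the Bellman operator is replaced by an empirical average over next states drawn from a simulator of the unknown kernel. The toy MDP, sample count, and sweep count are illustrative.

```python
import numpy as np

def empirical_qvi(sample_next, r, gamma=0.9, n_samples=20, iters=200, seed=5):
    """Empirical Q-value iteration sketch: fresh simulated next states
    replace the exact expectation in each Bellman sweep."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = r.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        Q_new = np.empty_like(Q)
        for s in range(n_states):
            for a in range(n_actions):
                nxt = sample_next(s, a, n_samples, rng)  # simulated next states
                Q_new[s, a] = r[s, a] + gamma * Q[nxt].max(axis=1).mean()
        Q = Q_new
    return Q

# simulator for a toy 2-state, 2-action MDP with kernel P[a][s]
P = np.array([[[0.7, 0.3], [0.4, 0.6]],
              [[0.2, 0.8], [0.9, 0.1]]])
r = np.array([[0.0, 0.0], [1.0, 1.0]])
sample = lambda s, a, k, rng: rng.choice(2, size=k, p=P[a][s])
Q = empirical_qvi(sample, r)
```

The iterates fluctuate around the exact Q-values with sampling noise of order 1/sqrt(n_samples) rather than converging to them exactly.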

Q-Learning

Reinforcement Learning for Matrix Computations: PageRank as an Example

no code implementations1 Nov 2013 Vivek S. Borkar, Adwaitvedant S. Mathkar

In this spirit, we propose a reinforcement learning algorithm for PageRank computation that is fashioned after analogous schemes for approximate dynamic programming.
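
For context, the deterministic fixed point that such stochastic schemes estimate from sampled transitions is the standard PageRank iteration. The reference implementation below is illustrative background, not the paper's algorithm.

```python
import numpy as np

def pagerank_power(adj, d=0.85, iters=100):
    """Standard PageRank power iteration: pi = (1-d)/n + d * P^T pi,
    with dangling nodes redirected to the uniform distribution."""
    n = adj.shape[0]
    out_deg = adj.sum(axis=1, keepdims=True)
    P = np.where(out_deg > 0, adj / np.maximum(out_deg, 1), 1.0 / n)
    pi = np.full(n, 1.0 / n)
    for _ in range(iters):
        pi = (1 - d) / n + d * (P.T @ pi)
    return pi

# 3-node example: nodes 0 and 1 link to 2, node 2 links back to 0
adj = np.array([[0, 0, 1],
                [0, 0, 1],
                [1, 0, 0]], dtype=float)
pi = pagerank_power(adj)
```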

reinforcement-learning Reinforcement Learning (RL)

Distributed Reinforcement Learning via Gossip

no code implementations28 Oct 2013 Adwaitvedant S. Mathkar, Vivek S. Borkar

We consider the classical TD(0) algorithm implemented on a network of agents wherein the agents also incorporate the updates received from neighboring agents using a gossip-like mechanism.
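
A hypothetical sketch of that structure: each agent runs local TD(0) on its own trajectory, then mixes its estimate with neighbours' through a doubly stochastic gossip matrix W. The network, chain, and step sizes are illustrative choices.

```python
import numpy as np

def gossip_td0(P, r, W, n_agents, gamma=0.9, steps=5000, seed=6):
    """Gossip-style distributed TD(0) sketch: local TD updates followed
    by a consensus (averaging) step through the gossip matrix W."""
    rng = np.random.default_rng(seed)
    n_states = len(r)
    V = np.zeros((n_agents, n_states))   # one value estimate per agent
    s = np.zeros(n_agents, dtype=int)    # each agent's current state
    for n in range(1, steps + 1):
        step = 1.0 / n ** 0.6
        for i in range(n_agents):
            s_next = rng.choice(n_states, p=P[s[i]])
            delta = r[s[i]] + gamma * V[i, s_next] - V[i, s[i]]
            V[i, s[i]] += step * delta   # local TD(0) update
            s[i] = s_next
        V = W @ V                        # gossip/consensus step
    return V

# 3 agents with symmetric doubly stochastic mixing, on a 2-state chain
W = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
P = np.array([[0.5, 0.5], [0.2, 0.8]])
r = np.array([1.0, 0.0])
V = gossip_td0(P, r, W, n_agents=3)
```

The gossip step drives the agents' estimates toward consensus while the local updates drive the consensus value toward the TD(0) fixed point.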

reinforcement-learning Reinforcement Learning (RL)
