no code implementations • 16 Dec 2023 • Siddharth Chandak, Vivek S. Borkar
We derive a concentration bound of the type "for all $n \geq n_0$ for some $n_0$" for TD(0) with linear function approximation.
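A minimal runnable sketch of the algorithm whose concentration is analyzed: TD(0) with linear function approximation, here on an illustrative two-state chain with tabular features (the chain, rewards, and step-sizes below are assumptions for the demo, not from the paper).

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.9
# Illustrative 2-state Markov chain under a fixed policy
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])    # transition matrix
r = np.array([1.0, 0.0])      # expected one-step rewards
phi = np.eye(2)               # tabular features, so TD(0) can recover V exactly

theta = np.zeros(2)
s = 0
for n in range(1, 50001):
    s_next = rng.choice(2, p=P[s])
    # TD(0): theta_{n+1} = theta_n + a_n * delta_n * phi(s_n)
    delta = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
    theta += (n ** -0.6) * delta * phi[s]   # slowly decaying step-size
    s = s_next

print(theta)  # approaches V = (I - gamma * P)^{-1} r ≈ [6.30, 4.93]
```

With tabular features the iterate tracks the exact value function; the paper's bound quantifies how far $\theta_n$ can be from this limit for all $n \geq n_0$ simultaneously.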
no code implementations • 24 Nov 2023 • Vivek S. Borkar, Adit Akarsh
Oberman gave a stochastic control formulation of the problem of estimating the convex envelope of a non-convex function.
no code implementations • 21 Nov 2023 • Keshav P. Keval, Vivek S. Borkar
In this paper, we propose a reinforcement learning algorithm to solve a multi-agent Markov decision process (MMDP).
no code implementations • 10 Oct 2022 • Shalabh Bhatnagar, Vivek S. Borkar, Soumyajit Guin
We revisit the standard formulation of the tabular actor-critic algorithm as a two time-scale stochastic approximation, with the value function computed on a faster time-scale and the policy on a slower one.
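The two time-scale structure can be sketched as follows on an illustrative MDP (the MDP, softmax parameterization, and constant step-sizes below are assumptions for the demo; the actual analysis uses decreasing step-sizes whose ratio tends to zero).

```python
import numpy as np

rng = np.random.default_rng(1)
gamma, nS, nA = 0.9, 2, 2
# Illustrative MDP: P[s, a] is the next-state distribution, R[s, a] the reward
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.8, 0.2], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

V = np.zeros(nS)             # critic (fast time-scale)
theta = np.zeros((nS, nA))   # actor parameters (slow time-scale)

def policy(s):
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

s = 0
for n in range(1, 50001):
    pi = policy(s)
    a = rng.choice(nA, p=pi)
    s2 = rng.choice(nS, p=P[s, a])
    delta = R[s, a] + gamma * V[s2] - V[s]   # TD error drives both updates
    V[s] += 0.05 * delta                     # faster step-size: value update
    grad = -pi
    grad[a] += 1.0                           # grad of log pi(a|s) for softmax
    theta[s] += 0.005 * delta * grad         # slower step-size: policy update
    s = s2

print(policy(0), policy(1))
```

Because the critic moves an order of magnitude faster, the actor effectively sees a converged value estimate, which is the essence of the two time-scale viewpoint.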
no code implementations • 4 Nov 2021 • Siddharth Chandak, Vivek S. Borkar, Harsh Dolhare
The popular LSPE($\lambda$) algorithm for policy evaluation is revisited to derive a concentration bound that gives high probability performance guarantees from some time on.
no code implementations • 27 Jun 2021 • Siddharth Chandak, Vivek S. Borkar, Parth Dodhia
Using a martingale concentration inequality, concentration bounds "from time $n_0$ on" are derived for stochastic approximation algorithms with contractive maps and both martingale difference and Markov noises.
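The iterate class covered has the form $x_{n+1} = x_n + a_n(F(x_n) - x_n + M_{n+1})$ with $F$ a contraction and $M_{n+1}$ martingale-difference noise; a scalar instance (the specific $F$ and noise below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
# Illustrative contraction F(x) = 0.2*x + 1.6 (Lipschitz constant 0.2),
# whose unique fixed point is x* = 2
F = lambda x: 0.2 * x + 1.6

x = 0.0
for n in range(1, 100001):
    noise = rng.normal()                 # martingale-difference noise M_{n+1}
    x += (1.0 / n) * (F(x) - x + noise)  # stochastic approximation iterate

print(x)  # concentrates near the fixed point x* = 2
```

The bounds in the paper quantify the probability that such an iterate stays in a small neighborhood of the fixed point for all $n \geq n_0$.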
no code implementations • 12 Apr 2021 • Vivek S. Borkar, Siddharth Chandak
We consider a prospect theoretic version of the classical Q-learning algorithm for discounted reward Markov decision processes, wherein the controller perceives a distorted and noisy future reward, modeled by a nonlinearity that accentuates gains and underrepresents losses relative to a reference point.
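One plausible reading of the scheme, sketched below: ordinary Q-learning with a prospect-theoretic nonlinearity applied to the sampled reward. The distortion, MDP, and step-sizes are illustrative assumptions for the demo, not the paper's exact model (which distorts the perceived future reward).

```python
import numpy as np

rng = np.random.default_rng(3)
gamma, nS, nA = 0.9, 2, 2
M = np.array([[1.0, -1.0],
              [2.0,  0.0]])    # illustrative mean rewards

def u(r):
    # Illustrative distortion around reference point 0: gains are
    # accentuated, losses under-represented (cf. the abstract)
    return 2.0 * r if r >= 0 else 0.5 * r

Q = np.zeros((nS, nA))
visits = np.zeros((nS, nA))
s = 0
for _ in range(40000):
    a = rng.integers(nA)                 # uniform exploration
    r = M[s, a] + rng.normal()           # noisy reward
    s2 = rng.integers(nS)                # action-independent transitions, for simplicity
    visits[s, a] += 1
    step = visits[s, a] ** -0.7
    # Q-learning with the distortion applied to the perceived reward
    Q[s, a] += step * (u(r) + gamma * Q[s2].max() - Q[s, a])
    s = s2

print(Q.argmax(axis=1))  # greedy actions under the distorted perception
```

The interesting question, addressed in the paper, is what such an iteration converges to and how the distortion changes the learned behavior relative to classical Q-learning.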
no code implementations • 8 Jul 2020 • Konstantin Avrachenkov, Vivek S. Borkar, Sharayu Moharir, Suhail M. Shah
We introduce a model of graph-constrained dynamic choice with reinforcement modeled by positively $\alpha$-homogeneous rewards.
no code implementations • 29 Apr 2020 • Konstantin E. Avrachenkov, Vivek S. Borkar
A novel reinforcement learning algorithm is introduced for multi-armed restless bandits with average reward, using the paradigms of Q-learning and the Whittle index.
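A heavily simplified, discounted-reward caricature of the coupling (the paper works with average reward and a specific two time-scale scheme): Q-values on the fast time-scale, and a subsidy $\lambda$ for passivity on the slow time-scale, driven toward indifference at a reference state. The two-state arm, its dynamics, and all step-sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
gamma = 0.8                    # discounted caricature; the paper uses average reward
r_act = np.array([0.0, 1.0])   # illustrative active-mode rewards for a 2-state arm
s_ref = 1                      # reference state whose index is being estimated

Q = np.zeros((2, 2))           # Q[s, a], a = 0 passive / 1 active
lam = 0.0                      # subsidy for passivity (slow time-scale)
s = 0
for _ in range(200000):
    a = rng.integers(2)                     # uniform exploration
    if a == 1:
        r, s2 = r_act[s], 0                 # active: collect reward, reset arm
    else:
        r = lam                             # passive: receive the subsidy
        s2 = 1 if (s == 1 or rng.random() < 0.5) else 0
    Q[s, a] += 0.05 * (r + gamma * Q[s2].max() - Q[s, a])    # fast time-scale
    # slow time-scale: drive lam toward indifference at the reference state;
    # the limit is a discounted analogue of the Whittle index of s_ref
    lam += 0.001 * (Q[s_ref, 1] - Q[s_ref, 0])
    s = s2

print(lam)
```

At equilibrium the subsidy makes the passive and active actions equally attractive at the reference state, which is precisely the defining property of the Whittle index.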
no code implementations • 10 Oct 2019 • Vivek S. Borkar, Shantanu Choudhary, Vaibhav Kumar Gupta, Gaurav S. Kasbekar
We study the problem of scheduling packet transmissions with the aim of minimizing the energy consumption and data transmission delay of users in a wireless network in which spatial reuse of spectrum is employed.
no code implementations • 9 May 2016 • Vivek S. Borkar, Nikhil Karamchandani, Sharad Mirani
We revisit the problem of inferring the overall ranking among entities in the framework of the Bradley-Terry-Luce (BTL) model, based on available empirical data on pairwise preferences.
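For context, one standard BTL estimator (not necessarily the paper's method) is the minorization-maximization iteration for the maximum-likelihood scores; under BTL, item $i$ beats item $j$ with probability $w_i/(w_i + w_j)$. The skill scores and sample sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
K = 4
w_true = np.array([4.0, 2.0, 1.0, 0.5])   # illustrative BTL skill scores

# Simulate pairwise comparisons: P(i beats j) = w_i / (w_i + w_j)
wins = np.zeros((K, K))                   # wins[i, j] = times i beat j
for _ in range(5000):
    i, j = rng.choice(K, size=2, replace=False)
    p = w_true[i] / (w_true[i] + w_true[j])
    if rng.random() < p:
        wins[i, j] += 1
    else:
        wins[j, i] += 1

n = wins + wins.T                         # comparison counts per pair
w = np.ones(K)
for _ in range(200):                      # minorization-maximization sweeps
    for i in range(K):
        denom = sum(n[i, j] / (w[i] + w[j]) for j in range(K) if j != i)
        w[i] = wins[i].sum() / denom      # MM update for the MLE
    w /= w.sum()                          # scores are defined up to scale

print(np.argsort(-w))  # recovered ranking, best first
```

Sorting the estimated scores recovers the overall ranking; the paper's interest is in doing this reliably from limited pairwise data.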
no code implementations • 27 Nov 2015 • Vivek S. Borkar, Vikranth R. Dwaracherla, Neeraja Sahasrabudhe
This paper aims at obtaining a "good" estimator for the gradient of a function on a high-dimensional space.
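To convey the flavor of such estimators (this is the classical simultaneous-perturbation construction, shown for orientation rather than the paper's scheme): a random-direction two-point difference estimates the full gradient with only two function evaluations per sample, regardless of dimension.

```python
import numpy as np

rng = np.random.default_rng(5)
d = 100
f = lambda x: 0.5 * np.dot(x, x)   # illustrative function; its true gradient is x

def sp_gradient(f, x, delta=1e-3):
    # Two-point simultaneous-perturbation estimate along a random +/-1 direction:
    # costs only 2 function evaluations however large the dimension d is
    D = rng.choice([-1.0, 1.0], size=x.size)
    return (f(x + delta * D) - f(x - delta * D)) / (2 * delta) * D

x = rng.normal(size=d)
g = np.mean([sp_gradient(f, x) for _ in range(2000)], axis=0)
print(np.linalg.norm(g - x) / np.linalg.norm(x))  # relative error shrinks with averaging
```

A single sample is an unbiased estimate (up to $O(\delta^2)$ terms) but noisy; averaging trades function evaluations for accuracy, which is the trade-off such papers study in high dimension.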
no code implementations • 30 Nov 2014 • Dileep Kalathil, Vivek S. Borkar, Rahul Jain
We propose a new simple and natural algorithm for learning the optimal Q-value function of a discounted-cost Markov Decision Process (MDP) when the transition kernels are unknown.
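For reference, the classical baseline the paper builds on: asynchronous Q-learning for a discounted-cost MDP, which learns from sampled transitions without ever using the transition kernel explicitly (the paper's empirical Q-value iteration differs; the MDP and step-sizes below are illustrative).

```python
import numpy as np

rng = np.random.default_rng(8)
gamma, nS, nA = 0.9, 2, 2
C = np.array([[0.0, 1.0],
              [0.5, 0.2]])           # illustrative per-step costs
# Transitions chosen action-independent to keep the toy example transparent;
# the learner only ever sees samples drawn from this (to it, unknown) kernel
Pnext = np.array([[0.6, 0.4],
                  [0.3, 0.7]])

Q = np.zeros((nS, nA))
visits = np.zeros((nS, nA))
s = 0
for _ in range(40000):
    a = rng.integers(nA)             # uniform exploration
    s2 = rng.choice(nS, p=Pnext[s])  # environment supplies the next state
    visits[s, a] += 1
    step = visits[s, a] ** -0.7
    # discounted-cost Q-learning: minimizing, so the target uses min over actions
    Q[s, a] += step * (C[s, a] + gamma * Q[s2].min() - Q[s, a])
    s = s2

print(Q.argmin(axis=1))  # greedy (cost-minimizing) policy
```

With action-independent transitions the greedy policy simply picks the cheaper action in each state, which makes it easy to check that the iteration has learned the optimal Q-values.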
no code implementations • 1 Nov 2013 • Vivek S. Borkar, Adwaitvedant S. Mathkar
We propose a reinforcement learning algorithm for PageRank computation that is fashioned after analogous schemes for approximate dynamic programming.
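A simulation-based caricature of the target quantity (not the paper's algorithm): PageRank is the stationary distribution of the teleporting random surfer, so visit frequencies of a simulated walk estimate it. The graph and damping factor below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
# Illustrative 4-node link graph given as adjacency lists of out-links
out = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
N, c = 4, 0.85                       # damping factor c

visits = np.zeros(N)
v = 0
for _ in range(200000):
    visits[v] += 1
    if rng.random() < c:
        v = rng.choice(out[v])       # follow a uniformly random out-link
    else:
        v = rng.integers(N)          # teleport to a uniformly random node

pi_hat = visits / visits.sum()
print(pi_hat)  # long-run visit frequencies approximate the PageRank vector
```

Schemes of the kind the paper proposes replace such brute-force averaging with stochastic-approximation updates of the sort used in approximate dynamic programming.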
no code implementations • 28 Oct 2013 • Adwaitvedant S. Mathkar, Vivek S. Borkar
We consider the classical TD(0) algorithm implemented on a network of agents wherein the agents also incorporate the updates received from neighboring agents using a gossip-like mechanism.
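A minimal sketch of the distributed scheme: each agent runs tabular TD(0) on its own trajectory, then averages its parameters with neighbors through a doubly stochastic gossip matrix. The chain, the 3-agent network, its weights, and the step-sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(9)
gamma, nAgents = 0.9, 3
P = np.array([[0.7, 0.3],            # shared 2-state chain under a fixed policy
              [0.4, 0.6]])
r = np.array([1.0, 0.0])
W = np.array([[0.5, 0.25, 0.25],     # doubly stochastic gossip weights
              [0.25, 0.5, 0.25],     # (complete 3-agent network, illustrative)
              [0.25, 0.25, 0.5]])

theta = np.zeros((nAgents, 2))       # tabular value estimates, one row per agent
s = np.zeros(nAgents, dtype=int)
for n in range(1, 50001):
    for i in range(nAgents):         # each agent observes its own trajectory
        s2 = rng.choice(2, p=P[s[i]])
        delta = r[s[i]] + gamma * theta[i, s2] - theta[i, s[i]]
        theta[i, s[i]] += (n ** -0.6) * delta   # local TD(0) update
        s[i] = s2
    theta = W @ theta                # gossip step: average with neighbors

print(theta)  # rows reach consensus near the common value function
```

The gossip averaging forces the agents to consensus while the TD updates drive the common iterate toward the value function of the chain.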