no code implementations • 4 Jun 2024 • Francisco Robledo Relaño, Vivek Borkar, Urtzi Ayesta, Konstantin Avrachenkov
The Whittle index policy is a heuristic that has shown remarkably good performance (with guaranteed asymptotic optimality) when applied to the class of problems known as Restless Multi-Armed Bandit Problems (RMABPs).
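The sketch below illustrates the index itself, not the cited paper's method. It computes the Whittle index of a single arm directly from its definition: the smallest subsidy for passivity at which the passive action becomes optimal in state s. It assumes a discounted reward criterion, an indexable arm, and a bracket [lo, hi] that contains the index. The function names and parameters (q_values, whittle_index, gamma, lo, hi) are hypothetical.

```python
import numpy as np

def q_values(P, R, lam, gamma=0.9, tol=1e-10):
    """Discounted value iteration for one arm when a subsidy lam is paid for passivity.

    P[a] is the |S| x |S| transition matrix under action a (0 = passive, 1 = active);
    R[s, a] is the one-step reward. Returns the action values Q[s, a].
    """
    R_sub = R + np.array([lam, 0.0])  # subsidy added to the passive column only
    V = np.zeros(R.shape[0])
    while True:
        Q = R_sub + gamma * np.stack([P[0] @ V, P[1] @ V], axis=1)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return Q
        V = V_new


def whittle_index(P, R, s, lo=-10.0, hi=10.0, gamma=0.9, iters=50):
    """Bisection for the smallest subsidy at which passivity is optimal in state s.

    Assumes the arm is indexable (monotone passive set) and the index lies in [lo, hi].
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        Q = q_values(P, R, mid, gamma)
        if Q[s, 0] >= Q[s, 1]:  # passive (weakly) optimal, so the index is <= mid
            hi = mid
        else:                   # active strictly better, so the index is > mid
            lo = mid
    return 0.5 * (lo + hi)
```

When the transition matrices do not depend on the action, the index reduces to the myopic gain R[s, 1] - R[s, 0]. This gives a quick sanity check for the bisection.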
no code implementations • 22 May 2024 • Shivam Patel, Vivek Borkar
Risk-sensitive decision making has important applications in present-day use cases.
no code implementations • 7 Apr 2023 • Tejas Pagare, Vivek Borkar, Konstantin Avrachenkov
We extend the provably convergent Full Gradient DQN algorithm for discounted reward Markov decision processes from Avrachenkov et al. (2021) to average reward problems.
no code implementations • 9 Oct 2022 • Harsh Dolhare, Vivek Borkar
We revisit the classical model of Tsitsiklis, Bertsekas and Athans for distributed stochastic approximation with consensus.
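The following is a minimal sketch of the general idea of stochastic approximation with consensus. Its assumptions are not taken from the paper. Each agent i holds an iterate x_i. It first averages its neighbours' iterates with a doubly stochastic weight matrix W, then takes a noisy local step along a hypothetical drift h_i(x) = b_i - x. The agents therefore jointly seek the root of the averaged drift, which here is the mean of b. The function consensus_sa and its parameters are illustrative.

```python
import numpy as np

def consensus_sa(b, W, n_steps=20000, noise_std=0.1, seed=0):
    """Distributed stochastic approximation with a consensus (mixing) step.

    Agent i uses the hypothetical local drift h_i(x) = b[i] - x observed with
    Gaussian noise; W is a doubly stochastic weight matrix over the network.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(len(b))
    for n in range(1, n_steps + 1):
        a_n = 1.0 / n                        # decreasing step size
        noise = noise_std * rng.standard_normal(len(b))
        x = W @ x + a_n * (b - x + noise)    # mix with neighbours, then local noisy step
    return x
```

Because W is doubly stochastic, the network average evolves as an ordinary stochastic approximation. Meanwhile, the mixing step drives the disagreement between agents to zero as the step sizes shrink.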
no code implementations • 7 Jun 2022 • Shaan ul Haque, Vivek Borkar
We derive a concentration bound for a Q-learning algorithm for average cost Markov decision processes based on an equivalent shortest path problem, and compare it numerically with the alternative scheme based on relative value iteration.
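The abstract compares a shortest-path-based Q-learning scheme with one based on relative value iteration (RVI). The minimal sketch below implements the RVI baseline, not the shortest-path scheme the paper analyses. It rests on several assumptions: a hypothetical two-state, two-action MDP with a known simulator, synchronous updates of every state-action pair, step size 1/n, and the offset f(Q) = Q[0, 0]. At the fixed point, f(Q) equals the optimal average cost.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustration only).
# COST[s, a] is the one-step cost; P_TO_1[s, a] is the probability of moving to state 1.
COST = np.array([[1.0, 0.5],
                 [2.0, 3.0]])
P_TO_1 = np.array([[0.3, 0.7],
                   [0.6, 0.2]])

def rvi_q_learning(n_iter=20000, seed=0):
    """Synchronous RVI Q-learning for average cost with a generative model.

    Every (s, a) pair is updated at each step from one sampled next state;
    the offset f(Q) = Q[0, 0] serves as the estimate of the optimal average cost.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros_like(COST)
    for n in range(1, n_iter + 1):
        step = 1.0 / n
        # Sample one next state (0 or 1) for every (s, a) pair.
        nxt = (rng.random(COST.shape) < P_TO_1).astype(int)
        # RVI target: cost + min_v Q(next, v) - f(Q), with f(Q) = Q[0, 0].
        target = COST + Q[nxt].min(axis=-1) - Q[0, 0]
        Q = Q + step * (target - Q)
    return Q, Q[0, 0]
```

Subtracting f(Q) keeps the iterates bounded. Without that offset, the undiscounted Q-values would drift upward by roughly the average cost at every step.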
no code implementations • 27 Oct 2021 • Vivek Borkar, Shuhang Chen, Adithya Devraj, Ioannis Kontoyiannis, Sean Meyn
The paper concerns the $d$-dimensional stochastic approximation recursion, $$ \theta_{n+1}= \theta_n + \alpha_{n + 1} f(\theta_n, \Phi_{n+1}) $$ where $ \{ \Phi_n \}$ is a stochastic process on a general state space, satisfying a conditional Markov property that allows for parameter-dependent noise.
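The recursion can be simulated directly. The sketch below is a generic driver, not the paper's analysis. The callable next_state draws Phi_{n+1} given Phi_n and theta_n, which is how parameter-dependent Markov noise enters. The callable step(n) plays the role of alpha_n, with a default of 1/n (an assumption). All names are hypothetical. In the toy usage, Phi is a two-state Markov chain and f(theta, phi) = phi - theta, so theta_n tracks the chain's stationary probability of state 1.

```python
import numpy as np

def stochastic_approximation(f, theta0, next_state, phi0, n_steps, rng,
                             step=lambda n: 1.0 / n):
    """Iterate theta_{n+1} = theta_n + alpha_{n+1} * f(theta_n, Phi_{n+1}).

    next_state(phi, theta, rng) samples Phi_{n+1} given Phi_n and theta_n, so the
    noise may depend on the current parameter; step(n) returns alpha_n.
    """
    theta = np.asarray(theta0, dtype=float)
    phi = phi0
    for n in range(n_steps):
        phi = next_state(phi, theta, rng)              # Phi_{n+1}
        theta = theta + step(n + 1) * f(theta, phi)    # uses alpha_{n+1}
    return theta


# Toy usage: a two-state chain that ignores theta; f(theta, phi) = phi - theta.
P_UP, P_DOWN = 0.2, 0.3  # P(0 -> 1) and P(1 -> 0)

def two_state_chain(phi, theta, rng):
    if phi == 0:
        return 1 if rng.random() < P_UP else 0
    return 0 if rng.random() < P_DOWN else 1

theta_hat = stochastic_approximation(lambda th, ph: ph - th, 0.0, two_state_chain,
                                     phi0=0, n_steps=50_000, rng=np.random.default_rng(1))
```

With alpha_n = 1/n, the iterate is exactly the running mean of Phi_1, ..., Phi_n. It should therefore settle near P_UP / (P_UP + P_DOWN) = 0.4.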
no code implementations • 15 Feb 2021 • Priyadarshini K, Siddhartha Chaudhuri, Vivek Borkar, Subhasis Chaudhuri
To avoid redundancy between triplets, our method collectively selects batches with maximum joint entropy, which simultaneously captures both informativeness and diversity.
no code implementations • 21 Dec 2019 • Arghyadip Roy, Vivek Borkar, Abhay Karandikar, Prasanna Chaporkar
To overcome the curses of dimensionality and of modeling that limit Dynamic Programming (DP) methods for solving Markov Decision Process (MDP) problems, Reinforcement Learning (RL) methods are adopted in practice.
no code implementations • 19 Oct 2019 • Vivek Borkar, Alexandre Reiffers-Masson
One leads to a convex optimization problem and the other to a non-convex one.
no code implementations • 28 Nov 2018 • Arghyadip Roy, Vivek Borkar, Abhay Karandikar, Prasanna Chaporkar
In this paper, we propose a new RL algorithm that exploits the known threshold structure of the optimal policy during learning, thereby reducing the feasible policy space.
no code implementations • 4 Sep 2015 • Konstantin Avrachenkov, Vivek Borkar, Krishnakant Saboo
It is based on sampling nodes via a random walk on the graph.
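The abstract says only that nodes are sampled by a random walk. The sketch below shows the generic re-weighted random-walk estimator of a node average, not necessarily the paper's algorithm. On a connected undirected graph, a simple random walk visits each node with long-run frequency proportional to its degree, so each visit is weighted by 1/deg(v). The function rw_node_average and the toy graph are hypothetical.

```python
import random

def rw_node_average(adj, g, n_steps, seed=0, start=0):
    """Estimate the node average (1/|V|) * sum_v g(v) from one simple random walk.

    On a connected undirected graph the walk visits v with long-run frequency
    proportional to deg(v); weighting each visit by 1/deg(v) corrects for this
    (a ratio estimator, consistent as n_steps grows).
    """
    rnd = random.Random(seed)
    v = start
    num = den = 0.0
    for _ in range(n_steps):
        v = rnd.choice(adj[v])       # step to a uniformly chosen neighbour
        w = 1.0 / len(adj[v])        # importance weight 1/deg(v)
        num += w * g(v)
        den += w
    return num / den


# Toy graph (hypothetical): a triangle 0-1-2 with a tail 2-3-4.
ADJ = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
avg_degree = rw_node_average(ADJ, lambda v: len(ADJ[v]), n_steps=100_000)
```

The walk needs only local neighbour lists, which is why random-walk sampling suits graphs that are too large or too distributed to enumerate. The true average degree of the toy graph is 10 / 5 = 2.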
no code implementations • 3 Nov 2014 • Dileep Kalathil, Vivek Borkar, Rahul Jain
Firstly, we give a simple and computationally tractable strategy for approachability for Stackelberg stochastic games along the lines of Blackwell's.