no code implementations • 7 Apr 2023 • Tejas Pagare, Vivek Borkar, Konstantin Avrachenkov
We extend the provably convergent Full Gradient DQN algorithm for discounted reward Markov decision processes from Avrachenkov et al. (2021) to average reward problems.
no code implementations • 9 Oct 2022 • Harsh Dolhare, Vivek Borkar
We revisit the classical model of Tsitsiklis, Bertsekas and Athans for distributed stochastic approximation with consensus.
no code implementations • 7 Jun 2022 • Shaan ul Haque, Vivek Borkar
We derive a concentration bound for a Q-learning algorithm for average cost Markov decision processes based on an equivalent shortest path problem, and compare it numerically with the alternative scheme based on relative value iteration.
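The relative-value-iteration alternative mentioned above can be sketched in tabular form. This is a generic RVI Q-learning toy (the reference term is Q at a fixed state-action pair), not the paper's shortest-path-based scheme or its step-size choices, and the MDP inputs are assumed known tensors:

```python
import numpy as np

def rvi_q_learning(P, r, episodes=100000, seed=0):
    """Tabular RVI Q-learning for an average-reward MDP (illustrative sketch).

    P: transition tensor of shape (S, A, S); r: reward matrix of shape (S, A).
    The reference value f(Q) = Q[0, 0] is subtracted in each update, which is
    what distinguishes the average-reward recursion from discounted Q-learning.
    """
    rng = np.random.default_rng(seed)
    S, A, _ = P.shape
    Q = np.zeros((S, A))
    s = 0
    for n in range(1, episodes + 1):
        # epsilon-greedy action selection
        a = rng.integers(A) if rng.random() < 0.1 else int(np.argmax(Q[s]))
        s2 = rng.choice(S, p=P[s, a])
        alpha = 1.0 / (1 + 0.001 * n)  # vanishing step size
        # RVI update: reward minus reference value plus next-state maximum
        target = r[s, a] - Q[0, 0] + np.max(Q[s2])
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2
    return Q
```

At the fixed point, the reference term Q[0, 0] estimates the optimal average reward, so on an MDP with constant reward 1 every entry converges to 1.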
no code implementations • 27 Oct 2021 • Vivek Borkar, Shuhang Chen, Adithya Devraj, Ioannis Kontoyiannis, Sean Meyn
In addition to standard Lipschitz assumptions and conditions on the vanishing step-size sequence, it is assumed that the associated \textit{mean flow} $\tfrac{d}{dt} \vartheta_t = \bar{f}(\vartheta_t)$ is globally asymptotically stable with stationary point denoted $\theta^*$, where $\bar{f}(\theta)=\mathrm{E}[f(\theta,\Phi)]$ with $\Phi$ having the stationary distribution of the chain.
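A stochastic approximation recursion of this kind can be illustrated with a toy choice of $f$ (my own example, not the paper's): take $f(\theta,\phi)=\phi-\theta$ with $\Phi_n$ a two-state Markov chain, so the mean flow's stationary point $\theta^*$ is the chain's stationary mean.

```python
import numpy as np

def sa_with_markov_noise(steps=200000, seed=0):
    """Stochastic approximation theta_{n+1} = theta_n + a_n f(theta_n, Phi_n)
    driven by Markovian noise, with vanishing step sizes a_n.

    Toy setup: f(theta, phi) = phi - theta, and Phi a two-state chain on
    {0, 1} with P(0->1) = 0.8, P(1->0) = 0.4, hence stationary distribution
    (1/3, 2/3) and stationary mean 2/3. The mean flow is
    d/dt theta = E[Phi] - theta, globally stable at theta* = 2/3.
    """
    rng = np.random.default_rng(seed)
    phi, theta = 0, 0.0
    for n in range(1, steps + 1):
        a_n = 1.0 / n ** 0.7          # vanishing step-size sequence
        theta += a_n * (phi - theta)  # f(theta, phi) = phi - theta
        p_flip = 0.8 if phi == 0 else 0.4
        phi = 1 - phi if rng.random() < p_flip else phi
    return theta
```

The iterate tracks the mean flow and settles near $\theta^* = 2/3$ despite the correlated (non-i.i.d.) noise.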
no code implementations • 15 Feb 2021 • Priyadarshini K, Siddhartha Chaudhuri, Vivek Borkar, Subhasis Chaudhuri
To avoid redundancy between triplets, our method collectively selects batches with maximum joint entropy, which simultaneously captures both informativeness and diversity.
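Under a Gaussian model, joint entropy of a batch is (up to constants) the log-determinant of the corresponding kernel submatrix, which a greedy selector can maximize. The sketch below uses that generic log-det proxy with a hypothetical similarity matrix `K`; it is not the paper's triplet-specific estimator:

```python
import numpy as np

def greedy_max_joint_entropy(K, batch_size):
    """Greedily select a batch maximizing a joint-entropy proxy.

    K: positive-definite similarity (kernel) matrix over candidate items.
    log det K[idx, idx] grows with per-item variance (informativeness) and
    shrinks when selected items are highly correlated (redundancy), so the
    greedy objective rewards informative AND diverse batches simultaneously.
    """
    n = K.shape[0]
    selected = []
    for _ in range(batch_size):
        best, best_gain = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            # slogdet returns (sign, log|det|); sign is +1 for a PD submatrix
            _, logdet = np.linalg.slogdet(K[np.ix_(idx, idx)])
            if logdet > best_gain:
                best, best_gain = i, logdet
        selected.append(best)
    return selected
```

For example, if items 0 and 1 are nearly duplicates, a batch of two picks one of them plus the dissimilar item rather than the redundant pair.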
no code implementations • 21 Dec 2019 • Arghyadip Roy, Vivek Borkar, Abhay Karandikar, Prasanna Chaporkar
To overcome the curse of dimensionality and the modeling burden of Dynamic Programming (DP) methods for solving Markov Decision Process (MDP) problems, Reinforcement Learning (RL) methods are adopted in practice.
no code implementations • 19 Oct 2019 • Vivek Borkar, Alexandre Reiffers-Masson
One leads to a convex optimization problem and the other to a non-convex one.
no code implementations • 28 Nov 2018 • Arghyadip Roy, Vivek Borkar, Abhay Karandikar, Prasanna Chaporkar
In this paper, we propose a new RL algorithm which exploits the known threshold structure of the optimal policy to reduce the feasible policy space during learning.
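The space reduction from threshold structure can be made concrete: over an ordered state space of size $S$, a threshold policy is determined by a single cut-off, shrinking the search from $2^S$ deterministic binary policies to $S+1$ candidates. A minimal sketch with a hypothetical evaluation function `value_fn` (not the paper's learning algorithm):

```python
import numpy as np

def threshold_policy(tau, num_states):
    """Threshold policy on an ordered state space: take action 1 iff
    state >= tau, action 0 otherwise."""
    return (np.arange(num_states) >= tau).astype(int)

def search_thresholds(num_states, value_fn):
    """Search only the threshold policies, picking the one that maximizes
    value_fn(policy) -> float (e.g. a simulated average-reward estimate).
    Only num_states + 1 candidates need to be evaluated."""
    candidates = [threshold_policy(t, num_states)
                  for t in range(num_states + 1)]
    return max(candidates, key=value_fn)
```

In the actual algorithm the evaluation would come from samples rather than an oracle, but the restriction to thresholds is what shrinks the policy space.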
no code implementations • 4 Sep 2015 • Konstantin Avrachenkov, Vivek Borkar, Krishnakant Saboo
It is based on sampling nodes by performing a random walk on the graph.
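Random-walk node sampling in its simplest form moves to a uniformly chosen neighbour at each step. A generic sketch (adjacency-list representation assumed; the paper's estimator built on top of the walk is not reproduced here):

```python
import random

def random_walk_sample(adj, start, steps, seed=0):
    """Sample graph nodes via a simple random walk.

    adj: dict mapping node -> list of neighbours; start: initial node.
    Returns the sequence of visited nodes (length steps + 1). Node i is
    visited with long-run frequency proportional to its degree on a
    connected non-bipartite graph.
    """
    rng = random.Random(seed)
    node, visited = start, [start]
    for _ in range(steps):
        node = rng.choice(adj[node])  # uniform step to a neighbour
        visited.append(node)
    return visited
```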
no code implementations • 3 Nov 2014 • Dileep Kalathil, Vivek Borkar, Rahul Jain
First, we give a simple and computationally tractable strategy for approachability in Stackelberg stochastic games, along the lines of Blackwell's approachability strategy.