no code implementations • 16 Dec 2023 • Siddharth Chandak, Vivek S. Borkar
We derive a concentration bound of the type "for all $n \geq n_0$ for some $n_0$" for TD(0) with linear function approximation.
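A minimal runnable sketch of the algorithm whose concentration is analyzed: TD(0) with linear function approximation, here on an illustrative two-state chain with tabular features (the chain, rewards, and step-sizes below are assumptions for the demo, not from the paper).

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.9
# Illustrative 2-state Markov chain under a fixed policy
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])    # transition matrix
r = np.array([1.0, 0.0])      # expected one-step rewards
phi = np.eye(2)               # tabular features, so TD(0) can recover V exactly

theta = np.zeros(2)
s = 0
for n in range(1, 50001):
    s_next = rng.choice(2, p=P[s])
    # TD(0): theta_{n+1} = theta_n + a_n * delta_n * phi(s_n)
    delta = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
    theta += (n ** -0.6) * delta * phi[s]   # slowly decaying step-size
    s = s_next

print(theta)  # approaches V = (I - gamma * P)^{-1} r ≈ [6.30, 4.93]
```

With tabular features the iterate tracks the exact value function; the paper's bound quantifies how far $\theta_n$ can be from this limit for all $n \geq n_0$ simultaneously.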
no code implementations • 24 Nov 2023 • Vivek S. Borkar, Adit Akarsh
Oberman gave a stochastic control formulation of the problem of estimating the convex envelope of a non-convex function.
no code implementations • 21 Nov 2023 • Keshav P. Keval, Vivek S. Borkar
In this paper, we propose a reinforcement learning algorithm to solve a multi-agent Markov decision process (MMDP).
no code implementations • 10 Oct 2022 • Shalabh Bhatnagar, Vivek S. Borkar, Soumyajit Guin
We revisit the standard formulation of the tabular actor-critic algorithm as a two time-scale stochastic approximation, with the value function computed on a faster time-scale and the policy on a slower one.
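The two time-scale structure can be sketched as follows on an illustrative MDP (the MDP, softmax parameterization, and constant step-sizes below are assumptions for the demo; the actual analysis uses decreasing step-sizes whose ratio tends to zero).

```python
import numpy as np

rng = np.random.default_rng(1)
gamma, nS, nA = 0.9, 2, 2
# Illustrative MDP: P[s, a] is the next-state distribution, R[s, a] the reward
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.8, 0.2], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

V = np.zeros(nS)             # critic (fast time-scale)
theta = np.zeros((nS, nA))   # actor parameters (slow time-scale)

def policy(s):
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

s = 0
for n in range(1, 50001):
    pi = policy(s)
    a = rng.choice(nA, p=pi)
    s2 = rng.choice(nS, p=P[s, a])
    delta = R[s, a] + gamma * V[s2] - V[s]   # TD error drives both updates
    V[s] += 0.05 * delta                     # faster step-size: value update
    grad = -pi
    grad[a] += 1.0                           # grad of log pi(a|s) for softmax
    theta[s] += 0.005 * delta * grad         # slower step-size: policy update
    s = s2

print(policy(0), policy(1))
```

Because the critic moves an order of magnitude faster, the actor effectively sees a converged value estimate, which is the essence of the two time-scale viewpoint.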
no code implementations • 4 Nov 2021 • Siddharth Chandak, Vivek S. Borkar, Harsh Dolhare
The popular LSPE($\lambda$) algorithm for policy evaluation is revisited to derive a concentration bound that gives high probability performance guarantees from some time on.
no code implementations • 27 Jun 2021 • Siddharth Chandak, Vivek S. Borkar, Parth Dodhia
Using a martingale concentration inequality, concentration bounds "from time $n_0$ on" are derived for stochastic approximation algorithms with contractive maps and both martingale difference and Markov noises.
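The iterate class covered has the form $x_{n+1} = x_n + a_n(F(x_n) - x_n + M_{n+1})$ with $F$ a contraction and $M_{n+1}$ martingale-difference noise; a scalar instance (the specific $F$ and noise below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
# Illustrative contraction F(x) = 0.2*x + 1.6 (Lipschitz constant 0.2),
# whose unique fixed point is x* = 2
F = lambda x: 0.2 * x + 1.6

x = 0.0
for n in range(1, 100001):
    noise = rng.normal()                 # martingale-difference noise M_{n+1}
    x += (1.0 / n) * (F(x) - x + noise)  # stochastic approximation iterate

print(x)  # concentrates near the fixed point x* = 2
```

The bounds in the paper quantify the probability that such an iterate stays in a small neighborhood of the fixed point for all $n \geq n_0$.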
no code implementations • 12 Apr 2021 • Vivek S. Borkar, Siddharth Chandak
We consider a prospect theoretic version of the classical Q-learning algorithm for discounted reward Markov decision processes, wherein the controller perceives a distorted and noisy future reward, modeled by a nonlinearity that accentuates gains and underrepresents losses relative to a reference point.
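One plausible reading of the scheme, sketched below: ordinary Q-learning with a prospect-theoretic nonlinearity applied to the sampled reward. The distortion, MDP, and step-sizes are illustrative assumptions for the demo, not the paper's exact model (which distorts the perceived future reward).

```python
import numpy as np

rng = np.random.default_rng(3)
gamma, nS, nA = 0.9, 2, 2
M = np.array([[1.0, -1.0],
              [2.0,  0.0]])    # illustrative mean rewards

def u(r):
    # Illustrative distortion around reference point 0: gains are
    # accentuated, losses under-represented (cf. the abstract)
    return 2.0 * r if r >= 0 else 0.5 * r

Q = np.zeros((nS, nA))
visits = np.zeros((nS, nA))
s = 0
for _ in range(40000):
    a = rng.integers(nA)                 # uniform exploration
    r = M[s, a] + rng.normal()           # noisy reward
    s2 = rng.integers(nS)                # action-independent transitions, for simplicity
    visits[s, a] += 1
    step = visits[s, a] ** -0.7
    # Q-learning with the distortion applied to the perceived reward
    Q[s, a] += step * (u(r) + gamma * Q[s2].max() - Q[s, a])
    s = s2

print(Q.argmax(axis=1))  # greedy actions under the distorted perception
```

The interesting question, addressed in the paper, is what such an iteration converges to and how the distortion changes the learned behavior relative to classical Q-learning.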
no code implementations • 8 Jul 2020 • Konstantin Avrachenkov, Vivek S. Borkar, Sharayu Moharir, Suhail M. Shah
We introduce a model of graph-constrained dynamic choice with reinforcement modeled by positively $\alpha$-homogeneous rewards.
no code implementations • 29 Apr 2020 • Konstantin E. Avrachenkov, Vivek S. Borkar
A novel reinforcement learning algorithm is introduced for multi-armed restless bandits with average reward, using the paradigms of Q-learning and the Whittle index.
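A heavily simplified, discounted-reward caricature of the coupling (the paper works with average reward and a specific two time-scale scheme): Q-values on the fast time-scale, and a subsidy $\lambda$ for passivity on the slow time-scale, driven toward indifference at a reference state. The two-state arm, its dynamics, and all step-sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
gamma = 0.8                    # discounted caricature; the paper uses average reward
r_act = np.array([0.0, 1.0])   # illustrative active-mode rewards for a 2-state arm
s_ref = 1                      # reference state whose index is being estimated

Q = np.zeros((2, 2))           # Q[s, a], a = 0 passive / 1 active
lam = 0.0                      # subsidy for passivity (slow time-scale)
s = 0
for _ in range(200000):
    a = rng.integers(2)                     # uniform exploration
    if a == 1:
        r, s2 = r_act[s], 0                 # active: collect reward, reset arm
    else:
        r = lam                             # passive: receive the subsidy
        s2 = 1 if (s == 1 or rng.random() < 0.5) else 0
    Q[s, a] += 0.05 * (r + gamma * Q[s2].max() - Q[s, a])    # fast time-scale
    # slow time-scale: drive lam toward indifference at the reference state;
    # the limit is a discounted analogue of the Whittle index of s_ref
    lam += 0.001 * (Q[s_ref, 1] - Q[s_ref, 0])
    s = s2

print(lam)
```

At equilibrium the subsidy makes the passive and active actions equally attractive at the reference state, which is precisely the defining property of the Whittle index.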
no code implementations • 10 Oct 2019 • Vivek S. Borkar, Shantanu Choudhary, Vaibhav Kumar Gupta, Gaurav S. Kasbekar
We study the problem of scheduling packet transmissions with the aim of minimizing the energy consumption and data transmission delay of users in a wireless network in which spatial reuse of spectrum is employed.
no code implementations • 9 May 2016 • Vivek S. Borkar, Nikhil Karamchandani, Sharad Mirani
We revisit the problem of inferring the overall ranking among entities in the framework of the Bradley-Terry-Luce (BTL) model, based on available empirical data on pairwise preferences.
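For context, one standard BTL estimator (not necessarily the paper's method) is the minorization-maximization iteration for the maximum-likelihood scores; under BTL, item $i$ beats item $j$ with probability $w_i/(w_i + w_j)$. The skill scores and sample sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
K = 4
w_true = np.array([4.0, 2.0, 1.0, 0.5])   # illustrative BTL skill scores

# Simulate pairwise comparisons: P(i beats j) = w_i / (w_i + w_j)
wins = np.zeros((K, K))                   # wins[i, j] = times i beat j
for _ in range(5000):
    i, j = rng.choice(K, size=2, replace=False)
    p = w_true[i] / (w_true[i] + w_true[j])
    if rng.random() < p:
        wins[i, j] += 1
    else:
        wins[j, i] += 1

n = wins + wins.T                         # comparison counts per pair
w = np.ones(K)
for _ in range(200):                      # minorization-maximization sweeps
    for i in range(K):
        denom = sum(n[i, j] / (w[i] + w[j]) for j in range(K) if j != i)
        w[i] = wins[i].sum() / denom      # MM update for the MLE
    w /= w.sum()                          # scores are defined up to scale

print(np.argsort(-w))  # recovered ranking, best first
```

Sorting the estimated scores recovers the overall ranking; the paper's interest is in doing this reliably from limited pairwise data.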
no code implementations • 27 Nov 2015 • Vivek S. Borkar, Vikranth R. Dwaracherla, Neeraja Sahasrabudhe
This paper aims at obtaining a "good" estimator for the gradient of a function on a high-dimensional space.
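To convey the flavor of such estimators (this is the classical simultaneous-perturbation construction, shown for orientation rather than the paper's scheme): a random-direction two-point difference estimates the full gradient with only two function evaluations per sample, regardless of dimension.

```python
import numpy as np

rng = np.random.default_rng(5)
d = 100
f = lambda x: 0.5 * np.dot(x, x)   # illustrative function; its true gradient is x

def sp_gradient(f, x, delta=1e-3):
    # Two-point simultaneous-perturbation estimate along a random +/-1 direction:
    # costs only 2 function evaluations however large the dimension d is
    D = rng.choice([-1.0, 1.0], size=x.size)
    return (f(x + delta * D) - f(x - delta * D)) / (2 * delta) * D

x = rng.normal(size=d)
g = np.mean([sp_gradient(f, x) for _ in range(2000)], axis=0)
print(np.linalg.norm(g - x) / np.linalg.norm(x))  # relative error shrinks with averaging
```

A single sample is an unbiased estimate (up to $O(\delta^2)$ terms) but noisy; averaging trades function evaluations for accuracy, which is the trade-off such papers study in high dimension.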
no code implementations • 30 Nov 2014 • Dileep Kalathil, Vivek S. Borkar, Rahul Jain
We propose a new simple and natural algorithm for learning the optimal Q-value function of a discounted-cost Markov Decision Process (MDP) when the transition kernels are unknown.
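For reference, the classical baseline the paper builds on: asynchronous Q-learning for a discounted-cost MDP, which learns from sampled transitions without ever using the transition kernel explicitly (the paper's empirical Q-value iteration differs; the MDP and step-sizes below are illustrative).

```python
import numpy as np

rng = np.random.default_rng(8)
gamma, nS, nA = 0.9, 2, 2
C = np.array([[0.0, 1.0],
              [0.5, 0.2]])           # illustrative per-step costs
# Transitions chosen action-independent to keep the toy example transparent;
# the learner only ever sees samples drawn from this (to it, unknown) kernel
Pnext = np.array([[0.6, 0.4],
                  [0.3, 0.7]])

Q = np.zeros((nS, nA))
visits = np.zeros((nS, nA))
s = 0
for _ in range(40000):
    a = rng.integers(nA)             # uniform exploration
    s2 = rng.choice(nS, p=Pnext[s])  # environment supplies the next state
    visits[s, a] += 1
    step = visits[s, a] ** -0.7
    # discounted-cost Q-learning: minimizing, so the target uses min over actions
    Q[s, a] += step * (C[s, a] + gamma * Q[s2].min() - Q[s, a])
    s = s2

print(Q.argmin(axis=1))  # greedy (cost-minimizing) policy
```

With action-independent transitions the greedy policy simply picks the cheaper action in each state, which makes it easy to check that the iteration has learned the optimal Q-values.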
no code implementations • 1 Nov 2013 • Vivek S. Borkar, Adwaitvedant S. Mathkar
We propose a reinforcement learning algorithm for PageRank computation that is fashioned after analogous schemes for approximate dynamic programming.
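A simulation-based caricature of the target quantity (not the paper's algorithm): PageRank is the stationary distribution of the teleporting random surfer, so visit frequencies of a simulated walk estimate it. The graph and damping factor below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
# Illustrative 4-node link graph given as adjacency lists of out-links
out = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
N, c = 4, 0.85                       # damping factor c

visits = np.zeros(N)
v = 0
for _ in range(200000):
    visits[v] += 1
    if rng.random() < c:
        v = rng.choice(out[v])       # follow a uniformly random out-link
    else:
        v = rng.integers(N)          # teleport to a uniformly random node

pi_hat = visits / visits.sum()
print(pi_hat)  # long-run visit frequencies approximate the PageRank vector
```

Schemes of the kind the paper proposes replace such brute-force averaging with stochastic-approximation updates of the sort used in approximate dynamic programming.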
no code implementations • 28 Oct 2013 • Adwaitvedant S. Mathkar, Vivek S. Borkar
We consider the classical TD(0) algorithm implemented on a network of agents wherein the agents also incorporate the updates received from neighboring agents using a gossip-like mechanism.
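A minimal sketch of the distributed scheme: each agent runs tabular TD(0) on its own trajectory, then averages its parameters with neighbors through a doubly stochastic gossip matrix. The chain, the 3-agent network, its weights, and the step-sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(9)
gamma, nAgents = 0.9, 3
P = np.array([[0.7, 0.3],            # shared 2-state chain under a fixed policy
              [0.4, 0.6]])
r = np.array([1.0, 0.0])
W = np.array([[0.5, 0.25, 0.25],     # doubly stochastic gossip weights
              [0.25, 0.5, 0.25],     # (complete 3-agent network, illustrative)
              [0.25, 0.25, 0.5]])

theta = np.zeros((nAgents, 2))       # tabular value estimates, one row per agent
s = np.zeros(nAgents, dtype=int)
for n in range(1, 50001):
    for i in range(nAgents):         # each agent observes its own trajectory
        s2 = rng.choice(2, p=P[s[i]])
        delta = r[s[i]] + gamma * theta[i, s2] - theta[i, s[i]]
        theta[i, s[i]] += (n ** -0.6) * delta   # local TD(0) update
        s[i] = s2
    theta = W @ theta                # gossip step: average with neighbors

print(theta)  # rows reach consensus near the common value function
```

The gossip averaging forces the agents to consensus while the TD updates drive the common iterate toward the value function of the chain.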