no code implementations • 30 Jun 2024 • Siddharth Chandak, Ilai Bistritz, Nicholas Bambos

We prove that our algorithm, based on two-timescale stochastic approximation, converges with probability 1 to the set of Nash equilibria (NE) that satisfy target linear constraints.
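As a rough illustration of the two-timescale idea (not the paper's constrained-NE algorithm), the sketch below runs a fast iterate that tracks the mean of noisy observations and a slow iterate driven toward the fast iterate's limit; the parameter `theta` and the step-size exponents are illustrative assumptions.

```python
import numpy as np

# Toy two-timescale stochastic approximation (illustrative only):
# the fast iterate y_n tracks E[obs] = theta, while the slow iterate
# x_n moves toward y_n. Step sizes satisfy b_n / a_n -> 0, so the
# fast loop sees the slow iterate as quasi-static.
rng = np.random.default_rng(0)
theta = 2.0
x, y = 0.0, 0.0
for n in range(1, 20001):
    a = 1.0 / n**0.6      # fast step size
    b = 1.0 / n           # slow step size (b/a -> 0)
    obs = theta + rng.normal()
    y += a * (obs - y)    # fast timescale: tracks the noisy mean
    x += b * (y - x)      # slow timescale: follows the fast limit
print(x, y)  # both approach theta
```

The separation of step sizes is what lets the analysis treat the coupled iteration as two nested single-timescale problems.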

no code implementations • 26 Jun 2024 • Siddharth Chandak, Isha Thapa, Nicholas Bambos, David Scheinker

We develop a remote patient monitoring (RPM) service architecture, which has two tiers of monitoring: ordinary and intensive.

no code implementations • 16 Dec 2023 • Siddharth Chandak, Vivek S. Borkar

We derive a concentration bound of the type `for all $n \geq n_0$ for some $n_0$' for TD(0) with linear function approximation.
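For reference, a minimal tabular instance of the TD(0) iteration the bound applies to (the 2-state Markov reward process and identity features below are hypothetical, not the paper's setting):

```python
import numpy as np

# Minimal TD(0) with linear function approximation on a toy
# 2-state Markov reward process (illustrative assumptions).
rng = np.random.default_rng(1)
P = np.array([[0.5, 0.5],
              [0.2, 0.8]])      # transition matrix
r = np.array([1.0, 0.0])        # expected reward per state
gamma = 0.9
phi = np.eye(2)                 # identity features => tabular case
w = np.zeros(2)                 # weight vector
s = 0
for n in range(1, 50001):
    alpha = 1.0 / n**0.75       # diminishing step size
    s_next = rng.choice(2, p=P[s])
    delta = r[s] + gamma * phi[s_next] @ w - phi[s] @ w   # TD error
    w += alpha * delta * phi[s]
    s = s_next
# With identity features, w approximates the true value function
# V = (I - gamma * P)^{-1} r.
V_true = np.linalg.solve(np.eye(2) - gamma * P, r)
print(w, V_true)
```

A concentration bound of the stated type would guarantee that, with high probability, `w` stays within a shrinking ball around `V_true` simultaneously for all iterations past some $n_0$.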

no code implementations • 27 Feb 2023 • Siddharth Chandak, Ilai Bistritz, Nicholas Bambos

We prove that UECB achieves a regret of $\mathcal{O}(\log(T)+\tau_c\log(\tau_c)+\tau_c\log\log(T))$ for this equilibrium bandit problem, where $\tau_c$ is the worst-case approximate convergence time to equilibrium.

no code implementations • 3 Nov 2022 • Siddharth Chandak, Pratik Shah, Vivek S Borkar, Parth Dodhia

Motivated by the novel paradigm developed by Van Roy and coauthors for reinforcement learning in arbitrary non-Markovian environments, we propose a related formulation and explicitly pin down the error caused by the non-Markovianity of observations when Q-learning is applied to this formulation.

no code implementations • 4 Nov 2021 • Siddharth Chandak, Vivek S. Borkar, Harsh Dolhare

We revisit the popular LSPE($\lambda$) algorithm for policy evaluation and derive a concentration bound that gives high-probability performance guarantees from some time on.

no code implementations • 27 Jun 2021 • Siddharth Chandak, Vivek S. Borkar, Parth Dodhia

Using a martingale concentration inequality, we derive concentration bounds `from time $n_0$ on' for stochastic approximation algorithms with contractive maps, under both martingale difference and Markov noise.
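A minimal sketch of the iteration class covered here, under illustrative assumptions (an affine contraction `F` with a known fixed point and i.i.d. Gaussian martingale-difference noise; the paper's setting is more general):

```python
import numpy as np

# Stochastic approximation with a contractive map (toy instance):
#   x_{n+1} = x_n + a_n * (F(x_n) - x_n + M_{n+1}),
# where F is a contraction and M_{n+1} is martingale difference noise.
# F below is a hypothetical affine contraction with fixed point x*.
rng = np.random.default_rng(2)
A = np.array([[0.3, 0.1],
              [0.0, 0.4]])          # spectral radius < 1 => contraction
x_star = np.array([1.0, -1.0])
F = lambda x: A @ (x - x_star) + x_star
x = np.zeros(2)
for n in range(1, 20001):
    a = 1.0 / n**0.7                # diminishing step size
    M = 0.1 * rng.normal(size=2)    # martingale difference noise
    x += a * (F(x) - x + M)
print(x)  # approaches the fixed point x* = [1, -1]
```

Concentration bounds of the `from time $n_0$ on' type control the probability that all iterates after $n_0$ remain in a neighborhood of the fixed point, rather than bounding a single iterate.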

no code implementations • 12 Apr 2021 • Vivek S. Borkar, Siddharth Chandak

We consider a prospect theoretic version of the classical Q-learning algorithm for discounted reward Markov decision processes, wherein the controller perceives a distorted and noisy future reward, modeled by a nonlinearity that accentuates gains and underrepresents losses relative to a reference point.
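The flavor of the distorted update can be sketched on a toy 2-state, 2-action MDP; the MDP, the piecewise-linear distortion `u`, and all constants below are illustrative assumptions, not the paper's model:

```python
import numpy as np

# Prospect-theoretic twist on tabular Q-learning (hedged sketch):
# the TD target is passed through a distortion u that accentuates
# gains and underrepresents losses relative to a reference point of 0.
rng = np.random.default_rng(3)

def u(z):
    # illustrative distortion: amplify gains, shrink losses
    return 1.2 * z if z >= 0 else 0.5 * z

P = np.array([[[0.7, 0.3], [0.4, 0.6]],   # P[s, a] = next-state dist.
              [[0.2, 0.8], [0.5, 0.5]]])
R = np.array([[1.0, -0.5],
              [0.3, 0.8]])                # R[s, a] = mean reward
gamma = 0.7                               # 1.2 * gamma < 1 keeps Q bounded
Q = np.zeros((2, 2))
s = 0
for n in range(1, 30001):
    alpha = 1.0 / n**0.7
    a = rng.integers(2)                   # explore uniformly at random
    s_next = rng.choice(2, p=P[s, a])
    r = R[s, a] + 0.1 * rng.normal()      # noisy reward
    target = u(r + gamma * Q[s_next].max())  # distorted, noisy target
    Q[s, a] += alpha * (target - Q[s, a])
    s = s_next
print(Q)
```

Note the gain amplification effectively inflates the discount factor, which is why the sketch keeps `1.2 * gamma < 1`; the interplay between distortion and discounting is exactly the kind of issue such an analysis must handle.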

no code implementations • 19 Jan 2021 • Siddharth Chandak, Federico Chiariotti, Petar Popovski

As the use of Internet of Things (IoT) devices for monitoring purposes becomes ubiquitous, the efficiency of sensor communication is a major issue for the modern Internet.

