no code implementations • 21 Aug 2024 • Washim Uddin Mondal, Vaneet Aggarwal
We consider the problem of learning a Constrained Markov Decision Process (CMDP) via general parameterization.
no code implementations • 26 Jul 2024 • Swetha Ganesh, Washim Uddin Mondal, Vaneet Aggarwal
This work analyzes average-reward reinforcement learning with general parametrization.
no code implementations • 17 Jun 2024 • Vaneet Aggarwal, Washim Uddin Mondal, Qinbo Bai
This monograph focuses on the exploration of various model-based and model-free approaches for Constrained RL within the context of average reward Markov Decision Processes (MDPs).
no code implementations • 17 May 2024 • Washim Uddin Mondal, Vaneet Aggarwal
We consider a Constrained Markov Decision Process (CMDP) where the goal of an agent is to maximize the expected discounted sum of rewards over an infinite horizon while ensuring that the expected discounted sum of costs exceeds a certain threshold.
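Concretely, this objective can be written as the following constrained program (a minimal sketch in standard CMDP notation; the symbols $J_r$, $J_c$, $\gamma$, and the threshold $b$ are illustrative conventions, not taken from the paper):

```latex
% Illustrative CMDP formulation (notation assumed, not from the paper):
% maximize the discounted reward objective subject to a discounted cost constraint
\max_{\theta}\; J_r(\theta) := \mathbb{E}\!\left[\sum_{t=0}^{\infty}\gamma^t\, r(s_t,a_t)\right]
\quad \text{s.t.} \quad
J_c(\theta) := \mathbb{E}\!\left[\sum_{t=0}^{\infty}\gamma^t\, c(s_t,a_t)\right] \ge b
```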
no code implementations • 2 Apr 2024 • Swetha Ganesh, Washim Uddin Mondal, Vaneet Aggarwal
The second approach, rooted in Hessian-based techniques, ensures an expected regret of the order $\tilde{\mathcal{O}}(\sqrt{T})$.
no code implementations • 3 Feb 2024 • Qinbo Bai, Washim Uddin Mondal, Vaneet Aggarwal
This paper explores the realm of infinite horizon average reward Constrained Markov Decision Processes (CMDPs).
no code implementations • 18 Oct 2023 • Washim Uddin Mondal, Vaneet Aggarwal
In the class of Hessian-free and importance-sampling-free (IS-free) algorithms, ANPG beats the best-known sample complexity by a factor of $\mathcal{O}(\epsilon^{-\frac{1}{2}})$ and simultaneously matches their state-of-the-art iteration complexity.
no code implementations • 5 Sep 2023 • Qinbo Bai, Washim Uddin Mondal, Vaneet Aggarwal
Notably, this paper presents the first regret-bound analysis of a general parameterized policy gradient algorithm in the average-reward setting.
no code implementations • 9 May 2023 • Lu Ling, Washim Uddin Mondal, Satish V. Ukkusuri
We then develop a novel deep reinforcement learning framework to find the optimal vaccine allocation strategy for this highly dynamic spatio-temporal disease evolution system.
no code implementations • 4 May 2023 • Washim Uddin Mondal, Vaneet Aggarwal
We investigate an infinite-horizon average reward Markov Decision Process (MDP) with delayed, composite, and partially anonymous reward feedback.
1 code implementation • 13 Jan 2023 • Washim Uddin Mondal, Vaneet Aggarwal, Satish V. Ukkusuri
We compute the approximation error as $\mathcal{O}(e)$ where $e=\frac{1}{\sqrt{N}}\left[\sqrt{|\mathcal{X}|} +\sqrt{|\mathcal{U}|}\right]$.
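To illustrate how this bound scales with the population size, here is a minimal sketch that evaluates the leading expression of the bound (the concrete state/action sizes below are made-up examples, not values from the paper):

```python
import math

def mfc_approximation_error(num_agents: int, state_space_size: int, action_space_size: int) -> float:
    """Evaluate the leading term e = (sqrt(|X|) + sqrt(|U|)) / sqrt(N)."""
    return (math.sqrt(state_space_size) + math.sqrt(action_space_size)) / math.sqrt(num_agents)

# Hypothetical sizes: the error shrinks as O(1/sqrt(N)) as the population grows.
for n in (100, 10_000, 1_000_000):
    print(n, mfc_approximation_error(n, state_space_size=10, action_space_size=5))
```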
no code implementations • 15 Sep 2022 • Washim Uddin Mondal, Vaneet Aggarwal, Satish V. Ukkusuri
In a special case where the reward, cost, and state transition functions are independent of the action distribution of the population, we prove that the error can be improved to $e=\mathcal{O}(\sqrt{|\mathcal{X}|}/\sqrt{N})$.
no code implementations • 7 Sep 2022 • Washim Uddin Mondal, Vaneet Aggarwal, Satish V. Ukkusuri
We show that in a cooperative $N$-agent network, one can design locally executable policies for the agents such that the resulting discounted sum of average rewards (value) well approximates the optimal value computed over all (including non-local) policies.
1 code implementation • 28 Feb 2022 • Washim Uddin Mondal, Vaneet Aggarwal, Satish V. Ukkusuri
We prove that, if the reward of each agent is an affine function of the mean-field seen by that agent, then one can approximate such a non-uniform MARL problem via its associated MFC problem within an error of $e=\mathcal{O}(\frac{1}{\sqrt{N}}[\sqrt{|\mathcal{X}|} + \sqrt{|\mathcal{U}|}])$ where $N$ is the population size and $|\mathcal{X}|$, $|\mathcal{U}|$ are the sizes of state and action spaces respectively.
2 code implementations • 13 Feb 2022 • Washim Uddin Mondal, Praful D. Mankar, Goutam Das, Vaneet Aggarwal, Satish V. Ukkusuri
This article proposes a Convolutional Neural Network-based Auto-Encoder (CNN-AE) to predict the location-dependent rate and coverage probability of a network from its topology.
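For intuition only, a minimal PyTorch sketch of a convolutional auto-encoder in this spirit, assuming the topology is rasterized into a single-channel grid and the target is a location-dependent coverage map; the layer sizes and shapes are illustrative, not the architecture from the paper:

```python
import torch
import torch.nn as nn

class CNNAutoEncoder(nn.Module):
    """Hypothetical CNN auto-encoder: topology grid in, coverage map out."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),  # 64x64 -> 32x32
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # 32x32 -> 16x16
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=4, stride=2, padding=1),  # 16x16 -> 32x32
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=4, stride=2, padding=1),  # 32x32 -> 64x64
            nn.Sigmoid(),  # coverage probability lies in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Hypothetical usage: a batch of 64x64 topology grids mapped to coverage maps.
model = CNNAutoEncoder()
topology = torch.rand(8, 1, 64, 64)
coverage_map = model(topology)  # shape: (8, 1, 64, 64)
```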
no code implementations • 9 Sep 2021 • Washim Uddin Mondal, Mridul Agarwal, Vaneet Aggarwal, Satish V. Ukkusuri
We show that, in these cases, the $K$-class MARL problem can be approximated by MFC with errors given as $e_1=\mathcal{O}(\frac{\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}}{N_{\mathrm{pop}}}\sum_{k}\sqrt{N_k})$, $e_2=\mathcal{O}(\left[\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}\right]\sum_{k}\frac{1}{\sqrt{N_k}})$ and $e_3=\mathcal{O}\left(\left[\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}\right]\left[\frac{A}{N_{\mathrm{pop}}}\sum_{k\in[K]}\sqrt{N_k}+\frac{B}{\sqrt{N_{\mathrm{pop}}}}\right]\right)$, respectively, where $A, B$ are some constants and $|\mathcal{X}|,|\mathcal{U}|$ are the sizes of state and action spaces of each agent.
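As a quick way to compare how the three bounds behave, a minimal sketch that evaluates their leading expressions for a hypothetical class decomposition (the constants $A$, $B$ and all concrete inputs below are assumptions, since the paper leaves them unspecified):

```python
import math

def k_class_mfc_errors(class_sizes, state_size, action_size, A=1.0, B=1.0):
    """Evaluate the leading expressions of the bounds e1, e2, e3 (up to order constants)."""
    n_pop = sum(class_sizes)
    scale = math.sqrt(state_size) + math.sqrt(action_size)  # sqrt(|X|) + sqrt(|U|)
    e1 = scale / n_pop * sum(math.sqrt(n_k) for n_k in class_sizes)
    e2 = scale * sum(1 / math.sqrt(n_k) for n_k in class_sizes)
    e3 = scale * (A / n_pop * sum(math.sqrt(n_k) for n_k in class_sizes)
                  + B / math.sqrt(n_pop))
    return e1, e2, e3

# Hypothetical three-class population with |X| = 10 and |U| = 5.
print(k_class_mfc_errors([100, 400, 900], state_size=10, action_size=5))
```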