Search Results for author: Washim Uddin Mondal

Found 12 papers, 3 papers with code

Variance-Reduced Policy Gradient Approaches for Infinite Horizon Average Reward Markov Decision Processes

no code implementations • 2 Apr 2024 • Swetha Ganesh, Washim Uddin Mondal, Vaneet Aggarwal

The second approach, rooted in Hessian-based techniques, ensures an expected regret of the order $\tilde{\mathcal{O}}(\sqrt{T})$.
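
For intuition about the setting (an illustration only, not either of the algorithms analyzed in the paper), the sketch below runs a REINFORCE-style policy gradient on a small random average-reward MDP and smooths successive gradient estimates with a momentum average as a crude stand-in for variance reduction; every constant and name in it is an assumption made for the example.

```python
# Illustrative sketch only: REINFORCE-style average-reward policy gradient on a
# random tabular MDP, with momentum-averaged gradients as a simple proxy for
# variance reduction. All sizes and constants are assumptions for the example.
import numpy as np

rng = np.random.default_rng(0)
S, A, H = 5, 3, 40                            # states, actions, rollout length
P = rng.dirichlet(np.ones(S), size=(S, A))    # transition kernel P[s, a]
R = rng.random((S, A))                        # reward table

def rollout(theta):
    """One trajectory under the softmax policy; returns its average reward
    and a crude REINFORCE gradient estimate."""
    s, grad, total = 0, np.zeros_like(theta), 0.0
    for _ in range(H):
        logits = theta[s]
        pi = np.exp(logits - logits.max()); pi /= pi.sum()
        a = rng.choice(A, p=pi)
        grad[s] += np.eye(A)[a] - pi          # score function of the softmax
        total += R[s, a]
        s = rng.choice(S, p=P[s, a])
    avg = total / H
    return avg, grad * avg                    # baseline-free gradient estimate

theta, d = np.zeros((S, A)), np.zeros((S, A))
beta, lr = 0.2, 0.05
for t in range(200):
    avg, g = rollout(theta)
    d = beta * g + (1 - beta) * d             # momentum-averaged update direction
    theta += lr * d
print("final average reward ~", avg)
```

Momentum averaging is only a proxy here; the Hessian-free and Hessian-based estimators studied in the paper are more involved.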

Improved Sample Complexity Analysis of Natural Policy Gradient Algorithm with General Parameterization for Infinite Horizon Discounted Reward Markov Decision Processes

no code implementations • 18 Oct 2023 • Washim Uddin Mondal, Vaneet Aggarwal

In the class of Hessian-free and IS-free algorithms, ANPG beats the best-known sample complexity by a factor of $\mathcal{O}(\epsilon^{-\frac{1}{2}})$ and simultaneously matches their state-of-the-art iteration complexity.
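
For context on what "natural" means here: natural policy gradient methods precondition the vanilla gradient with the Fisher information of the policy. The toy snippet below performs one damped natural-gradient step for a softmax policy over a single state; it is a generic illustration with assumed action values, not the accelerated ANPG algorithm from the paper.

```python
# Minimal illustration of a single natural-gradient step for a softmax policy
# on one state. The action values q are an assumption for the example.
import numpy as np

theta = np.zeros(3)                      # logits for 3 actions
q = np.array([1.0, 0.5, 0.2])            # assumed action values for this state
pi = np.exp(theta); pi /= pi.sum()

grad = pi * (q - pi @ q)                 # vanilla policy gradient w.r.t. logits
fisher = np.diag(pi) - np.outer(pi, pi)  # Fisher information of the softmax
step = np.linalg.solve(fisher + 1e-3 * np.eye(3), grad)  # damped natural gradient
theta += 0.1 * step
print(theta)
```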

Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes

no code implementations • 5 Sep 2023 • Qinbo Bai, Washim Uddin Mondal, Vaneet Aggarwal

Notably, this paper presents the first regret analysis of a general parameterized policy gradient algorithm in the average reward setting.

Cooperating Graph Neural Networks with Deep Reinforcement Learning for Vaccine Prioritization

no code implementations • 9 May 2023 • Lu Ling, Washim Uddin Mondal, Satish V. Ukkusuri

We then develop a novel deep reinforcement learning framework to seek the optimal vaccine allocation strategy for the high-degree spatial-temporal disease evolution system.

reinforcement-learning

Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward

no code implementations • 4 May 2023 • Washim Uddin Mondal, Vaneet Aggarwal

We investigate an infinite-horizon average reward Markov Decision Process (MDP) with delayed, composite, and partially anonymous reward feedback.

Attribute · reinforcement-learning
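
A toy sketch of this feedback model (our assumption-laden illustration, not the paper's code): each action's reward is split into components that arrive over later steps, and the learner only observes the per-step aggregate, without knowing which past action produced which component.

```python
# Toy illustration of delayed, composite, partially anonymous reward feedback.
# DELAY and the Dirichlet split are assumptions made for the example.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(1)
DELAY = 3
pending = defaultdict(float)              # arrival time -> accumulated reward mass

observed = []
for t in range(10):
    reward = rng.random()                 # full reward generated by the action at time t
    shares = rng.dirichlet(np.ones(DELAY))
    for d, share in enumerate(shares):    # composite: reward split across future steps
        pending[t + 1 + d] += share * reward
    observed.append(pending.pop(t, 0.0))  # anonymous: only the aggregate at time t is seen

print(observed)
```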

Mean-Field Approximation of Cooperative Constrained Multi-Agent Reinforcement Learning (CMARL)

no code implementations • 15 Sep 2022 • Washim Uddin Mondal, Vaneet Aggarwal, Satish V. Ukkusuri

In a special case where the reward, cost, and state transition functions are independent of the action distribution of the population, we prove that the error can be improved to $e=\mathcal{O}(\sqrt{|\mathcal{X}|}/\sqrt{N})$.

Multi-agent Reinforcement Learning · reinforcement-learning · +1
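
To see what the improved bound implies, the back-of-the-envelope snippet below evaluates the $\sqrt{|\mathcal{X}|}/\sqrt{N}$ scaling for an assumed state-space size and a few population sizes; the hidden constant in the $\mathcal{O}(\cdot)$ bound is ignored.

```python
# Back-of-the-envelope scaling of the improved mean-field approximation error.
# The state-space size X and the population sizes are assumptions.
import math

X = 10                                    # assumed number of per-agent states
for N in (100, 10_000, 1_000_000):
    err = math.sqrt(X) / math.sqrt(N)     # up to the hidden constant in the O(.) bound
    print(f"N = {N:>9,d}  ->  error scale ~ {err:.4f}")
```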

On the Near-Optimality of Local Policies in Large Cooperative Multi-Agent Reinforcement Learning

no code implementations • 7 Sep 2022 • Washim Uddin Mondal, Vaneet Aggarwal, Satish V. Ukkusuri

We show that in a cooperative $N$-agent network, one can design locally executable policies for the agents such that the resulting discounted sum of average rewards (value) well approximates the optimal value computed over all (including non-local) policies.

Multi-agent Reinforcement Learning · Reinforcement Learning (RL)

Can Mean Field Control (MFC) Approximate Cooperative Multi Agent Reinforcement Learning (MARL) with Non-Uniform Interaction?

1 code implementation • 28 Feb 2022 • Washim Uddin Mondal, Vaneet Aggarwal, Satish V. Ukkusuri

We prove that, if the reward of each agent is an affine function of the mean-field seen by that agent, then one can approximate such a non-uniform MARL problem via its associated MFC problem within an error of $e=\mathcal{O}(\frac{1}{\sqrt{N}}[\sqrt{|\mathcal{X}|} + \sqrt{|\mathcal{U}|}])$ where $N$ is the population size and $|\mathcal{X}|$, $|\mathcal{U}|$ are the sizes of state and action spaces respectively.

Multi-agent Reinforcement Learning
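
The affine-reward condition refers to the "mean field seen by an agent" under non-uniform interaction. The snippet below is our own illustration of that quantity (not the released code): it forms the interaction-weighted empirical state distribution seen by each agent and applies an assumed affine reward to it.

```python
# Illustration of the mean field "seen" by each agent under non-uniform
# interaction, and an affine reward of that mean field. W, the reward weights,
# and all sizes are assumptions for the example.
import numpy as np

rng = np.random.default_rng(2)
N, X = 6, 3                                            # agents, per-agent states
W = rng.random((N, N)); W /= W.sum(1, keepdims=True)   # interaction weights (rows sum to 1)
states = rng.integers(X, size=N)                       # current state of each agent
one_hot = np.eye(X)[states]

mean_field_seen = W @ one_hot                          # row i: state distribution seen by agent i
reward = mean_field_seen @ np.array([1.0, 0.5, -0.2]) + 0.1   # affine in the seen mean field
print(mean_field_seen.round(3), reward.round(3))
```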

Deep Learning based Coverage and Rate Manifold Estimation in Cellular Networks

2 code implementations • 13 Feb 2022 • Washim Uddin Mondal, Praful D. Mankar, Goutam Das, Vaneet Aggarwal, Satish V. Ukkusuri

This article proposes a Convolutional Neural Network-based Auto Encoder (CNN-AE) to predict the location-dependent rate and coverage probability of a network from its topology.
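
A minimal sketch of the general idea, assuming a 64x64 topology grid as input and two output maps (rate and coverage probability); this is not the authors' released implementation, and all layer sizes are placeholders.

```python
# Illustrative convolutional autoencoder mapping a base-station topology grid
# to per-pixel rate and coverage maps. Architecture choices are assumptions.
import torch
import torch.nn as nn

class CNNAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),            # 64x64 -> 32x32
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),           # 32x32 -> 16x16
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),  # 16x16 -> 32x32
            nn.ConvTranspose2d(16, 2, 4, stride=2, padding=1), nn.Sigmoid() # 32x32 -> 64x64, 2 maps
        )

    def forward(self, topology):
        return self.decoder(self.encoder(topology))

model = CNNAE()
dummy_topology = torch.rand(4, 1, 64, 64)        # batch of base-station location grids
rate_and_coverage = model(dummy_topology)        # shape: (4, 2, 64, 64)
```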

On the Approximation of Cooperative Heterogeneous Multi-Agent Reinforcement Learning (MARL) using Mean Field Control (MFC)

no code implementations • 9 Sep 2021 • Washim Uddin Mondal, Mridul Agarwal, Vaneet Aggarwal, Satish V. Ukkusuri

We show that, in these cases, the $K$-class MARL problem can be approximated by MFC with errors given as $e_1=\mathcal{O}(\frac{\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}}{N_{\mathrm{pop}}}\sum_{k}\sqrt{N_k})$, $e_2=\mathcal{O}(\left[\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}\right]\sum_{k}\frac{1}{\sqrt{N_k}})$ and $e_3=\mathcal{O}\left(\left[\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}\right]\left[\frac{A}{N_{\mathrm{pop}}}\sum_{k\in[K]}\sqrt{N_k}+\frac{B}{\sqrt{N_{\mathrm{pop}}}}\right]\right)$, respectively, where $A, B$ are some constants and $|\mathcal{X}|,|\mathcal{U}|$ are the sizes of state and action spaces of each agent.

Multi-agent Reinforcement Learning
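
To illustrate how the first two bounds behave (hidden constants dropped, class sizes assumed), the snippet below plugs a two-class population into the $e_1$ and $e_2$ expressions.

```python
# Numerical illustration only: e1 and e2 for an assumed 2-class population,
# ignoring the constants hidden in the O(.) notation.
import math

X, U = 10, 4                              # assumed per-agent state/action space sizes
N = [900, 100]                            # assumed class sizes N_1, N_2
N_pop = sum(N)

e1 = (math.sqrt(X) + math.sqrt(U)) / N_pop * sum(math.sqrt(n) for n in N)
e2 = (math.sqrt(X) + math.sqrt(U)) * sum(1 / math.sqrt(n) for n in N)
print(f"e1 scale ~ {e1:.3f}, e2 scale ~ {e2:.3f}")
```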
