Search Results for author: Washim Uddin Mondal

Found 12 papers, 3 papers with code

Variance-Reduced Policy Gradient Approaches for Infinite Horizon Average Reward Markov Decision Processes

no code implementations • 2 Apr 2024 • Swetha Ganesh, Washim Uddin Mondal, Vaneet Aggarwal

The second approach, rooted in Hessian-based techniques, ensures an expected regret of the order $\tilde{\mathcal{O}}(\sqrt{T})$.
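
For intuition about the setting (an illustration only, not either of the algorithms analyzed in the paper), the sketch below runs a REINFORCE-style policy gradient on a small random average-reward MDP and smooths successive gradient estimates with a momentum average as a crude stand-in for variance reduction; every constant and name in it is an assumption made for the example.

```python
# Illustrative sketch only: REINFORCE-style average-reward policy gradient on a
# random tabular MDP, with momentum-averaged gradients as a simple proxy for
# variance reduction. All sizes and constants are assumptions for the example.
import numpy as np

rng = np.random.default_rng(0)
S, A, H = 5, 3, 40                            # states, actions, rollout length
P = rng.dirichlet(np.ones(S), size=(S, A))    # transition kernel P[s, a]
R = rng.random((S, A))                        # reward table

def rollout(theta):
    """One trajectory under the softmax policy; returns its average reward
    and a crude REINFORCE gradient estimate."""
    s, grad, total = 0, np.zeros_like(theta), 0.0
    for _ in range(H):
        logits = theta[s]
        pi = np.exp(logits - logits.max()); pi /= pi.sum()
        a = rng.choice(A, p=pi)
        grad[s] += np.eye(A)[a] - pi          # score function of the softmax
        total += R[s, a]
        s = rng.choice(S, p=P[s, a])
    avg = total / H
    return avg, grad * avg                    # baseline-free gradient estimate

theta, d = np.zeros((S, A)), np.zeros((S, A))
beta, lr = 0.2, 0.05
for t in range(200):
    avg, g = rollout(theta)
    d = beta * g + (1 - beta) * d             # momentum-averaged update direction
    theta += lr * d
print("final average reward ~", avg)
```

Momentum averaging is only a proxy here; the Hessian-free and Hessian-based estimators studied in the paper are more involved.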

Improved Sample Complexity Analysis of Natural Policy Gradient Algorithm with General Parameterization for Infinite Horizon Discounted Reward Markov Decision Processes

no code implementations • 18 Oct 2023 • Washim Uddin Mondal, Vaneet Aggarwal

In the class of Hessian-free and IS-free algorithms, ANPG beats the best-known sample complexity by a factor of $\mathcal{O}(\epsilon^{-\frac{1}{2}})$ and simultaneously matches their state-of-the-art iteration complexity.
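
For context on what "natural" means here: natural policy gradient methods precondition the vanilla gradient with the Fisher information of the policy. The toy snippet below performs one damped natural-gradient step for a softmax policy over a single state; it is a generic illustration with assumed action values, not the accelerated ANPG algorithm from the paper.

```python
# Minimal illustration of a single natural-gradient step for a softmax policy
# on one state. The action values q are an assumption for the example.
import numpy as np

theta = np.zeros(3)                      # logits for 3 actions
q = np.array([1.0, 0.5, 0.2])            # assumed action values for this state
pi = np.exp(theta); pi /= pi.sum()

grad = pi * (q - pi @ q)                 # vanilla policy gradient w.r.t. logits
fisher = np.diag(pi) - np.outer(pi, pi)  # Fisher information of the softmax
step = np.linalg.solve(fisher + 1e-3 * np.eye(3), grad)  # damped natural gradient
theta += 0.1 * step
print(theta)
```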

Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes

no code implementations • 5 Sep 2023 • Qinbo Bai, Washim Uddin Mondal, Vaneet Aggarwal

Notably, this paper presents the first regret analysis of a general parameterized policy gradient algorithm in the average reward setting.

Cooperating Graph Neural Networks with Deep Reinforcement Learning for Vaccine Prioritization

no code implementations • 9 May 2023 • Lu Ling, Washim Uddin Mondal, Satish V. Ukkusuri

We then develop a novel deep reinforcement learning framework to seek the optimal vaccine allocation strategy for the high-degree spatial-temporal disease evolution system.

reinforcement-learning

Reinforcement Learning with Delayed, Composite, and Partially Anonymous Reward

no code implementations • 4 May 2023 • Washim Uddin Mondal, Vaneet Aggarwal

We investigate an infinite-horizon average reward Markov Decision Process (MDP) with delayed, composite, and partially anonymous reward feedback.

Attribute · reinforcement-learning
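
A toy sketch of this feedback model (our assumption-laden illustration, not the paper's code): each action's reward is split into components that arrive over later steps, and the learner only observes the per-step aggregate, without knowing which past action produced which component.

```python
# Toy illustration of delayed, composite, partially anonymous reward feedback.
# DELAY and the Dirichlet split are assumptions made for the example.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(1)
DELAY = 3
pending = defaultdict(float)              # arrival time -> accumulated reward mass

observed = []
for t in range(10):
    reward = rng.random()                 # full reward generated by the action at time t
    shares = rng.dirichlet(np.ones(DELAY))
    for d, share in enumerate(shares):    # composite: reward split across future steps
        pending[t + 1 + d] += share * reward
    observed.append(pending.pop(t, 0.0))  # anonymous: only the aggregate at time t is seen

print(observed)
```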

Mean-Field Approximation of Cooperative Constrained Multi-Agent Reinforcement Learning (CMARL)

no code implementations • 15 Sep 2022 • Washim Uddin Mondal, Vaneet Aggarwal, Satish V. Ukkusuri

In a special case where the reward, cost, and state transition functions are independent of the action distribution of the population, we prove that the error can be improved to $e=\mathcal{O}(\sqrt{|\mathcal{X}|}/\sqrt{N})$.

Multi-agent Reinforcement Learning · reinforcement-learning · +1
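
To see what the improved bound implies, the back-of-the-envelope snippet below evaluates the $\sqrt{|\mathcal{X}|}/\sqrt{N}$ scaling for an assumed state-space size and a few population sizes; the hidden constant in the $\mathcal{O}(\cdot)$ bound is ignored.

```python
# Back-of-the-envelope scaling of the improved mean-field approximation error.
# The state-space size X and the population sizes are assumptions.
import math

X = 10                                    # assumed number of per-agent states
for N in (100, 10_000, 1_000_000):
    err = math.sqrt(X) / math.sqrt(N)     # up to the hidden constant in the O(.) bound
    print(f"N = {N:>9,d}  ->  error scale ~ {err:.4f}")
```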

On the Near-Optimality of Local Policies in Large Cooperative Multi-Agent Reinforcement Learning

no code implementations • 7 Sep 2022 • Washim Uddin Mondal, Vaneet Aggarwal, Satish V. Ukkusuri

We show that in a cooperative $N$-agent network, one can design locally executable policies for the agents such that the resulting discounted sum of average rewards (value) well approximates the optimal value computed over all (including non-local) policies.

Multi-agent Reinforcement Learning · Reinforcement Learning (RL)

Can Mean Field Control (MFC) Approximate Cooperative Multi Agent Reinforcement Learning (MARL) with Non-Uniform Interaction?

1 code implementation • 28 Feb 2022 • Washim Uddin Mondal, Vaneet Aggarwal, Satish V. Ukkusuri

We prove that, if the reward of each agent is an affine function of the mean-field seen by that agent, then one can approximate such a non-uniform MARL problem via its associated MFC problem within an error of $e=\mathcal{O}(\frac{1}{\sqrt{N}}[\sqrt{|\mathcal{X}|} + \sqrt{|\mathcal{U}|}])$ where $N$ is the population size and $|\mathcal{X}|$, $|\mathcal{U}|$ are the sizes of state and action spaces respectively.

Multi-agent Reinforcement Learning
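
The affine-reward condition refers to the "mean field seen by an agent" under non-uniform interaction. The snippet below is our own illustration of that quantity (not the released code): it forms the interaction-weighted empirical state distribution seen by each agent and applies an assumed affine reward to it.

```python
# Illustration of the mean field "seen" by each agent under non-uniform
# interaction, and an affine reward of that mean field. W, the reward weights,
# and all sizes are assumptions for the example.
import numpy as np

rng = np.random.default_rng(2)
N, X = 6, 3                                            # agents, per-agent states
W = rng.random((N, N)); W /= W.sum(1, keepdims=True)   # interaction weights (rows sum to 1)
states = rng.integers(X, size=N)                       # current state of each agent
one_hot = np.eye(X)[states]

mean_field_seen = W @ one_hot                          # row i: state distribution seen by agent i
reward = mean_field_seen @ np.array([1.0, 0.5, -0.2]) + 0.1   # affine in the seen mean field
print(mean_field_seen.round(3), reward.round(3))
```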

Deep Learning based Coverage and Rate Manifold Estimation in Cellular Networks

2 code implementations • 13 Feb 2022 • Washim Uddin Mondal, Praful D. Mankar, Goutam Das, Vaneet Aggarwal, Satish V. Ukkusuri

This article proposes a Convolutional Neural Network-based Auto Encoder (CNN-AE) to predict the location-dependent rate and coverage probability of a network from its topology.
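
A minimal sketch of the general idea, assuming a 64x64 topology grid as input and two output maps (rate and coverage probability); this is not the authors' released implementation, and all layer sizes are placeholders.

```python
# Illustrative convolutional autoencoder mapping a base-station topology grid
# to per-pixel rate and coverage maps. Architecture choices are assumptions.
import torch
import torch.nn as nn

class CNNAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),            # 64x64 -> 32x32
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),           # 32x32 -> 16x16
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),  # 16x16 -> 32x32
            nn.ConvTranspose2d(16, 2, 4, stride=2, padding=1), nn.Sigmoid() # 32x32 -> 64x64, 2 maps
        )

    def forward(self, topology):
        return self.decoder(self.encoder(topology))

model = CNNAE()
dummy_topology = torch.rand(4, 1, 64, 64)        # batch of base-station location grids
rate_and_coverage = model(dummy_topology)        # shape: (4, 2, 64, 64)
```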

On the Approximation of Cooperative Heterogeneous Multi-Agent Reinforcement Learning (MARL) using Mean Field Control (MFC)

no code implementations • 9 Sep 2021 • Washim Uddin Mondal, Mridul Agarwal, Vaneet Aggarwal, Satish V. Ukkusuri

We show that, in these cases, the $K$-class MARL problem can be approximated by MFC with errors given as $e_1=\mathcal{O}(\frac{\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}}{N_{\mathrm{pop}}}\sum_{k}\sqrt{N_k})$, $e_2=\mathcal{O}(\left[\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}\right]\sum_{k}\frac{1}{\sqrt{N_k}})$ and $e_3=\mathcal{O}\left(\left[\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}\right]\left[\frac{A}{N_{\mathrm{pop}}}\sum_{k\in[K]}\sqrt{N_k}+\frac{B}{\sqrt{N_{\mathrm{pop}}}}\right]\right)$, respectively, where $A, B$ are some constants and $|\mathcal{X}|,|\mathcal{U}|$ are the sizes of state and action spaces of each agent.

Multi-agent Reinforcement Learning
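
To illustrate how the first two bounds behave (hidden constants dropped, class sizes assumed), the snippet below plugs a two-class population into the $e_1$ and $e_2$ expressions.

```python
# Numerical illustration only: e1 and e2 for an assumed 2-class population,
# ignoring the constants hidden in the O(.) notation.
import math

X, U = 10, 4                              # assumed per-agent state/action space sizes
N = [900, 100]                            # assumed class sizes N_1, N_2
N_pop = sum(N)

e1 = (math.sqrt(X) + math.sqrt(U)) / N_pop * sum(math.sqrt(n) for n in N)
e2 = (math.sqrt(X) + math.sqrt(U)) * sum(1 / math.sqrt(n) for n in N)
print(f"e1 scale ~ {e1:.3f}, e2 scale ~ {e2:.3f}")
```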
