Search Results for author: Mridul Agarwal

Found 15 papers, 0 papers with code

On the Global Convergence of Fitted Q-Iteration with Two-layer Neural Network Parametrization

no code implementations • 14 Nov 2022 • Mudit Gaur, Vaneet Aggarwal, Mridul Agarwal

Deep Q-learning based algorithms have been applied successfully in many decision making problems, while their theoretical foundations are not as well understood.

Decision Making Q-Learning

Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach

no code implementations • 13 Sep 2021 • Qinbo Bai, Amrit Singh Bedi, Mridul Agarwal, Alec Koppel, Vaneet Aggarwal

To achieve that, we advocate the use of a randomized primal-dual approach to solve CMDP problems and propose a conservative stochastic primal-dual algorithm (CSPDA), which is shown to exhibit $\tilde{\mathcal{O}}\left(1/\epsilon^2\right)$ sample complexity to achieve $\epsilon$-optimal cumulative reward with zero constraint violations.

Decision Making reinforcement-learning +1

Concave Utility Reinforcement Learning with Zero-Constraint Violations

no code implementations • 12 Sep 2021 • Mridul Agarwal, Qinbo Bai, Vaneet Aggarwal

We consider the problem of tabular infinite horizon concave utility reinforcement learning (CURL) with convex constraints.

reinforcement-learning Reinforcement Learning (RL)

On the Approximation of Cooperative Heterogeneous Multi-Agent Reinforcement Learning (MARL) using Mean Field Control (MFC)

no code implementations • 9 Sep 2021 • Washim Uddin Mondal, Mridul Agarwal, Vaneet Aggarwal, Satish V. Ukkusuri

We show that, in these cases, the $K$-class MARL problem can be approximated by MFC with errors given as $e_1=\mathcal{O}(\frac{\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}}{N_{\mathrm{pop}}}\sum_{k}\sqrt{N_k})$, $e_2=\mathcal{O}(\left[\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}\right]\sum_{k}\frac{1}{\sqrt{N_k}})$ and $e_3=\mathcal{O}\left(\left[\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}\right]\left[\frac{A}{N_{\mathrm{pop}}}\sum_{k\in[K]}\sqrt{N_k}+\frac{B}{\sqrt{N_{\mathrm{pop}}}}\right]\right)$, respectively, where $A, B$ are some constants and $|\mathcal{X}|,|\mathcal{U}|$ are the sizes of state and action spaces of each agent.

Multi-agent Reinforcement Learning

Markov Decision Processes with Long-Term Average Constraints

no code implementations • 12 Jun 2021 • Mridul Agarwal, Qinbo Bai, Vaneet Aggarwal

We consider the problem of constrained Markov Decision Process (CMDP) where an agent interacts with a unichain Markov Decision Process.

Communication Efficient Parallel Reinforcement Learning

no code implementations • 22 Feb 2021 • Mridul Agarwal, Bhargav Ganguly, Vaneet Aggarwal

We provide an algorithm that runs at each agent and prove that the total cumulative regret of $M$ agents is upper bounded as $\tilde{O}(DS\sqrt{MAT})$ for a Markov Decision Process with diameter $D$, number of states $S$, and number of actions $A$.

reinforcement-learning Reinforcement Learning (RL)

Multi-Agent Multi-Armed Bandits with Limited Communication

no code implementations • 10 Feb 2021 • Mridul Agarwal, Vaneet Aggarwal, Kamyar Azizzadenesheli

With our algorithm, LCC-UCB, each agent enjoys a regret of $\tilde{O}\left(\sqrt{({K/N}+ N)T}\right)$, communicates for $O(\log T)$ steps and broadcasts $O(\log K)$ bits in each communication step.
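As an illustrative reading of this bound (not a claim taken from the paper), assume $K$ denotes the number of arms and $N$ the number of agents, which the excerpt does not state. Ignoring logarithmic factors and treating $N$ as continuous, the term inside the square root is smallest when its two summands balance, by the AM-GM inequality:

$$\frac{K}{N} + N \;\ge\; 2\sqrt{K}, \quad \text{with equality at } N = \sqrt{K}, \qquad\text{so}\qquad \sqrt{\left(\frac{K}{N}+N\right)T}\;\Big|_{N=\sqrt{K}} = \sqrt{2\sqrt{K}\,T} = \sqrt{2}\,K^{1/4}\sqrt{T}.$$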

Multi-Armed Bandits

Blind Decision Making: Reinforcement Learning with Delayed Observations

no code implementations • 16 Nov 2020 • Mridul Agarwal, Vaneet Aggarwal

This paper proposes an approach in which the delay in the knowledge of the state can be used, and decisions are made based on the available information, which may not include the current state information.

Decision Making reinforcement-learning +1

Encoders and Decoders for Quantum Expander Codes Using Machine Learning

no code implementations • 6 Sep 2019 • Sathwik Chadaga, Mridul Agarwal, Vaneet Aggarwal

However, the large-scale design of quantum encoders and decoders has to depend on the channel characteristics and requires look-up tables whose memory is exponential in the number of qubits.

BIG-bench Machine Learning Q-Learning

Reinforcement Learning for Joint Optimization of Multiple Rewards

no code implementations • 6 Sep 2019 • Mridul Agarwal, Vaneet Aggarwal

Finding optimal policies which maximize long term rewards of Markov Decision Processes requires the use of dynamic programming and backward induction to solve the Bellman optimality equation.
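The standard technique this sentence refers to can be sketched as follows. This is a minimal illustration of backward induction on a finite-horizon tabular MDP, not the method proposed in the paper; the function name `backward_induction`, the array layouts of `P` and `R`, and the two-state toy problem are all assumptions made for the example.

```python
import numpy as np

def backward_induction(P, R, horizon):
    """Solve the Bellman optimality equation for a finite-horizon MDP.

    P: array of shape (A, S, S), P[a, s, s2] = probability of s -> s2 under action a.
    R: array of shape (S, A), immediate reward for taking action a in state s.
    Returns the optimal value at time 0 and a time-dependent greedy policy.
    """
    num_actions, num_states, _ = P.shape
    V = np.zeros(num_states)  # value after the final step is zero
    policy = np.zeros((horizon, num_states), dtype=int)
    for t in reversed(range(horizon)):
        # Q[s, a] = R[s, a] + sum over s2 of P[a, s, s2] * V[s2]
        Q = R + np.einsum("ast,t->sa", P, V)
        policy[t] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return V, policy

# Hypothetical toy problem: action 0 stays in place, action 1 switches state;
# only state 1 pays a reward of 1 per step.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 0.0],
              [1.0, 1.0]])
V, policy = backward_induction(P, R, horizon=3)
```

In this toy problem the agent starting in state 0 switches once and then stays, while the agent starting in state 1 stays throughout.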

Decision Making Fairness +3

Reinforcement Learning for Mean Field Game

no code implementations • 30 May 2019 • Mridul Agarwal, Vaneet Aggarwal, Arnob Ghosh, Nilay Tiwari

This paper focuses on finding a mean-field equilibrium (MFE) in an action coupled stochastic game setting in an episodic framework.

reinforcement-learning Reinforcement Learning (RL)

Stochastic Top-$K$ Subset Bandits with Linear Space and Non-Linear Feedback

no code implementations • 29 Nov 2018 • Mridul Agarwal, Vaneet Aggarwal, Christopher J. Quinn, Abhishek K. Umrawal

Many real-world problems like Social Influence Maximization face the dilemma of choosing the best $K$ out of $N$ options at a given time instant.

Multi-Armed Bandits
