Search Results for author: Mridul Agarwal

Found 15 papers, 0 papers with code

On the Global Convergence of Fitted Q-Iteration with Two-layer Neural Network Parametrization

no code implementations • 14 Nov 2022 • Mudit Gaur, Vaneet Aggarwal, Mridul Agarwal

Deep Q-learning based algorithms have been applied successfully to many decision-making problems, while their theoretical foundations are not as well understood.

Decision Making Q-Learning

Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach

no code implementations • 13 Sep 2021 • Qinbo Bai, Amrit Singh Bedi, Mridul Agarwal, Alec Koppel, Vaneet Aggarwal

To achieve zero constraint violation, we advocate using a randomized primal-dual approach to solve CMDP problems and propose a conservative stochastic primal-dual algorithm (CSPDA), which is shown to exhibit $\tilde{\mathcal{O}}\left(1/\epsilon^2\right)$ sample complexity to achieve an $\epsilon$-optimal cumulative reward with zero constraint violations.

Decision Making reinforcement-learning
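
As a rough illustration of the primal-dual idea only (not CSPDA itself, which is a randomized algorithm for CMDPs), the sketch below runs deterministic projected gradient ascent-descent on a hypothetical one-dimensional problem. The constraint is tightened by a hypothetical `margin`, so the limit point satisfies the original constraint with slack; the function name, problem, and step size are all assumptions made for illustration.

```python
def primal_dual(step=0.05, margin=0.1, iters=2000):
    """Projected gradient ascent-descent on L(x, lam) = r(x) - lam * (g(x) + margin).

    Hypothetical toy problem: maximize r(x) = -(x - 3)**2 subject to g(x) = x - 1 <= 0.
    The margin tightens the constraint to g(x) <= -margin (a conservative version).
    """
    x, lam = 0.0, 0.0
    for _ in range(iters):
        grad_x = -2.0 * (x - 3.0) - lam          # dL/dx at the current iterate
        tightened = (x - 1.0) + margin           # value of the tightened constraint
        x = x + step * grad_x                    # primal step: ascend the Lagrangian in x
        lam = max(0.0, lam + step * tightened)   # dual step: descend in lam, projected onto lam >= 0
    return x, lam


x, lam = primal_dual()
print(f"x = {x:.3f}, lambda = {lam:.3f}")
```

Analytically, the tightened problem has its optimum at $x = 1 - 0.1 = 0.9$ with multiplier $\lambda = -2(0.9 - 3) = 4.2$, so the iterate settles strictly inside the original feasible set $x \le 1$.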

Concave Utility Reinforcement Learning with Zero-Constraint Violations

no code implementations • 12 Sep 2021 • Mridul Agarwal, Qinbo Bai, Vaneet Aggarwal

We consider the problem of tabular infinite-horizon concave utility reinforcement learning (CURL) with convex constraints.

reinforcement-learning Reinforcement Learning (RL)

On the Approximation of Cooperative Heterogeneous Multi-Agent Reinforcement Learning (MARL) using Mean Field Control (MFC)

no code implementations • 9 Sep 2021 • Washim Uddin Mondal, Mridul Agarwal, Vaneet Aggarwal, Satish V. Ukkusuri

We show that, in the three settings considered, the $K$-class MARL problem can be approximated by MFC with errors given as $e_1=\mathcal{O}(\frac{\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}}{N_{\mathrm{pop}}}\sum_{k}\sqrt{N_k})$, $e_2=\mathcal{O}(\left[\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}\right]\sum_{k}\frac{1}{\sqrt{N_k}})$ and $e_3=\mathcal{O}\left(\left[\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}\right]\left[\frac{A}{N_{\mathrm{pop}}}\sum_{k\in[K]}\sqrt{N_k}+\frac{B}{\sqrt{N_{\mathrm{pop}}}}\right]\right)$, respectively, where $A, B$ are constants and $|\mathcal{X}|,|\mathcal{U}|$ are the sizes of each agent's state and action spaces.

Multi-agent Reinforcement Learning
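
To see how the three bounds scale with population size, the sketch below evaluates the expressions inside each $\mathcal{O}(\cdot)$, dropping the hidden constants and setting $A = B = 1$ (an assumption; the snippet only says they are constants). The function name and the example class sizes are hypothetical.

```python
import math


def mfc_error_terms(class_sizes, n_states, n_actions, A=1.0, B=1.0):
    """Evaluate the expressions inside the O(.) bounds for e1, e2, e3.

    class_sizes holds N_k for each class; hidden O(.) constants are dropped and
    A = B = 1 by default (an assumption for illustration).
    """
    n_pop = sum(class_sizes)
    c = math.sqrt(n_states) + math.sqrt(n_actions)       # sqrt|X| + sqrt|U|
    sum_sqrt = sum(math.sqrt(n) for n in class_sizes)    # sum_k sqrt(N_k)
    e1 = c / n_pop * sum_sqrt
    e2 = c * sum(1.0 / math.sqrt(n) for n in class_sizes)
    e3 = c * (A / n_pop * sum_sqrt + B / math.sqrt(n_pop))
    return e1, e2, e3


for scale in (1, 100):
    sizes = [100 * scale, 400 * scale]  # hypothetical two-class population
    print(scale, [round(e, 4) for e in mfc_error_terms(sizes, n_states=10, n_actions=5)])
```

Multiplying every class size $N_k$ by 100 divides each of the three terms by exactly 10, which is the $1/\sqrt{N}$ behaviour visible in the formulas.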

Markov Decision Processes with Long-Term Average Constraints

no code implementations • 12 Jun 2021 • Mridul Agarwal, Qinbo Bai, Vaneet Aggarwal

We consider the problem of constrained Markov Decision Process (CMDP) where an agent interacts with a unichain Markov Decision Process.

Joint Optimization of Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm

no code implementations • 28 May 2021 • Qinbo Bai, Mridul Agarwal, Vaneet Aggarwal

Many engineering problems have multiple objectives, and the overall aim is to optimize a non-linear function of these objectives.

Multi-Objective Reinforcement Learning reinforcement-learning
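
A toy example of why optimizing a non-linear function of several objectives can select a different policy than optimizing their sum. The returns below are hypothetical, and proportional fairness (a sum of logarithms) is only one common concave choice, not necessarily the function used in the paper.

```python
import math

# Hypothetical long-term returns of two policies on two objectives.
returns = {"policy_a": (10.0, 0.5), "policy_b": (5.0, 5.0)}


def linear_sum(v):
    # Linear scalarization: add the objectives with equal weights.
    return sum(v)


def proportional_fairness(v):
    # A concave, non-linear utility: sum of logs of the per-objective returns.
    return sum(math.log(x) for x in v)


best_linear = max(returns, key=lambda p: linear_sum(returns[p]))
best_fair = max(returns, key=lambda p: proportional_fairness(returns[p]))
print(best_linear, best_fair)  # policy_a policy_b
```

The linear sum favours the unbalanced policy (10.5 vs 10), while the log utility penalizes the near-zero second objective and prefers the balanced one.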

Communication Efficient Parallel Reinforcement Learning

no code implementations • 22 Feb 2021 • Mridul Agarwal, Bhargav Ganguly, Vaneet Aggarwal

We provide an algorithm that runs at each agent and prove that the total cumulative regret of $M$ agents is upper bounded by $\tilde{O}(DS\sqrt{MAT})$ for a Markov Decision Process with diameter $D$, $S$ states, and $A$ actions.

reinforcement-learning Reinforcement Learning (RL)

Multi-Agent Multi-Armed Bandits with Limited Communication

no code implementations • 10 Feb 2021 • Mridul Agarwal, Vaneet Aggarwal, Kamyar Azizzadenesheli

With our algorithm, LCC-UCB, each agent enjoys a regret of $\tilde{O}\left(\sqrt{({K/N}+ N)T}\right)$, communicates for $O(\log T)$ steps and broadcasts $O(\log K)$ bits in each communication step.

Multi-Armed Bandits
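
Reading $K$ as the number of arms and $N$ as the number of agents (an assumption, since the snippet does not define them), the bound depends on $N$ only through $K/N + N$. By the AM-GM inequality, $K/N + N \ge 2\sqrt{K}$, with equality at $N = \sqrt{K}$. The sketch below checks this numerically on the bound expression alone (log factors and constants dropped); it says nothing about the algorithm's measured performance, and the function name is hypothetical.

```python
import math


def regret_bound_order(K, N, T):
    """sqrt((K / N + N) * T): the quoted per-agent bound with log factors and constants dropped."""
    return math.sqrt((K / N + N) * T)


K, T = 100, 10_000  # hypothetical numbers of arms and time steps
best_N = min(range(1, K + 1), key=lambda N: regret_bound_order(K, N, T))
print(best_N)  # 10, i.e. sqrt(K)
print(regret_bound_order(K, 1, T), regret_bound_order(K, best_N, T))
```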

Blind Decision Making: Reinforcement Learning with Delayed Observations

no code implementations • 16 Nov 2020 • Mridul Agarwal, Vaneet Aggarwal

This paper proposes an approach in which the delay in knowing the state is taken into account, and decisions are made based on the available information, which may not include the current state.

Decision Making reinforcement-learning
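
One standard way to handle a known, constant observation delay $d$ is to act on an augmented state: the most recently observed state together with the actions taken since it was observed. The sketch below builds these augmented states from a trajectory; it is a generic construction shown for illustration, not necessarily the approach of this paper, and the function name is hypothetical.

```python
def augmented_states(states, actions, delay):
    """Return the information available at each step t when states arrive `delay` steps late.

    states[t] is the true state at time t and actions[t] the action taken at time t.
    At time t the agent knows states[max(t - delay, 0)] plus the actions taken since then.
    """
    info = []
    for t in range(len(states)):
        k = max(t - delay, 0)                     # index of the newest observed state
        info.append((states[k], tuple(actions[k:t])))
    return info


trajectory = augmented_states(["s0", "s1", "s2", "s3"], ["a0", "a1", "a2", "a3"], delay=2)
for t, (observed, pending) in enumerate(trajectory):
    print(t, observed, pending)
```

At $t = 3$ with $d = 2$, the agent knows only $s_1$ and the actions $a_1, a_2$; a policy over such tuples can be optimized without ever seeing the current state.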

DART: aDaptive Accept RejecT for non-linear top-K subset identification

no code implementations • 16 Nov 2020 • Mridul Agarwal, Vaneet Aggarwal, Christopher J. Quinn, Abhishek Umrawal

Additionally, our algorithm handles correlated rewards among individual arms.

Escaping Saddle Points for Zeroth-order Nonconvex Optimization using Estimated Gradient Descent

no code implementations • 3 Oct 2019 • Qinbo Bai, Mridul Agarwal, Vaneet Aggarwal

Gradient descent and its variants are widely used in machine learning.

BIG-bench Machine Learning
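
The title refers to gradient descent with estimated gradients in a zeroth-order setting, where only function values are available. A minimal sketch of the generic idea is below, using central finite differences on a hypothetical quadratic; it omits the perturbation steps that saddle-point-escaping methods add and is not the paper's algorithm. The function names and parameters are assumptions.

```python
def estimate_gradient(f, x, h=1e-4):
    """Central finite-difference gradient estimate that uses only function values."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_plus[i] += h
        x_minus = list(x)
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def zeroth_order_descent(f, x0, step=0.1, iters=200, h=1e-4):
    """Plain gradient descent driven by the finite-difference estimate."""
    x = list(x0)
    for _ in range(iters):
        g = estimate_gradient(f, x, h)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def f(x):
    # Hypothetical objective with its minimum at (1, 1).
    return sum((xi - 1.0) ** 2 for xi in x)


x_final = zeroth_order_descent(f, [5.0, -3.0])
print(x_final)
```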

Encoders and Decoders for Quantum Expander Codes Using Machine Learning

no code implementations • 6 Sep 2019 • Sathwik Chadaga, Mridul Agarwal, Vaneet Aggarwal

However, the large-scale design of quantum encoders and decoders has to depend on the channel characteristics and requires look-up tables whose memory is exponential in the number of qubits.

BIG-bench Machine Learning Q-Learning

Reinforcement Learning for Joint Optimization of Multiple Rewards

no code implementations • 6 Sep 2019 • Mridul Agarwal, Vaneet Aggarwal

Finding optimal policies that maximize the long-term rewards of Markov Decision Processes requires dynamic programming and backward induction to solve the Bellman optimality equation.

Decision Making Fairness
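
For a single scalar reward, the backward-induction solution of the finite-horizon Bellman optimality equation $V_t(s) = \max_a \big[ R(s,a) + \sum_{s'} P(s' \mid s,a)\, V_{t+1}(s') \big]$, with $V_H \equiv 0$, can be sketched as follows. The two-state MDP is hypothetical, and the paper's joint optimization of multiple rewards is not reproduced here.

```python
def backward_induction(P, R, horizon):
    """Finite-horizon dynamic programming for the Bellman optimality equation.

    P[s][a] is a list of (next_state, probability) pairs and R[s][a] the immediate reward.
    Returns V[t][s] and the greedy policy pi[t][s] for t = 0..horizon-1.
    """
    n_states = len(P)
    V = [[0.0] * n_states for _ in range(horizon + 1)]  # V[horizon] = 0 (terminal values)
    pi = [[0] * n_states for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):                # sweep backwards in time
        for s in range(n_states):
            q = [R[s][a] + sum(p * V[t + 1][s2] for s2, p in P[s][a])
                 for a in range(len(P[s]))]
            pi[t][s] = max(range(len(q)), key=q.__getitem__)
            V[t][s] = q[pi[t][s]]
    return V, pi


# Hypothetical MDP: action 0 stays put, action 1 switches state.
P = [[[(0, 1.0)], [(1, 1.0)]],
     [[(1, 1.0)], [(0, 1.0)]]]
R = [[1.0, 0.0],   # state 0: staying earns 1
     [3.0, 0.0]]   # state 1: staying earns 3
V, pi = backward_induction(P, R, horizon=3)
print(V[0], pi)
```

In this example the optimal action in state 0 depends on the remaining horizon: switch to state 1 on the first two steps, but stay in state 0 on the last step, which is why backward induction yields a time-indexed policy.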

Reinforcement Learning for Mean Field Game

no code implementations • 30 May 2019 • Mridul Agarwal, Vaneet Aggarwal, Arnob Ghosh, Nilay Tiwari

This paper focuses on finding a mean-field equilibrium (MFE) in an action-coupled stochastic game setting in an episodic framework.

reinforcement-learning Reinforcement Learning (RL)

Stochastic Top-$K$ Subset Bandits with Linear Space and Non-Linear Feedback

no code implementations • 29 Nov 2018 • Mridul Agarwal, Vaneet Aggarwal, Christopher J. Quinn, Abhishek K. Umrawal

Many real-world problems, such as Social Influence Maximization, face the dilemma of choosing the best $K$ out of $N$ options at a given time instant.

Multi-Armed Bandits
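
Part of the difficulty of choosing the best $K$ of $N$ options is the number of candidate subsets. The sketch below only counts them with Python's `math.comb`, to show why treating every subset as its own arm is costly compared with keeping one statistic per option; it is not the paper's algorithm, and the helper name is hypothetical.

```python
from math import comb


def subset_arm_count(n_options, k):
    """Number of distinct size-k subsets, i.e. arms if each subset were treated as a separate arm."""
    return comb(n_options, k)


for n, k in [(10, 3), (50, 5), (100, 10)]:
    print(f"N={n:>3}, K={k:>2}: {subset_arm_count(n, k):,} subsets vs {n} per-option statistics")
```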
