no code implementations • 14 Nov 2022 • Mudit Gaur, Vaneet Aggarwal, Mridul Agarwal
Deep Q-learning based algorithms have been applied successfully to many decision-making problems, but their theoretical foundations are not as well understood.
no code implementations • 13 Sep 2021 • Qinbo Bai, Amrit Singh Bedi, Mridul Agarwal, Alec Koppel, Vaneet Aggarwal
To achieve that, we advocate the use of a randomized primal-dual approach to solve CMDP problems and propose a conservative stochastic primal-dual algorithm (CSPDA), which is shown to exhibit $\tilde{\mathcal{O}}\left(1/\epsilon^2\right)$ sample complexity to achieve $\epsilon$-optimal cumulative reward with zero constraint violations.
no code implementations • 12 Sep 2021 • Mridul Agarwal, Qinbo Bai, Vaneet Aggarwal
We consider the problem of tabular infinite horizon concave utility reinforcement learning (CURL) with convex constraints.
no code implementations • 9 Sep 2021 • Washim Uddin Mondal, Mridul Agarwal, Vaneet Aggarwal, Satish V. Ukkusuri
We show that, in these cases, the $K$-class MARL problem can be approximated by MFC with errors given as $e_1=\mathcal{O}(\frac{\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}}{N_{\mathrm{pop}}}\sum_{k}\sqrt{N_k})$, $e_2=\mathcal{O}(\left[\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}\right]\sum_{k}\frac{1}{\sqrt{N_k}})$ and $e_3=\mathcal{O}\left(\left[\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}\right]\left[\frac{A}{N_{\mathrm{pop}}}\sum_{k\in[K]}\sqrt{N_k}+\frac{B}{\sqrt{N_{\mathrm{pop}}}}\right]\right)$, respectively, where $A, B$ are some constants and $|\mathcal{X}|,|\mathcal{U}|$ are the sizes of state and action spaces of each agent.
no code implementations • 12 Jun 2021 • Mridul Agarwal, Qinbo Bai, Vaneet Aggarwal
We consider the constrained Markov Decision Process (CMDP) problem, where an agent interacts with a unichain Markov Decision Process.
no code implementations • 28 May 2021 • Qinbo Bai, Mridul Agarwal, Vaneet Aggarwal
Many engineering problems have multiple objectives, and the overall aim is to optimize a non-linear function of these objectives.
Multi-Objective Reinforcement Learning
reinforcement-learning
no code implementations • 22 Feb 2021 • Mridul Agarwal, Bhargav Ganguly, Vaneet Aggarwal
We provide \NAM\ which runs at each agent and prove that the total cumulative regret of $M$ agents is upper bounded as $\tilde{O}(DS\sqrt{MAT})$ for a Markov Decision Process with diameter $D$, number of states $S$, and number of actions $A$.
no code implementations • 10 Feb 2021 • Mridul Agarwal, Vaneet Aggarwal, Kamyar Azizzadenesheli
With our algorithm, LCC-UCB, each agent enjoys a regret of $\tilde{O}\left(\sqrt{(K/N + N)T}\right)$, communicates for $O(\log T)$ steps and broadcasts $O(\log K)$ bits in each communication step.
no code implementations • 16 Nov 2020 • Mridul Agarwal, Vaneet Aggarwal, Christopher J. Quinn, Abhishek Umrawal
Additionally, our algorithm works on correlated rewards of individual arms.
no code implementations • 16 Nov 2020 • Mridul Agarwal, Vaneet Aggarwal
This paper proposes an approach in which the delay in the knowledge of the state can be used, and decisions are made based on the available information, which may not include the current state.
no code implementations • 3 Oct 2019 • Qinbo Bai, Mridul Agarwal, Vaneet Aggarwal
Gradient descent and its variants are widely used in machine learning.
no code implementations • 6 Sep 2019 • Sathwik Chadaga, Mridul Agarwal, Vaneet Aggarwal
However, large-scale design of quantum encoders and decoders has to depend on the channel characteristics and requires look-up tables whose memory is exponential in the number of qubits.
no code implementations • 6 Sep 2019 • Mridul Agarwal, Vaneet Aggarwal
Finding optimal policies that maximize the long-term rewards of Markov Decision Processes requires the use of dynamic programming and backward induction to solve the Bellman optimality equation.
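As a rough illustration of the backward induction mentioned in this abstract (not the authors' method), the following minimal sketch solves the Bellman optimality equation for a tiny finite-horizon tabular MDP. The function name `backward_induction`, the transition table `P`, the reward table `R`, and the two-state toy problem are all hypothetical and chosen only for this example.

```python
def backward_induction(P, R, horizon):
    """Finite-horizon dynamic programming on a tabular MDP.

    P[s][a] is a list of (next_state, probability) pairs (hypothetical layout).
    R[s][a] is the immediate reward for taking action a in state s.
    Returns the optimal values at the first step and one greedy policy per step.
    """
    n_states = len(P)
    V = [0.0] * n_states  # value after the final step is zero
    policy = []
    for _ in range(horizon):  # sweep from the last decision step to the first
        new_V = []
        step_policy = []
        for s in range(n_states):
            # Bellman optimality backup: reward plus expected next-step value
            q_values = [
                R[s][a] + sum(prob * V[s2] for s2, prob in P[s][a])
                for a in range(len(P[s]))
            ]
            best_action = max(range(len(q_values)), key=q_values.__getitem__)
            step_policy.append(best_action)
            new_V.append(q_values[best_action])
        V = new_V
        policy.append(step_policy)
    policy.reverse()  # policy[0] now applies to the first decision step
    return V, policy


# Toy deterministic MDP with two states and two actions (assumed for illustration).
P = [
    [[(0, 1.0)], [(1, 1.0)]],  # state 0: action 0 stays, action 1 moves to state 1
    [[(1, 1.0)], [(0, 1.0)]],  # state 1: action 0 stays, action 1 moves to state 0
]
R = [
    [0, 1],  # rewards in state 0
    [2, 0],  # rewards in state 1
]
values, policy = backward_induction(P, R, horizon=3)
```

The cost of each sweep grows with the number of states, actions, and successor states, which is the scaling concern such dynamic programming approaches face on large problems.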
no code implementations • 30 May 2019 • Mridul Agarwal, Vaneet Aggarwal, Arnob Ghosh, Nilay Tiwari
This paper focuses on finding a mean-field equilibrium (MFE) in an action-coupled stochastic game setting in an episodic framework.
no code implementations • 29 Nov 2018 • Mridul Agarwal, Vaneet Aggarwal, Christopher J. Quinn, Abhishek K. Umrawal
Many real-world problems, such as social influence maximization, face the dilemma of choosing the best $K$ out of $N$ options at a given time instant.