no code implementations • 17 Jun 2024 • Vaneet Aggarwal, Washim Uddin Mondal, Qinbo Bai
This monograph explores model-based and model-free approaches to constrained RL in the setting of average-reward Markov Decision Processes (MDPs).
no code implementations • 3 Feb 2024 • Qinbo Bai, Washim Uddin Mondal, Vaneet Aggarwal
This paper studies infinite-horizon average-reward constrained Markov Decision Processes (CMDPs).
no code implementations • 5 Sep 2023 • Qinbo Bai, Washim Uddin Mondal, Vaneet Aggarwal
This paper presents the first regret-bound analysis for a general parameterized policy gradient algorithm in the average-reward setting.
no code implementations • 12 Jun 2022 • Qinbo Bai, Amrit Singh Bedi, Vaneet Aggarwal
We propose a novel Conservative Natural Policy Gradient Primal-Dual Algorithm (C-NPG-PD) that achieves zero constraint violation while attaining state-of-the-art convergence guarantees for the objective value function.
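As a rough, hedged illustration only (not the C-NPG-PD algorithm itself), the sketch below shows the natural-gradient primal-dual idea on a toy constrained softmax-policy problem: the primal step preconditions the Lagrangian gradient by a (damped) Fisher matrix, and the dual step raises the multiplier when the constraint is violated. The rewards, costs, step sizes, and damping constant are all illustrative assumptions.

```python
import numpy as np

# Toy constrained problem: max_pi  pi@reward  s.t.  pi@cost <= budget,
# with a softmax policy pi(theta). Illustrative natural policy gradient
# primal-dual sketch; all numbers are assumptions, not from the paper.
reward = np.array([1.0, 0.7, 0.3])
cost = np.array([0.8, 0.5, 0.1])
budget = 0.4
theta, lam = np.zeros(3), 0.0
eta_theta, eta_lam, damping = 0.2, 0.2, 1e-3

for _ in range(500):
    pi = np.exp(theta - theta.max()); pi /= pi.sum()
    # Exact gradient of the Lagrangian L = pi @ (reward - lam * cost) for softmax.
    adv = (reward - lam * cost) - pi @ (reward - lam * cost)
    grad = pi * adv
    fisher = np.diag(pi) - np.outer(pi, pi)              # Fisher information of softmax
    nat_grad = np.linalg.solve(fisher + damping * np.eye(3), grad)
    theta += eta_theta * nat_grad                        # primal: natural-gradient ascent
    lam = max(0.0, lam + eta_lam * (pi @ cost - budget)) # dual: ascent on violation

pi = np.exp(theta - theta.max()); pi /= pi.sum()
print("policy:", pi.round(3), "cost:", round(float(pi @ cost), 3))
```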
no code implementations • 13 Sep 2021 • Qinbo Bai, Amrit Singh Bedi, Mridul Agarwal, Alec Koppel, Vaneet Aggarwal
To achieve that, we advocate a randomized primal-dual approach to solving CMDP problems and propose a conservative stochastic primal-dual algorithm (CSPDA), which is shown to exhibit $\tilde{\mathcal{O}}\left(1/\epsilon^2\right)$ sample complexity for achieving an $\epsilon$-optimal cumulative reward with zero constraint violation.
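A minimal, hedged sketch of the conservative stochastic primal-dual idea (not the authors' exact CSPDA): single-sample estimates drive a stochastic ascent step on the Lagrangian, and the dual variable is updated against a constraint tightened by a slack `kappa`, which is what drives the realized violation toward zero. The bandit environment and all numerical values are illustrative assumptions.

```python
import numpy as np

# Toy constrained bandit: maximize expected reward s.t. expected cost <= budget.
# Hedged illustration of a conservative stochastic primal-dual update,
# not the paper's algorithm; all values below are assumptions.
rng = np.random.default_rng(0)
reward = np.array([1.0, 0.6, 0.2])    # per-action expected rewards (assumed)
cost = np.array([0.9, 0.5, 0.1])      # per-action expected costs (assumed)
budget, kappa = 0.4, 0.05             # constraint budget and conservative slack

theta, lam = np.zeros(3), 0.0         # primal (softmax params) and dual variables

for t in range(5000):
    pi = np.exp(theta - theta.max()); pi /= pi.sum()
    a = rng.choice(3, p=pi)
    r = reward[a] + 0.1 * rng.standard_normal()    # single noisy reward sample
    c = cost[a] + 0.1 * rng.standard_normal()      # single noisy cost sample
    grad_logpi = -pi; grad_logpi[a] += 1.0         # grad of log softmax at sampled action
    step = 0.5 / np.sqrt(t + 1)
    theta += step * (r - lam * c) * grad_logpi     # primal: ascent on sampled Lagrangian
    # Dual: grow lam when the *tightened* constraint (budget - kappa) is violated.
    lam = max(0.0, lam + step * (c - (budget - kappa)))

pi = np.exp(theta - theta.max()); pi /= pi.sum()
print("policy:", pi.round(3), "expected cost:", round(float(pi @ cost), 3))
```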
no code implementations • 12 Sep 2021 • Mridul Agarwal, Qinbo Bai, Vaneet Aggarwal
We consider the problem of tabular infinite horizon concave utility reinforcement learning (CURL) with convex constraints.
no code implementations • 12 Jun 2021 • Mridul Agarwal, Qinbo Bai, Vaneet Aggarwal
We consider the problem of a constrained Markov Decision Process (CMDP) in which an agent interacts with a unichain Markov Decision Process.
no code implementations • 28 May 2021 • Qinbo Bai, Mridul Agarwal, Vaneet Aggarwal
Many engineering problems have multiple objectives, and the overall aim is to optimize a non-linear function of these objectives.
Multi-Objective Reinforcement Learning
Reinforcement Learning
no code implementations • 10 Jun 2020 • Qinbo Bai, Vaneet Aggarwal, Ather Gattami
This paper uses concepts from constrained optimization and Q-learning to propose an algorithm for CMDPs with long-term constraints.
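A hedged sketch of the general idea of combining Q-learning with a Lagrangian relaxation of long-term constraints (the two-state MDP, learning rates, and update schedule are assumptions, not the paper's exact algorithm): Q-learning is run on the combined signal `reward - lam * cost`, while `lam` is adjusted from observed costs.

```python
import numpy as np

# Tiny 2-state, 2-action MDP (all numbers assumed for illustration).
# P[s, a, s'] transition probs; R[s, a] rewards; C[s, a] constraint costs.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])
R = np.array([[1.0, 2.0], [0.5, 1.5]])
C = np.array([[0.2, 0.9], [0.1, 0.8]])
budget, gamma = 0.5, 0.95

rng = np.random.default_rng(1)
Q = np.zeros((2, 2))
lam, s = 0.0, 0
for t in range(1, 50001):
    eps = max(0.05, 1.0 / np.sqrt(t))
    a = rng.integers(2) if rng.random() < eps else int(Q[s].argmax())
    s_next = rng.choice(2, p=P[s, a])
    # Q-learning on the Lagrangian signal: reward minus lam times cost.
    target = (R[s, a] - lam * C[s, a]) + gamma * Q[s_next].max()
    Q[s, a] += 0.1 * (target - Q[s, a])
    # Dual ascent: raise lam when the observed cost exceeds the budget.
    lam = max(0.0, lam + (1.0 / t) * (C[s, a] - budget))
    s = s_next

print("greedy policy:", Q.argmax(axis=1), "lambda:", round(lam, 3))
```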
no code implementations • 11 Mar 2020 • Qinbo Bai, Vaneet Aggarwal, Ather Gattami
The proposed algorithm is proved to achieve an $(\epsilon, p)$-PAC policy when the number of episodes $K\geq\Omega(\frac{I^2H^6SA\ell}{\epsilon^2})$, where $S$ and $A$ are the number of states and actions, respectively.
no code implementations • 3 Oct 2019 • Qinbo Bai, Mridul Agarwal, Vaneet Aggarwal
Gradient descent and its variants are widely used in machine learning.
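For concreteness, a minimal example of the basic gradient descent update $x_{t+1} = x_t - \eta \nabla f(x_t)$ on a simple least-squares objective (the objective and step size are illustrative assumptions, not from the paper):

```python
import numpy as np

# Minimize f(x) = 0.5 * ||A x - b||^2 with plain gradient descent.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 0.0])
x = np.zeros(2)
eta = 0.05                       # step size (assumed)

for _ in range(500):
    grad = A.T @ (A @ x - b)     # gradient of the least-squares objective
    x -= eta * grad              # descent step: x <- x - eta * grad f(x)

print("x*:", x.round(4), "residual:", np.linalg.norm(A @ x - b).round(6))
```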