Search Results for author: Qinbo Bai

Found 11 papers, 0 papers with code

Constrained Reinforcement Learning with Average Reward Objective: Model-Based and Model-Free Algorithms

no code implementations • 17 Jun 2024 • Vaneet Aggarwal, Washim Uddin Mondal, Qinbo Bai

This monograph focuses on the exploration of various model-based and model-free approaches for Constrained RL within the context of average reward Markov Decision Processes (MDPs).

Autonomous Driving, Decision Making (+4 more)

Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes

no code implementations • 5 Sep 2023 • Qinbo Bai, Washim Uddin Mondal, Vaneet Aggarwal

Remarkably, this paper marks a pioneering effort by presenting the first exploration into regret-bound computation for the general parameterized policy gradient algorithm in the context of average reward scenarios.

Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm

no code implementations • 12 Jun 2022 • Qinbo Bai, Amrit Singh Bedi, Vaneet Aggarwal

We propose a novel Conservative Natural Policy Gradient Primal-Dual Algorithm (C-NPG-PD) to achieve zero constraint violation while achieving state of the art convergence results for the objective value function.

Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach

no code implementations • 13 Sep 2021 • Qinbo Bai, Amrit Singh Bedi, Mridul Agarwal, Alec Koppel, Vaneet Aggarwal

To achieve that, we advocate the use of randomized primal-dual approach to solve the CMDP problems and propose a conservative stochastic primal-dual algorithm (CSPDA) which is shown to exhibit $\tilde{\mathcal{O}}\left(1/\epsilon^2\right)$ sample complexity to achieve $\epsilon$-optimal cumulative reward with zero constraint violations.

Decision Making, reinforcement-learning (+1 more)

Concave Utility Reinforcement Learning with Zero-Constraint Violations

no code implementations • 12 Sep 2021 • Mridul Agarwal, Qinbo Bai, Vaneet Aggarwal

We consider the problem of tabular infinite horizon concave utility reinforcement learning (CURL) with convex constraints.

reinforcement-learning, Reinforcement Learning (+1 more)

Markov Decision Processes with Long-Term Average Constraints

no code implementations • 12 Jun 2021 • Mridul Agarwal, Qinbo Bai, Vaneet Aggarwal

We consider the problem of constrained Markov Decision Process (CMDP) where an agent interacts with a unichain Markov Decision Process.

Model-Free Algorithm and Regret Analysis for MDPs with Long-Term Constraints

no code implementations • 10 Jun 2020 • Qinbo Bai, Vaneet Aggarwal, Ather Gattami

This paper uses concepts from constrained optimization and Q-learning to propose an algorithm for CMDP with long-term constraints.

Q-Learning

Provably Efficient Model-Free Algorithm for MDPs with Peak Constraints

no code implementations • 11 Mar 2020 • Qinbo Bai, Vaneet Aggarwal, Ather Gattami

The proposed algorithm is proved to achieve an $(\epsilon, p)$-PAC policy when the episode $K\geq\Omega(\frac{I^2H^6SA\ell}{\epsilon^2})$, where $S$ and $A$ are the number of states and actions, respectively.

Q-Learning, Scheduling
