Search Results for author: Qinbo Bai

Found 10 papers, 0 papers with code

Learning General Parameterized Policies for Infinite Horizon Average Reward Constrained MDPs via Primal-Dual Policy Gradient Algorithm

no code implementations • 3 Feb 2024 • Qinbo Bai, Washim Uddin Mondal, Vaneet Aggarwal

This paper studies infinite horizon average reward Constrained Markov Decision Processes (CMDPs).

Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes

no code implementations • 5 Sep 2023 • Qinbo Bai, Washim Uddin Mondal, Vaneet Aggarwal

This paper presents the first regret-bound analysis of the general parameterized policy gradient algorithm in the average reward setting.

Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm

no code implementations • 12 Jun 2022 • Qinbo Bai, Amrit Singh Bedi, Vaneet Aggarwal

We propose a novel Conservative Natural Policy Gradient Primal-Dual Algorithm (C-NPG-PD) that achieves zero constraint violation while attaining state-of-the-art convergence results for the objective value function.

Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach

no code implementations • 13 Sep 2021 • Qinbo Bai, Amrit Singh Bedi, Mridul Agarwal, Alec Koppel, Vaneet Aggarwal

To achieve that, we advocate the use of a randomized primal-dual approach to solve CMDP problems and propose a conservative stochastic primal-dual algorithm (CSPDA), which is shown to exhibit $\tilde{\mathcal{O}}\left(1/\epsilon^2\right)$ sample complexity to achieve an $\epsilon$-optimal cumulative reward with zero constraint violations.

Decision Making reinforcement-learning +1

Concave Utility Reinforcement Learning with Zero-Constraint Violations

no code implementations • 12 Sep 2021 • Mridul Agarwal, Qinbo Bai, Vaneet Aggarwal

We consider the problem of tabular infinite horizon concave utility reinforcement learning (CURL) with convex constraints.

reinforcement-learning Reinforcement Learning (RL)

Markov Decision Processes with Long-Term Average Constraints

no code implementations • 12 Jun 2021 • Mridul Agarwal, Qinbo Bai, Vaneet Aggarwal

We consider the problem of constrained Markov Decision Process (CMDP) where an agent interacts with a unichain Markov Decision Process.

Joint Optimization of Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm

no code implementations • 28 May 2021 • Qinbo Bai, Mridul Agarwal, Vaneet Aggarwal

Many engineering problems have multiple objectives, and the overall aim is to optimize a non-linear function of these objectives.

Multi-Objective Reinforcement Learning reinforcement-learning

Model-Free Algorithm and Regret Analysis for MDPs with Long-Term Constraints

no code implementations • 10 Jun 2020 • Qinbo Bai, Vaneet Aggarwal, Ather Gattami

This paper uses concepts from constrained optimization and Q-learning to propose an algorithm for CMDP with long-term constraints.

Q-Learning

Provably Efficient Model-Free Algorithm for MDPs with Peak Constraints

no code implementations • 11 Mar 2020 • Qinbo Bai, Vaneet Aggarwal, Ather Gattami

The proposed algorithm is proved to achieve an $(\epsilon, p)$-PAC policy when the number of episodes satisfies $K\geq\Omega(\frac{I^2H^6SA\ell}{\epsilon^2})$, where $S$ and $A$ are the number of states and actions, respectively.

Q-Learning Scheduling

Escaping Saddle Points for Zeroth-order Nonconvex Optimization using Estimated Gradient Descent

no code implementations • 3 Oct 2019 • Qinbo Bai, Mridul Agarwal, Vaneet Aggarwal

Gradient descent and its variants are widely used in machine learning.

BIG-bench Machine Learning
