Search Results for author: Zaiwei Chen

Found 12 papers, 2 papers with code

Two-Timescale Q-Learning with Function Approximation in Zero-Sum Stochastic Games

no code implementations • 8 Dec 2023 • Zaiwei Chen, Kaiqing Zhang, Eric Mazumdar, Asuman Ozdaglar, Adam Wierman

Specifically, through a change of variable, we show that the update equation of the slow-timescale iterates resembles the classical smoothed best-response dynamics, where the regularized Nash gap serves as a valid Lyapunov function.

Q-Learning
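
The two objects named in this excerpt, the softmax-smoothed best response and the entropy-regularized Nash gap, can be sketched in a few lines. The sketch below is a generic full-information illustration on a random zero-sum matrix game, not the paper's two-timescale learning algorithm; the payoff matrix, temperature, and stepsizes are made up.

```python
import numpy as np

def smoothed_best_response(payoffs, tau):
    """Entropy-regularized (softmax) best response to a payoff vector."""
    z = payoffs / tau
    z = z - z.max()               # numerical stability
    p = np.exp(z)
    return p / p.sum()

def regularized_nash_gap(A, x, y, tau):
    """Sum of both players' regularized best-response improvements in the
    zero-sum game max_x min_y x^T A y; zero exactly at the regularized
    equilibrium, which is why it can serve as a Lyapunov function."""
    H = lambda p: -np.sum(p * np.log(p + 1e-12))   # entropy
    br_x = smoothed_best_response(A @ y, tau)
    br_y = smoothed_best_response(-A.T @ x, tau)
    gap_x = (br_x @ A @ y + tau * H(br_x)) - (x @ A @ y + tau * H(x))
    gap_y = (-(x @ A @ br_y) + tau * H(br_y)) - (-(x @ A @ y) + tau * H(y))
    return gap_x + gap_y

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))   # made-up payoff matrix
x = y = np.ones(3) / 3
for _ in range(500):              # smoothed best-response dynamics (full information)
    x = 0.95 * x + 0.05 * smoothed_best_response(A @ y, tau=0.5)
    y = 0.95 * y + 0.05 * smoothed_best_response(-A.T @ x, tau=0.5)
print(regularized_nash_gap(A, x, y, tau=0.5))   # small after convergence
```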

Concentration of Contractive Stochastic Approximation: Additive and Multiplicative Noise

no code implementations • 28 Mar 2023 • Zaiwei Chen, Siva Theja Maguluri, Martin Zubeldia

To demonstrate the applicability of our theoretical results, we use them to provide maximal concentration bounds for a large class of reinforcement learning algorithms, including but not limited to on-policy TD-learning with linear function approximation, off-policy TD-learning with generalized importance sampling factors, and $Q$-learning.

Q-Learning
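
For intuition on the setting, here is a toy sketch under simplifying assumptions (not the paper's analysis): a one-dimensional contraction-driven stochastic approximation iterate whose noise has both an additive component and a component that scales with the iterate, i.e., multiplicative noise.

```python
import numpy as np

rng = np.random.default_rng(1)

def contractive_sa(x0, alpha, gamma=0.9, steps=5_000):
    """x_{k+1} = x_k + alpha * (H(x_k) - x_k + noise_k), where H(x) = gamma*x
    is a gamma-contraction with fixed point 0 and the noise has an additive
    part plus a part whose magnitude scales with |x_k| (multiplicative)."""
    x = x0
    for _ in range(steps):
        additive = rng.standard_normal()
        multiplicative = rng.standard_normal() * abs(x)
        x = x + alpha * (gamma * x - x + additive + multiplicative)
    return x

# With a small constant stepsize, the iterates concentrate near the fixed point 0.
samples = np.array([contractive_sa(x0=5.0, alpha=0.01) for _ in range(200)])
print(samples.mean(), samples.std())
```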

Convergence Rates for Localized Actor-Critic in Networked Markov Potential Games

1 code implementation • 8 Mar 2023 • Zhaoyi Zhou, Zaiwei Chen, Yiheng Lin, Adam Wierman

The algorithm is scalable since each agent uses only local information and does not need access to the global state.
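
To make "only local information" concrete, here is a minimal sketch (hypothetical interfaces, not the paper's algorithm) in which each agent's critic update reads only its own reward and the states of itself and its graph neighbors:

```python
from collections import defaultdict

neighbors = {0: (1,), 1: (0, 2), 2: (1,)}          # a made-up 3-agent line graph
critics = [defaultdict(float) for _ in neighbors]  # one local value table per agent

def local_view(global_state, i):
    """Project the global state onto agent i and its neighbors."""
    return tuple(global_state[j] for j in sorted((i, *neighbors[i])))

def local_critic_update(i, s, r_i, s_next, alpha=0.1, gamma=0.95):
    """TD(0) on agent i's local view; no access to the global state is
    needed beyond the neighborhood projection."""
    v = critics[i]
    key, key_next = local_view(s, i), local_view(s_next, i)
    v[key] += alpha * (r_i + gamma * v[key_next] - v[key])

s, s_next = (0, 1, 0), (1, 1, 0)   # hypothetical local binary states
local_critic_update(i=1, s=s, r_i=1.0, s_next=s_next)
```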

Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning

no code implementations • 30 Nov 2022 • Yizhou Zhang, Guannan Qu, Pan Xu, Yiheng Lin, Zaiwei Chen, Adam Wierman

In particular, we show that, despite restricting each agent's attention to only its $\kappa$-hop neighborhood, the agents are able to learn a policy with an optimality gap that decays polynomially in $\kappa$.

Multi-agent Reinforcement Learning, Reinforcement Learning +1
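
The $\kappa$-hop restriction can be made concrete with a breadth-first search on the interaction graph. The sketch below (illustrative, with a made-up path graph) computes the set of agents a $\kappa$-truncated policy is allowed to condition on.

```python
from collections import deque

def kappa_hop_neighborhood(adj, source, kappa):
    """Agents within kappa hops of `source` in the interaction graph
    (breadth-first search); a kappa-truncated policy for `source`
    conditions only on the states of these agents."""
    seen, frontier = {source}, deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == kappa:
            continue
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, dist + 1))
    return seen

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}   # 5-agent path graph
print(kappa_hop_neighborhood(adj, source=2, kappa=1))      # {1, 2, 3}
```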

An Approximate Policy Iteration Viewpoint of Actor-Critic Algorithms

no code implementations • 5 Aug 2022 • Zaiwei Chen, Siva Theja Maguluri

Combining the geometric convergence of the actor with the finite-sample analysis of the critic, we establish for the first time an overall $\mathcal{O}(\epsilon^{-2})$ sample complexity for finding an optimal policy (up to a function approximation error) using policy-based methods under off-policy sampling and linear function approximation.
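
One hedged, back-of-the-envelope way to read how such a bound can compose (assuming geometric convergence of the actor yields $\mathcal{O}(\log(1/\epsilon))$ policy updates, each critic evaluation is accurate after $\mathcal{O}(\epsilon^{-2})$ samples, and log factors are absorbed into $\widetilde{\mathcal{O}}$):

```latex
\underbrace{\mathcal{O}\!\left(\log(1/\epsilon)\right)}_{\text{actor iterations (geometric convergence)}}
\times
\underbrace{\mathcal{O}\!\left(\epsilon^{-2}\right)}_{\text{samples per critic evaluation}}
= \widetilde{\mathcal{O}}\!\left(\epsilon^{-2}\right)
```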

Target Network and Truncation Overcome The Deadly Triad in $Q$-Learning

no code implementations • 5 Mar 2022 • Zaiwei Chen, John-Paul Clarke, Siva Theja Maguluri

$Q$-learning with function approximation is one of the most empirically successful yet theoretically least understood reinforcement learning (RL) algorithms, and was identified in Sutton (1999) as one of the most important theoretical open problems in the RL community.

Q-Learning, Reinforcement Learning +1
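
The two mechanisms named in the title admit a compact sketch. The following is a schematic version under simplifying assumptions, not the paper's exact scheme: the callables `sample_transition` and `phi`, the projection radius, and the sync period are all hypothetical.

```python
import numpy as np

def truncate(theta, radius):
    """Project theta onto the ball of the given radius (truncation step)."""
    norm = np.linalg.norm(theta)
    return theta if norm <= radius else theta * (radius / norm)

def q_learning_target_truncation(sample_transition, phi, n_actions, dim,
                                 alpha=0.1, gamma=0.95, radius=10.0,
                                 target_period=100, steps=10_000):
    """Semi-gradient Q-learning with linear function approximation, a
    periodically synced target network, and truncation of the iterate.
    phi(s, a) -> feature vector; sample_transition() -> (s, a, r, s_next)."""
    theta = np.zeros(dim)
    theta_target = theta.copy()
    for k in range(steps):
        s, a, r, s_next = sample_transition()
        # The bootstrapped target uses the *target* parameters, not theta.
        q_next = max(theta_target @ phi(s_next, b) for b in range(n_actions))
        td_error = r + gamma * q_next - theta @ phi(s, a)
        theta = truncate(theta + alpha * td_error * phi(s, a), radius)
        if (k + 1) % target_period == 0:
            theta_target = theta.copy()   # sync the target network
    return theta
```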

Stationary Behavior of Constant Stepsize SGD Type Algorithms: An Asymptotic Characterization

no code implementations • 11 Nov 2021 • Zaiwei Chen, Shancong Mou, Siva Theja Maguluri

In this work, we study the asymptotic behavior of the appropriately scaled stationary distribution in the limit as the constant stepsize goes to zero.

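A quick simulation conveys the object being studied (a toy quadratic example, not the paper's general setting): with constant stepsize $\alpha$, the iterates settle into a stationary distribution whose spread shrinks with $\alpha$, and the $\sqrt{\alpha}$-rescaled iterates look approximately $\alpha$-independent.

```python
import numpy as np

rng = np.random.default_rng(2)

def sgd_tail(alpha, burn_in=20_000, keep=20_000):
    """Constant-stepsize SGD on f(x) = x^2/2 with additive gradient noise;
    returns iterates after burn-in (approximate stationary samples)."""
    x, out = 0.0, []
    for k in range(burn_in + keep):
        grad = x + rng.standard_normal()    # noisy gradient of x^2/2
        x -= alpha * grad
        if k >= burn_in:
            out.append(x)
    return np.array(out)

# The stationary spread shrinks with alpha, while the rescaled iterates
# x / sqrt(alpha) have an approximately alpha-independent distribution.
for alpha in (0.1, 0.01, 0.001):
    tail = sgd_tail(alpha)
    print(alpha, tail.std(), (tail / np.sqrt(alpha)).std())
```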

Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators

no code implementations • NeurIPS 2021 • Zaiwei Chen, Siva Theja Maguluri, Sanjay Shakkottai, Karthikeyan Shanmugam

Our key step is to show that the generalized Bellman operator is simultaneously a contraction mapping with respect to a weighted $\ell_p$-norm for each $p$ in $[1,\infty)$, with a common contraction factor.
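
Assuming the standard definition of a weighted $\ell_p$-norm (the weights $\nu$ and the factor $\gamma_c$ below are my notation, not necessarily the paper's), the stated property reads:

```latex
\|x\|_{\nu,p} = \Big(\textstyle\sum_i \nu_i\, |x_i|^p\Big)^{1/p},
\qquad
\|\mathcal{T}x - \mathcal{T}y\|_{\nu,p} \le \gamma_c\, \|x - y\|_{\nu,p}
\quad \text{for all } p \in [1,\infty)
```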

Finite-Sample Analysis of Off-Policy Natural Actor-Critic with Linear Function Approximation

no code implementations • 26 May 2021 • Zaiwei Chen, Sajad Khodadadian, Siva Theja Maguluri

In this paper, we develop a novel variant of the off-policy natural actor-critic algorithm with linear function approximation and establish a sample complexity of $\mathcal{O}(\epsilon^{-3})$, improving upon all previously known convergence bounds for such algorithms.
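
As background on the algorithm family, here is a minimal skeleton (not the paper's off-policy variant): with a softmax policy and compatible linear features, the natural policy gradient direction equals the critic's weight vector, so the actor step is a plain move along those weights. The routine `estimate_critic_weights` is a hypothetical placeholder.

```python
import numpy as np

def softmax_probs(theta, features):
    """Action probabilities of a softmax policy; `features` is the
    (n_actions, dim) matrix of compatible features phi(s, a) at a state."""
    logits = features @ theta
    logits = logits - logits.max()
    p = np.exp(logits)
    return p / p.sum()

def nac_actor_step(theta, estimate_critic_weights, eta=0.1):
    """One natural actor-critic iteration: fit a linear critic for the
    current policy, then move along its weight vector w (with compatible
    features, w is the natural policy gradient direction)."""
    w = estimate_critic_weights(theta)   # hypothetical critic routine
    return theta + eta * w

rng = np.random.default_rng(5)
features = rng.standard_normal((3, 4))      # made-up phi(s, a) rows, 3 actions
print(softmax_probs(np.zeros(4), features)) # uniform at initialization
```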

Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm

no code implementations • 18 Feb 2021 • Sajad Khodadadian, Zaiwei Chen, Siva Theja Maguluri

In this paper, we provide finite-sample convergence guarantees for an off-policy variant of the natural actor-critic (NAC) algorithm based on importance sampling.
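
The importance-sampling correction itself is standard and easy to illustrate (a generic one-state toy, not the paper's estimator): reweighting behavior-policy samples by $\rho = \pi(a|s)/\mu(a|s)$ recovers target-policy expectations.

```python
import numpy as np

rng = np.random.default_rng(3)
pi = np.array([0.8, 0.2])      # target policy over two actions (single state)
mu = np.array([0.5, 0.5])      # behavior policy used to collect data
reward = np.array([1.0, 0.0])  # hypothetical per-action reward

actions = rng.choice(2, size=100_000, p=mu)
rho = pi[actions] / mu[actions]          # importance weights
print((rho * reward[actions]).mean())    # ~ E_pi[r] = 0.8
```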

A Lyapunov Theory for Finite-Sample Guarantees of Asynchronous Q-Learning and TD-Learning Variants

no code implementations • 2 Feb 2021 • Zaiwei Chen, Siva Theja Maguluri, Sanjay Shakkottai, Karthikeyan Shanmugam

As a by-product, by analyzing the convergence bounds of $n$-step TD and TD$(\lambda)$, we provide theoretical insights into the bias-variance trade-off, i.e., the efficiency of bootstrapping in RL.

Q-Learning, Reinforcement Learning (RL)
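
The $n$-step TD target behind the bias-variance remark can be sketched directly (illustrative; `v_bootstrap` stands in for the critic's estimate of $V(s_n)$): small $n$ leans on the possibly biased value estimate, large $n$ on the noisier sampled rewards.

```python
def n_step_return(rewards, v_bootstrap, gamma, n):
    """n-step TD target from time 0: the first n discounted rewards plus a
    bootstrapped tail gamma^n * V(s_n)."""
    n = min(n, len(rewards))
    partial = sum(gamma**i * rewards[i] for i in range(n))
    return partial + gamma**n * v_bootstrap

# n=1 is pure bootstrapping (low variance, bias from V); large n approaches
# the Monte Carlo return (unbiased, higher variance).
rewards = [1.0, 0.0, 2.0, 1.0]
for n in (1, 2, 4):
    print(n, n_step_return(rewards, v_bootstrap=0.5, gamma=0.9, n=n))
```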

Finite-Sample Analysis of Nonlinear Stochastic Approximation with Applications in Reinforcement Learning

1 code implementation • 27 May 2019 • Zaiwei Chen, Sheng Zhang, Thinh T. Doan, John-Paul Clarke, Siva Theja Maguluri

To demonstrate the generality of our theoretical results on Markovian SA, we use them to derive finite-sample bounds for the popular $Q$-learning algorithm with linear function approximation, under a condition on the behavior policy.

Q-Learning, Reinforcement Learning +1
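
The Markovian SA template the excerpt refers to can be sketched as follows (schematic, with a made-up update map `F` and a two-state noise chain); TD- and Q-learning updates are instances of this recursion.

```python
import numpy as np

rng = np.random.default_rng(4)
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])        # transition matrix of the noise chain

def F(x, y):
    """Illustrative update direction: drives x toward a y-dependent target;
    averaged over the chain's stationary law (2/3, 1/3), it is a contraction
    around the unique fixed point 1/3."""
    target = 1.0 if y == 0 else -1.0
    return target - x

# x_{k+1} = x_k + eps_k * F(x_k, Y_k), with {Y_k} a Markov chain (not i.i.d.).
x, y = 0.0, 0
for k in range(1, 100_001):
    eps = 1.0 / k**0.75            # diminishing stepsizes
    x += eps * F(x, y)
    y = rng.choice(2, p=P[y])      # Markovian noise
print(x)                           # ~ 1/3, the stationary-averaged fixed point
```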
