Search Results for author: Siva Theja Maguluri

Found 19 papers, 1 paper with code

Convergence for Natural Policy Gradient on Infinite-State Average-Reward Markov Decision Processes

no code implementations • 7 Feb 2024 • Isaac Grosof, Siva Theja Maguluri, R. Srikant

In the reinforcement learning (RL) context, a variety of algorithms have been developed to learn and optimize these MDPs.

Reinforcement Learning (RL)

Concentration of Contractive Stochastic Approximation: Additive and Multiplicative Noise

no code implementations • 28 Mar 2023 • Zaiwei Chen, Siva Theja Maguluri, Martin Zubeldia

To demonstrate the applicability of our theoretical results, we use them to provide maximal concentration bounds for a large class of reinforcement learning algorithms, including but not limited to on-policy TD-learning with linear function approximation, off-policy TD-learning with generalized importance sampling factors, and $Q$-learning.

Q-Learning
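
As a concrete instance of the algorithms these bounds cover, here is a minimal sketch of on-policy TD(0) with linear function approximation; the toy chain, features, and stepsize are illustrative assumptions, not values from the paper.

```python
# Minimal sketch (illustrative, not the paper's analysis): on-policy TD(0)
# with linear function approximation on a toy two-state Markov chain.
import numpy as np

rng = np.random.default_rng(0)

P = np.array([[0.9, 0.1],      # transition matrix of a toy 2-state chain
              [0.2, 0.8]])
r = np.array([1.0, -1.0])      # per-state rewards
phi = np.array([[1.0, 0.0],    # feature vector phi(s), one row per state
                [0.5, 1.0]])
gamma, alpha = 0.95, 0.05      # discount factor and constant stepsize

theta = np.zeros(2)
s = 0
for _ in range(20_000):
    s_next = rng.choice(2, p=P[s])
    # TD(0) semi-gradient update: theta += alpha * delta * phi(s)
    delta = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
    theta += alpha * delta * phi[s]
    s = s_next

print("estimated values:", phi @ theta)
```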

An Approximate Policy Iteration Viewpoint of Actor-Critic Algorithms

no code implementations • 5 Aug 2022 • Zaiwei Chen, Siva Theja Maguluri

Combining the geometric convergence of the actor with the finite-sample analysis of the critic, we establish for the first time an overall $\mathcal{O}(\epsilon^{-2})$ sample complexity for finding an optimal policy (up to a function approximation error) using policy-based methods under off-policy sampling and linear function approximation.
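
For readers unfamiliar with the actor-critic template being analyzed, the sketch below shows a one-step actor-critic with a softmax actor and a tabular critic on a toy MDP. Everything concrete (MDP, stepsizes, iteration count) is an illustrative assumption, and this is the on-policy tabular variant, not the paper's off-policy linear-function-approximation algorithm.

```python
# Illustrative one-step actor-critic: the critic tracks values via TD(0),
# the actor follows a policy-gradient step using the TD error.
import numpy as np

rng = np.random.default_rng(1)
nS, nA, gamma = 2, 2, 0.9
# P[s, a] = next-state distribution, R[s, a] = expected reward (toy values)
P = np.array([[[0.8, 0.2], [0.3, 0.7]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

theta = np.zeros((nS, nA))     # actor: softmax policy parameters
V = np.zeros(nS)               # critic: tabular value estimates
a_lr, c_lr = 0.01, 0.1

def policy(s):
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

s = 0
for _ in range(50_000):
    pi = policy(s)
    a = rng.choice(nA, p=pi)
    s_next = rng.choice(nS, p=P[s, a])
    delta = R[s, a] + gamma * V[s_next] - V[s]   # TD error
    V[s] += c_lr * delta                          # critic update
    grad_log = -pi
    grad_log[a] += 1.0                            # grad of log-softmax
    theta[s] += a_lr * delta * grad_log           # actor update
    s = s_next

print("learned policy:", np.array([policy(s) for s in range(nS)]))
```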

Federated Reinforcement Learning: Linear Speedup Under Markovian Sampling

no code implementations • 21 Jun 2022 • Sajad Khodadadian, Pranay Sharma, Gauri Joshi, Siva Theja Maguluri

To obtain these results, we show that federated TD and Q-learning are special cases of a general framework for federated stochastic approximation with Markovian noise, and we leverage this framework to provide a unified convergence analysis that applies to all the algorithms.

Q-Learning, Reinforcement Learning (RL) +1
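
A minimal sketch of the federated pattern described in the snippet, with assumed details: each agent runs local TD(0) steps on its own trajectory, and a server periodically averages the parameters.

```python
# Illustrative federated TD(0): K agents, local updates, periodic averaging.
# The chain, features, and synchronization period are assumptions.
import numpy as np

rng = np.random.default_rng(2)
K, sync_every, alpha, gamma = 4, 10, 0.05, 0.9
P = np.array([[0.7, 0.3], [0.4, 0.6]])
r = np.array([0.5, -0.5])
phi = np.eye(2)                 # tabular features for simplicity

thetas = np.zeros((K, 2))       # one parameter vector per agent
states = np.zeros(K, dtype=int)

for t in range(1, 5_001):
    for k in range(K):          # local TD(0) step on agent k's own chain
        s = states[k]
        s_next = rng.choice(2, p=P[s])
        delta = r[s] + gamma * phi[s_next] @ thetas[k] - phi[s] @ thetas[k]
        thetas[k] += alpha * delta * phi[s]
        states[k] = s_next
    if t % sync_every == 0:     # server averages, then broadcasts
        thetas[:] = thetas.mean(axis=0)

print("federated value estimate:", phi @ thetas[0])
```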

Target Network and Truncation Overcome The Deadly Triad in $Q$-Learning

no code implementations • 5 Mar 2022 • Zaiwei Chen, John Paul Clarke, Siva Theja Maguluri

$Q$-learning with function approximation is one of the most empirically successful yet theoretically mysterious reinforcement learning (RL) algorithms, and was identified in Sutton (1999) as one of the most important theoretical open problems in the RL community.

Q-Learning, Reinforcement Learning (RL) +1
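
The title names two algorithmic ingredients; the sketch below illustrates both, a periodically refreshed target network and truncation of the iterate onto a ball, inside a toy $Q$-learning loop with linear features. All concrete details are assumptions, not the paper's exact algorithm.

```python
# Schematic Q-learning with a target network and truncation (illustrative).
import numpy as np

rng = np.random.default_rng(3)
nS, nA, gamma, alpha, radius = 2, 2, 0.9, 0.05, 100.0
P = np.array([[[0.8, 0.2], [0.2, 0.8]],
              [[0.6, 0.4], [0.3, 0.7]]])
R = np.array([[1.0, 0.0], [0.0, 1.5]])
phi = np.eye(nS * nA)           # tabular features, one per (s, a) pair

def feat(s, a):
    return phi[s * nA + a]

w = np.zeros(nS * nA)           # online parameters
w_target = w.copy()             # target-network parameters
s = 0
for t in range(1, 50_001):
    a = rng.integers(nA)        # uniform behavior policy (off-policy data)
    s_next = rng.choice(nS, p=P[s, a])
    # bootstrap target uses the *target* parameters, not the online ones
    q_next = max(feat(s_next, b) @ w_target for b in range(nA))
    delta = R[s, a] + gamma * q_next - feat(s, a) @ w
    w += alpha * delta * feat(s, a)
    norm = np.linalg.norm(w)    # truncation: project onto a ball
    if norm > radius:           # (rarely active in this tiny example)
        w *= radius / norm
    if t % 500 == 0:            # periodic target-network refresh
        w_target = w.copy()
    s = s_next

print("Q estimates:", w.reshape(nS, nA))
```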

Finite Sample Analysis of Average-Reward TD Learning and $Q$-Learning

no code implementations • NeurIPS 2021 • Sheng Zhang, Zhe Zhang, Siva Theja Maguluri

The focus of this paper is on sample complexity guarantees of average-reward reinforcement learning algorithms, which are known to be more challenging to study than their discounted-reward counterparts.

Q-Learning

Stationary Behavior of Constant Stepsize SGD Type Algorithms: An Asymptotic Characterization

no code implementations • 11 Nov 2021 • Zaiwei Chen, Shancong Mou, Siva Theja Maguluri

In this work, we study the asymptotic behavior of the appropriately scaled stationary distribution in the limit as the constant stepsize goes to zero.

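A quick simulation illustrating the phenomenon the snippet describes (the setup is an assumption, not the paper's model): for constant-stepsize SGD on a noisy quadratic, the stationary fluctuations around the optimum shrink like $\sqrt{\alpha}$, so rescaling by $1/\sqrt{\alpha}$ yields a nondegenerate limit as the stepsize goes to zero.

```python
# Constant-stepsize SGD on f(x) = x^2/2 with additive gradient noise.
# The stationary standard deviation scales like sqrt(alpha), so the
# rescaled quantity below is roughly constant across stepsizes.
import numpy as np

rng = np.random.default_rng(4)

def stationary_std(alpha, n=200_000, burn=50_000):
    x = 0.0
    samples = []
    for t in range(n):
        grad = x + rng.normal()        # gradient of x^2/2 plus noise
        x -= alpha * grad
        if t >= burn:                  # discard the transient phase
            samples.append(x)
    return np.std(samples)

for alpha in (0.1, 0.01, 0.001):
    print(alpha, stationary_std(alpha) / np.sqrt(alpha))  # ~constant
```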

Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators

no code implementations • NeurIPS 2021 • Zaiwei Chen, Siva Theja Maguluri, Sanjay Shakkottai, Karthikeyan Shanmugam

Our key step is to show that the generalized Bellman operator is simultaneously a contraction mapping with respect to a weighted $\ell_p$-norm for each $p$ in $[1,\infty)$, with a common contraction factor.
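
Schematically, the property stated in the snippet reads as follows (notation assumed; see the paper for the precise operator and weights):

```latex
% Schematic statement: the generalized Bellman operator \mathcal{T}
% contracts in every weighted \ell_p-norm with a common factor.
\[
  \exists\, \gamma \in (0,1):\quad
  \|\mathcal{T}x - \mathcal{T}y\|_{w,p} \;\le\; \gamma\, \|x - y\|_{w,p}
  \quad \text{for all } p \in [1,\infty),
\]
\[
  \text{where } \|z\|_{w,p} = \Big(\textstyle\sum_i w_i\, |z_i|^p\Big)^{1/p}
  \text{ for a fixed weight vector } w > 0.
\]
```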

Finite-Sample Analysis of Off-Policy Natural Actor-Critic with Linear Function Approximation

no code implementations • 26 May 2021 • Zaiwei Chen, Sajad Khodadadian, Siva Theja Maguluri

In this paper, we develop a novel variant of the off-policy natural actor-critic algorithm with linear function approximation, and we establish a sample complexity of $\mathcal{O}(\epsilon^{-3})$, improving upon all previously known convergence bounds for such algorithms.

Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm

no code implementations • 18 Feb 2021 • Sajad Khodadadian, Zaiwei Chen, Siva Theja Maguluri

In this paper, we provide finite-sample convergence guarantees for an off-policy variant of the natural actor-critic (NAC) algorithm based on Importance Sampling.

A Lyapunov Theory for Finite-Sample Guarantees of Asynchronous Q-Learning and TD-Learning Variants

no code implementations • 2 Feb 2021 • Zaiwei Chen, Siva Theja Maguluri, Sanjay Shakkottai, Karthikeyan Shanmugam

As a by-product, by analyzing the convergence bounds of $n$-step TD and TD$(\lambda)$, we provide theoretical insights into the bias-variance trade-off, i.e., the efficiency of bootstrapping in RL.

Q-Learning, Reinforcement Learning (RL)
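
To make the bootstrapping knob concrete, the toy computation below evaluates $n$-step returns for several $n$ on a short sampled trajectory: small $n$ leans on the (possibly biased) value estimate, while large $n$ approaches the higher-variance Monte Carlo return. All numbers are illustrative.

```python
# n-step return on a toy trajectory; state index = time index here.
gamma = 0.9
rewards = [1.0, 0.0, 2.0, 1.0]                 # sampled rewards r_0..r_3
V = {0: 0.5, 1: 1.0, 2: 0.2, 3: 1.5, 4: 0.0}  # current value estimates

def n_step_return(t, n):
    """G_t^(n) = r_t + ... + gamma^(n-1) r_{t+n-1} + gamma^n V(s_{t+n})."""
    horizon = min(n, len(rewards) - t)
    G = sum(gamma**k * rewards[t + k] for k in range(horizon))
    G += gamma**horizon * V[t + horizon]       # bootstrap from the estimate
    return G

for n in (1, 2, 4):
    print(f"{n}-step return from t=0:", round(n_step_return(0, n), 3))
```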

Finite Sample Analysis of Two-Time-Scale Natural Actor-Critic Algorithm

no code implementations • 26 Jan 2021 • Sajad Khodadadian, Thinh T. Doan, Justin Romberg, Siva Theja Maguluri

In this paper, we characterize the \emph{global} convergence of an online natural actor-critic algorithm in the tabular setting using a single trajectory of samples.

Near Optimal Control in Ride Hailing Platforms with Strategic Servers

no code implementations • 9 Aug 2020 • Sushil Mahavir Varma, Francisco Castro, Siva Theja Maguluri

We then study the system in a large-market regime in which the arrival rates are scaled by $\eta$, and present a probabilistic two-price policy and a max-weight matching policy that result in a net profit loss of at most $O(\eta^{1/3})$.

Optimization and Control, Probability
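
As a rough illustration of one ingredient above, here is a schematic max-weight matching rule (assumed details, not the paper's exact policy): at each epoch, among type pairs with both queues nonempty, match the pair with the largest combined queue length.

```python
# Schematic max-weight matching between rider and driver type queues.
# Arrival rates and weights are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(5)
nR, nD = 2, 2
q_riders = np.zeros(nR, dtype=int)
q_drivers = np.zeros(nD, dtype=int)

for _ in range(10_000):
    q_riders[rng.integers(nR)] += 1      # one rider arrival per epoch
    q_drivers[rng.integers(nD)] += 1     # one driver arrival per epoch
    # max-weight: pick the feasible pair (i, j) with the largest weight
    weights = q_riders[:, None] + q_drivers[None, :]
    feasible = (q_riders[:, None] > 0) & (q_drivers[None, :] > 0)
    if feasible.any():
        weights = np.where(feasible, weights, -1)
        i, j = np.unravel_index(weights.argmax(), weights.shape)
        q_riders[i] -= 1                 # dispatch the matched pair
        q_drivers[j] -= 1

print("queue lengths:", q_riders, q_drivers)
```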

Finite-Time Performance of Distributed Temporal Difference Learning with Linear Function Approximation

no code implementations • 25 Jul 2019 • Thinh T. Doan, Siva Theja Maguluri, Justin Romberg

Our main contribution is to provide a finite-time analysis of the performance of this distributed {\sf TD}$(\lambda)$ algorithm for both constant and time-varying step sizes.

Multi-agent Reinforcement Learning

Finite-Sample Analysis of Nonlinear Stochastic Approximation with Applications in Reinforcement Learning

1 code implementation • 27 May 2019 • Zaiwei Chen, Sheng Zhang, Thinh T. Doan, John-Paul Clarke, Siva Theja Maguluri

To demonstrate the generality of our theoretical results on Markovian SA, we use them to derive finite-sample bounds for the popular $Q$-learning with linear function approximation algorithm, under a condition on the behavior policy.

Q-Learning, Reinforcement Learning (RL) +1
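
A minimal sketch of the Markovian stochastic-approximation recursion the snippet refers to, with all concrete choices (noise chain, operator, stepsizes) as illustrative assumptions: $x_{t+1} = x_t + \alpha_t F(x_t, Y_t)$, where $Y_t$ is a Markov chain and the averaged operator $\bar{F}(x) = E[F(x, Y)]$ has a unique root $x^*$.

```python
# Stochastic approximation driven by Markovian noise (illustrative).
# Here F(x, y) = b[y] - x, so the root of E[F(x, Y)] is the stationary
# mean of b under the noise chain.
import numpy as np

rng = np.random.default_rng(6)
P = np.array([[0.9, 0.1], [0.5, 0.5]])   # noise chain Y_t
b = np.array([1.0, -2.0])                # per-state targets

x, y = 0.0, 0
for t in range(1, 100_001):
    alpha = 1.0 / t**0.75                 # diminishing stepsize
    x += alpha * (b[y] - x)               # SA step with F(x, y) = b[y] - x
    y = rng.choice(2, p=P[y])

pi = np.array([5 / 6, 1 / 6])             # stationary distribution of P
print("iterate:", x, "root:", pi @ b)     # both approach 0.5
```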

Finite-Time Analysis of Distributed TD(0) with Linear Function Approximation for Multi-Agent Reinforcement Learning

no code implementations • 20 Feb 2019 • Thinh T. Doan, Siva Theja Maguluri, Justin Romberg

In this problem, a group of agents works cooperatively to evaluate the value function for the global discounted cumulative reward, which is composed of local rewards observed by the agents.

Optimization and Control
