no code implementations • 7 Feb 2024 • Isaac Grosof, Siva Theja Maguluri, R. Srikant
In the reinforcement learning (RL) context, a variety of algorithms have been developed to learn and optimize these MDPs.
no code implementations • 31 Dec 2023 • Shaan ul Haque, Sajad Khodadadian, Siva Theja Maguluri
SA appears in many areas such as optimization and Reinforcement Learning (RL).
no code implementations • 28 Mar 2023 • Zaiwei Chen, Siva Theja Maguluri, Martin Zubeldia
To demonstrate the applicability of our theoretical results, we use them to provide maximal concentration bounds for a large class of reinforcement learning algorithms, including but not limited to on-policy TD-learning with linear function approximation, off-policy TD-learning with generalized importance sampling factors, and $Q$-learning.
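As an illustration of the first of these settings, here is a minimal sketch of on-policy TD(0) with linear function approximation on a hypothetical two-state Markov reward process (all dynamics, features, and stepsizes below are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical 2-state Markov reward process; all names/values are illustrative.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # transition matrix under the fixed policy
r = np.array([1.0, 0.0])     # expected reward per state
phi = np.array([[1.0, 0.0],  # feature vector of each state
                [0.0, 1.0]])
gamma, alpha = 0.9, 0.05     # discount factor and constant stepsize

rng = np.random.default_rng(0)
w = np.zeros(2)              # linear weights: V_w(s) = phi[s] @ w
s = 0
for _ in range(20000):
    s_next = rng.choice(2, p=P[s])
    # TD(0) semi-gradient update on the sampled transition
    td_error = r[s] + gamma * phi[s_next] @ w - phi[s] @ w
    w += alpha * td_error * phi[s]
    s = s_next
```

With identity features this reduces to tabular TD(0), and `w` approaches the true value function $V = (I - \gamma P)^{-1} r$ up to constant-stepsize noise.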
no code implementations • 5 Aug 2022 • Zaiwei Chen, Siva Theja Maguluri
Combining the geometric convergence of the actor with the finite-sample analysis of the critic, we establish for the first time an overall $\mathcal{O}(\epsilon^{-2})$ sample complexity for finding an optimal policy (up to a function approximation error) using policy-based methods under off-policy sampling and linear function approximation.
no code implementations • 21 Jun 2022 • Sajad Khodadadian, Pranay Sharma, Gauri Joshi, Siva Theja Maguluri
To obtain these results, we show that federated TD and Q-learning are special cases of a general framework for federated stochastic approximation with Markovian noise, and we leverage this framework to provide a unified convergence analysis that applies to all the algorithms.
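The federated-averaging structure referenced here can be sketched as follows: several agents run local TD(0) steps on their own Markovian trajectories, and a server periodically averages their iterates (the chain, stepsizes, and synchronization period are hypothetical, not the paper's exact scheme):

```python
import numpy as np

# Illustrative federated TD(0): N agents on the same 2-state chain,
# averaging their iterates every K local steps (names/values hypothetical).
P = np.array([[0.9, 0.1], [0.2, 0.8]])
r = np.array([1.0, 0.0])
gamma, alpha = 0.9, 0.05
N, K, rounds = 4, 10, 500

rng = np.random.default_rng(1)
w = np.zeros(2)                      # shared model (tabular V here)
states = np.zeros(N, dtype=int)      # each agent's current Markovian state

for _ in range(rounds):
    locals_ = np.tile(w, (N, 1))     # broadcast current model to all agents
    for i in range(N):
        s = states[i]
        for _ in range(K):           # K local TD(0) steps with Markovian noise
            s_next = rng.choice(2, p=P[s])
            td = r[s] + gamma * locals_[i, s_next] - locals_[i, s]
            locals_[i, s] += alpha * td
            s = s_next
        states[i] = s
    w = locals_.mean(axis=0)         # server averages the local iterates
```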
no code implementations • 5 Mar 2022 • Zaiwei Chen, John Paul Clarke, Siva Theja Maguluri
$Q$-learning with function approximation is one of the most empirically successful yet theoretically mysterious reinforcement learning (RL) algorithms, and was identified in Sutton (1999) as one of the most important theoretical open problems in the RL community.
no code implementations • NeurIPS 2021 • Sheng Zhang, Zhe Zhang, Siva Theja Maguluri
The focus of this paper is on sample complexity guarantees of average-reward reinforcement learning algorithms, which are known to be more challenging to study than their discounted-reward counterparts.
no code implementations • 11 Nov 2021 • Zaiwei Chen, Shancong Mou, Siva Theja Maguluri
In this work, we study the asymptotic behavior of the appropriately scaled stationary distribution, in the limit when the constant stepsize goes to zero.
no code implementations • NeurIPS 2021 • Zaiwei Chen, Siva Theja Maguluri, Sanjay Shakkottai, Karthikeyan Shanmugam
Our key step is to show that the generalized Bellman operator is simultaneously a contraction mapping with respect to a weighted $\ell_p$-norm for each $p$ in $[1,\infty)$, with a common contraction factor.
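The stated property can be written as follows (a sketch; $\mathcal{T}$ denotes the generalized Bellman operator, $w$ the weight vector, and $\beta \in (0,1)$ the contraction factor common to all $p$):

```latex
\|\mathcal{T}V_1 - \mathcal{T}V_2\|_{w,p} \le \beta \,\|V_1 - V_2\|_{w,p}
\quad \forall\, p \in [1,\infty),
\qquad \text{where } \|V\|_{w,p} = \Big(\sum_s w_s\, |V(s)|^p\Big)^{1/p}.
```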
no code implementations • 26 May 2021 • Zaiwei Chen, Sajad Khodadadian, Siva Theja Maguluri
In this paper, we develop a novel variant of the off-policy natural actor-critic algorithm with linear function approximation, and we establish a sample complexity of $\mathcal{O}(\epsilon^{-3})$, outperforming all previously known convergence bounds for such algorithms.
no code implementations • 4 May 2021 • Sajad Khodadadian, Prakirt Raj Jhunjhunwala, Sushil Mahavir Varma, Siva Theja Maguluri
We further improve this convergence result by introducing a variant of Natural Policy Gradient with adaptive step sizes.
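For tabular softmax policies, a Natural Policy Gradient step with exact policy evaluation has a well-known closed form, $\theta \leftarrow \theta + \eta\, Q^{\pi_\theta}$ (up to step-size scaling). A minimal sketch on a hypothetical two-state MDP, using a constant step size rather than the adaptive schedule studied here:

```python
import numpy as np

# Illustrative tabular softmax NPG with exact policy evaluation.
# The MDP (P, R) below is hypothetical.
P = np.array([[[0.9, 0.1], [0.1, 0.9]],   # P[s, a] = next-state distribution
              [[0.8, 0.2], [0.3, 0.7]]])
R = np.array([[1.0, 0.0], [0.0, 0.3]])    # R[s, a] = expected reward
gamma, eta = 0.9, 1.0

def policy(theta):
    e = np.exp(theta - theta.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def q_pi(theta):
    pi = policy(theta)
    # exact policy evaluation: V = (I - gamma * P_pi)^{-1} r_pi
    P_pi = np.einsum('sa,sap->sp', pi, P)
    r_pi = (pi * R).sum(axis=1)
    V = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)
    return R + gamma * P @ V

theta = np.zeros((2, 2))
for _ in range(50):
    theta += eta * q_pi(theta)   # NPG step: add the exact Q-values
```

With exact Q-values the iterates concentrate geometrically on the greedy-optimal policy, which is the kind of convergence the adaptive step-size variant accelerates.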
no code implementations • 18 Feb 2021 • Sajad Khodadadian, Zaiwei Chen, Siva Theja Maguluri
In this paper, we provide finite-sample convergence guarantees for an off-policy variant of the natural actor-critic (NAC) algorithm based on Importance Sampling.
no code implementations • 2 Feb 2021 • Zaiwei Chen, Siva Theja Maguluri, Sanjay Shakkottai, Karthikeyan Shanmugam
As a by-product, by analyzing the convergence bounds of $n$-step TD and TD$(\lambda)$, we provide theoretical insights into the bias-variance trade-off, i.e., the efficiency of bootstrapping in RL.
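The bootstrapping trade-off referenced here can be made concrete with a tabular $n$-step TD sketch: small $n$ leans on the (biased) current estimate, large $n$ on (noisier) sampled returns. All dynamics and parameters below are hypothetical:

```python
import numpy as np

# Illustrative tabular n-step TD on a 2-state chain (all values hypothetical).
P = np.array([[0.9, 0.1], [0.2, 0.8]])
r = np.array([1.0, 0.0])
gamma, alpha, n = 0.9, 0.05, 3

rng = np.random.default_rng(2)
V = np.zeros(2)

# Pre-sample one long trajectory from the chain
T = 20000
traj = [0]
for _ in range(T + n):
    traj.append(rng.choice(2, p=P[traj[-1]]))

for t in range(T):
    # n-step return: n sampled rewards, then bootstrap from V at step t+n
    G = sum(gamma**k * r[traj[t + k]] for k in range(n))
    G += gamma**n * V[traj[t + n]]
    V[traj[t]] += alpha * (G - V[traj[t]])
```

Sweeping `n` from 1 upward trades the bootstrap bias of TD(0) against the Monte Carlo-style variance of long sampled returns.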
no code implementations • 26 Jan 2021 • Sajad Khodadadian, Thinh T. Doan, Justin Romberg, Siva Theja Maguluri
In this paper, we characterize the \emph{global} convergence of an online natural actor-critic algorithm in the tabular setting using a single trajectory of samples.
no code implementations • 9 Aug 2020 • Sushil Mahavir Varma, Francisco Castro, Siva Theja Maguluri
We then study the system under a large-market regime in which the arrival rates are scaled by $\eta$, and present a probabilistic two-price policy and a max-weight matching policy which result in a net profit loss of at most $O(\eta^{1/3})$.
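A single max-weight matching decision can be sketched as follows: among compatible pairs with work on both sides, pick the one maximizing the sum of queue lengths (the types, compatibility structure, and weight choice below are hypothetical, not the paper's exact model):

```python
# Illustrative max-weight matching step for a two-sided queue
# (all names and the compatibility structure are hypothetical).
def max_weight_match(demand_q, supply_q, compatible):
    """Pick the pair (i, j) maximizing demand_q[i] + supply_q[j]
    over compatible pairs with work available on both sides."""
    best, best_w = None, 0.0
    for i, di in enumerate(demand_q):
        for j, sj in enumerate(supply_q):
            if compatible[i][j] and di > 0 and sj > 0:
                w = di + sj
                if w > best_w:
                    best, best_w = (i, j), w
    return best

demand_q = [3, 0]          # queued demand per type
supply_q = [1, 5]          # queued supply per type
compatible = [[True, True],
              [False, True]]
print(max_weight_match(demand_q, supply_q, compatible))  # → (0, 1)
```

Weighting matches by queue lengths is what pushes the policy toward draining the longest queues, the mechanism behind the max-weight profit-loss bound.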
Optimization and Control • Probability
no code implementations • NeurIPS 2020 • Zaiwei Chen, Siva Theja Maguluri, Sanjay Shakkottai, Karthikeyan Shanmugam
In particular, we use it to establish the first-known convergence rate of the V-trace algorithm for off-policy TD-learning.
no code implementations • 25 Jul 2019 • Thinh T. Doan, Siva Theja Maguluri, Justin Romberg
Our main contribution is to provide a finite-time analysis of the performance of this distributed {\sf TD}$(\lambda)$ algorithm for both constant and time-varying step sizes.
1 code implementation • 27 May 2019 • Zaiwei Chen, Sheng Zhang, Thinh T. Doan, John-Paul Clarke, Siva Theja Maguluri
To demonstrate the generality of our theoretical results on Markovian SA, we use them to derive finite-sample bounds for the popular $Q$-learning with linear function approximation algorithm, under a condition on the behavior policy.
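A minimal sketch of $Q$-learning with linear function approximation on a toy MDP (one-hot features, so it reduces to the tabular special case; the $\epsilon$-greedy behavior policy below merely stands in for the paper's condition on the behavior policy, and all values are illustrative):

```python
import numpy as np

# Illustrative Q-learning with linear function approximation on a toy MDP
# (2 states, 2 actions; all dynamics and features are hypothetical).
P = np.array([[[0.9, 0.1], [0.1, 0.9]],   # P[s, a] = next-state distribution
              [[0.8, 0.2], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],                 # R[s, a] = expected reward
              [0.0, 0.5]])
gamma, alpha, eps = 0.9, 0.05, 0.2

def feat(s, a):
    x = np.zeros(4)        # one-hot (s, a) features -> tabular special case
    x[2 * s + a] = 1.0
    return x

rng = np.random.default_rng(3)
w = np.zeros(4)
s = 0
for _ in range(30000):
    # epsilon-greedy behavior policy (a stand-in for the paper's condition)
    if rng.random() < eps:
        a = int(rng.integers(2))
    else:
        a = int(np.argmax([feat(s, b) @ w for b in range(2)]))
    s_next = rng.choice(2, p=P[s, a])
    target = R[s, a] + gamma * max(feat(s_next, b) @ w for b in range(2))
    w += alpha * (target - feat(s, a) @ w) * feat(s, a)
    s = s_next
```

With richer (non-one-hot) features the same update can diverge, which is why conditions on the behavior policy matter.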
no code implementations • 20 Feb 2019 • Thinh T. Doan, Siva Theja Maguluri, Justin Romberg
In this problem, a group of agents works cooperatively to evaluate the value function for the global discounted cumulative reward, which is composed of local rewards observed by the agents.
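The setup described here can be sketched with a simple distributed TD(0): each agent updates using only its own local reward, and the agents then average their estimates over the communication graph (a complete graph of three agents here; all values are hypothetical):

```python
import numpy as np

# Illustrative distributed TD(0): each agent sees only its local reward;
# the global reward is the average of the local ones. Agents mix their
# estimates via a doubly stochastic consensus matrix (complete graph here).
P = np.array([[0.9, 0.1], [0.2, 0.8]])
r_local = np.array([[1.5, 0.0],   # agent 0's reward per state
                    [0.9, 0.0],   # agent 1's reward per state
                    [0.6, 0.0]])  # agent 2's reward per state
gamma, alpha = 0.9, 0.05
W = np.full((3, 3), 1 / 3)        # consensus weights over the graph

rng = np.random.default_rng(4)
V = np.zeros((3, 2))              # each agent's value estimate
s = 0
for _ in range(20000):
    s_next = rng.choice(2, p=P[s])
    # local TD(0) step with each agent's own reward ...
    td = r_local[:, s] + gamma * V[:, s_next] - V[:, s]
    V[:, s] += alpha * td
    # ... followed by a consensus-averaging step with the neighbors
    V = W @ V
    s = s_next
```

All agents converge near the value function of the averaged reward, even though no single agent observes it.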
Optimization and Control