no code implementations • 7 Feb 2024 • Isaac Grosof, Siva Theja Maguluri, R. Srikant
In the reinforcement learning (RL) context, a variety of algorithms have been developed to learn and optimize these MDPs.
no code implementations • 31 Dec 2023 • Shaan ul Haque, Sajad Khodadadian, Siva Theja Maguluri
SA appears in many areas such as optimization and Reinforcement Learning (RL).
no code implementations • 28 Mar 2023 • Zaiwei Chen, Siva Theja Maguluri, Martin Zubeldia
To demonstrate the applicability of our theoretical results, we use them to provide maximal concentration bounds for a large class of reinforcement learning algorithms, including but not limited to on-policy TD-learning with linear function approximation, off-policy TD-learning with generalized importance sampling factors, and $Q$-learning.
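As an illustration of the first of these settings, here is a minimal sketch of on-policy TD(0) with linear function approximation on a hypothetical two-state Markov reward process (all dynamics, features, and stepsizes below are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical 2-state Markov reward process; all names/values are illustrative.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # transition matrix under the fixed policy
r = np.array([1.0, 0.0])     # expected reward per state
phi = np.array([[1.0, 0.0],  # feature vector of each state
                [0.0, 1.0]])
gamma, alpha = 0.9, 0.05     # discount factor and constant stepsize

rng = np.random.default_rng(0)
w = np.zeros(2)              # linear weights: V_w(s) = phi[s] @ w
s = 0
for _ in range(20000):
    s_next = rng.choice(2, p=P[s])
    # TD(0) semi-gradient update on the sampled transition
    td_error = r[s] + gamma * phi[s_next] @ w - phi[s] @ w
    w += alpha * td_error * phi[s]
    s = s_next
```

With identity features this reduces to tabular TD(0), and `w` approaches the true value function $V = (I - \gamma P)^{-1} r$ up to constant-stepsize noise.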
no code implementations • 5 Aug 2022 • Zaiwei Chen, Siva Theja Maguluri
Combining the geometric convergence of the actor with the finite-sample analysis of the critic, we establish for the first time an overall $\mathcal{O}(\epsilon^{-2})$ sample complexity for finding an optimal policy (up to a function approximation error) using policy-based methods under off-policy sampling and linear function approximation.
no code implementations • 21 Jun 2022 • Sajad Khodadadian, Pranay Sharma, Gauri Joshi, Siva Theja Maguluri
To obtain these results, we show that federated TD and Q-learning are special cases of a general framework for federated stochastic approximation with Markovian noise, and we leverage this framework to provide a unified convergence analysis that applies to all the algorithms.
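The federated-averaging structure referenced here can be sketched as follows: several agents run local TD(0) steps on their own Markovian trajectories, and a server periodically averages their iterates (the chain, stepsizes, and synchronization period are hypothetical, not the paper's exact scheme):

```python
import numpy as np

# Illustrative federated TD(0): N agents on the same 2-state chain,
# averaging their iterates every K local steps (names/values hypothetical).
P = np.array([[0.9, 0.1], [0.2, 0.8]])
r = np.array([1.0, 0.0])
gamma, alpha = 0.9, 0.05
N, K, rounds = 4, 10, 500

rng = np.random.default_rng(1)
w = np.zeros(2)                      # shared model (tabular V here)
states = np.zeros(N, dtype=int)      # each agent's current Markovian state

for _ in range(rounds):
    locals_ = np.tile(w, (N, 1))     # broadcast current model to all agents
    for i in range(N):
        s = states[i]
        for _ in range(K):           # K local TD(0) steps with Markovian noise
            s_next = rng.choice(2, p=P[s])
            td = r[s] + gamma * locals_[i, s_next] - locals_[i, s]
            locals_[i, s] += alpha * td
            s = s_next
        states[i] = s
    w = locals_.mean(axis=0)         # server averages the local iterates
```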
no code implementations • 5 Mar 2022 • Zaiwei Chen, John Paul Clarke, Siva Theja Maguluri
$Q$-learning with function approximation is one of the most empirically successful yet theoretically mysterious reinforcement learning (RL) algorithms, and was identified in Sutton (1999) as one of the most important theoretical open problems in the RL community.
no code implementations • NeurIPS 2021 • Sheng Zhang, Zhe Zhang, Siva Theja Maguluri
The focus of this paper is on sample complexity guarantees of average-reward reinforcement learning algorithms, which are known to be more challenging to study than their discounted-reward counterparts.
no code implementations • 11 Nov 2021 • Zaiwei Chen, Shancong Mou, Siva Theja Maguluri
In this work, we study the asymptotic behavior of the appropriately scaled stationary distribution, in the limit when the constant stepsize goes to zero.
no code implementations • NeurIPS 2021 • Zaiwei Chen, Siva Theja Maguluri, Sanjay Shakkottai, Karthikeyan Shanmugam
Our key step is to show that the generalized Bellman operator is simultaneously a contraction mapping with respect to a weighted $\ell_p$-norm for each $p$ in $[1,\infty)$, with a common contraction factor.
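The stated property can be written as follows (a sketch; $\mathcal{T}$ denotes the generalized Bellman operator, $w$ the weight vector, and $\beta \in (0,1)$ the contraction factor common to all $p$):

```latex
\|\mathcal{T}V_1 - \mathcal{T}V_2\|_{w,p} \le \beta \,\|V_1 - V_2\|_{w,p}
\quad \forall\, p \in [1,\infty),
\qquad \text{where } \|V\|_{w,p} = \Big(\sum_s w_s\, |V(s)|^p\Big)^{1/p}.
```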
no code implementations • 26 May 2021 • Zaiwei Chen, Sajad Khodadadian, Siva Theja Maguluri
In this paper, we develop a novel variant of the off-policy natural actor-critic algorithm with linear function approximation, and we establish a sample complexity of $\mathcal{O}(\epsilon^{-3})$, outperforming all previously known convergence bounds for such algorithms.
no code implementations • 4 May 2021 • Sajad Khodadadian, Prakirt Raj Jhunjhunwala, Sushil Mahavir Varma, Siva Theja Maguluri
We further improve this convergence result by introducing a variant of Natural Policy Gradient with adaptive step sizes.
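For tabular softmax policies, a Natural Policy Gradient step with exact policy evaluation has a well-known closed form, $\theta \leftarrow \theta + \eta\, Q^{\pi_\theta}$ (up to step-size scaling). A minimal sketch on a hypothetical two-state MDP, using a constant step size rather than the adaptive schedule studied here:

```python
import numpy as np

# Illustrative tabular softmax NPG with exact policy evaluation.
# The MDP (P, R) below is hypothetical.
P = np.array([[[0.9, 0.1], [0.1, 0.9]],   # P[s, a] = next-state distribution
              [[0.8, 0.2], [0.3, 0.7]]])
R = np.array([[1.0, 0.0], [0.0, 0.3]])    # R[s, a] = expected reward
gamma, eta = 0.9, 1.0

def policy(theta):
    e = np.exp(theta - theta.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def q_pi(theta):
    pi = policy(theta)
    # exact policy evaluation: V = (I - gamma * P_pi)^{-1} r_pi
    P_pi = np.einsum('sa,sap->sp', pi, P)
    r_pi = (pi * R).sum(axis=1)
    V = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)
    return R + gamma * P @ V

theta = np.zeros((2, 2))
for _ in range(50):
    theta += eta * q_pi(theta)   # NPG step: add the exact Q-values
```

With exact Q-values the iterates concentrate geometrically on the greedy-optimal policy, which is the kind of convergence the adaptive step-size variant accelerates.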
no code implementations • 18 Feb 2021 • Sajad Khodadadian, Zaiwei Chen, Siva Theja Maguluri
In this paper, we provide finite-sample convergence guarantees for an off-policy variant of the natural actor-critic (NAC) algorithm based on Importance Sampling.
no code implementations • 2 Feb 2021 • Zaiwei Chen, Siva Theja Maguluri, Sanjay Shakkottai, Karthikeyan Shanmugam
As a by-product, by analyzing the convergence bounds of $n$-step TD and TD$(\lambda)$, we provide theoretical insights into the bias-variance trade-off, i.e., the efficiency of bootstrapping in RL.
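The bootstrapping trade-off referenced here can be made concrete with a tabular $n$-step TD sketch: small $n$ leans on the (biased) current estimate, large $n$ on (noisier) sampled returns. All dynamics and parameters below are hypothetical:

```python
import numpy as np

# Illustrative tabular n-step TD on a 2-state chain (all values hypothetical).
P = np.array([[0.9, 0.1], [0.2, 0.8]])
r = np.array([1.0, 0.0])
gamma, alpha, n = 0.9, 0.05, 3

rng = np.random.default_rng(2)
V = np.zeros(2)

# Pre-sample one long trajectory from the chain
T = 20000
traj = [0]
for _ in range(T + n):
    traj.append(rng.choice(2, p=P[traj[-1]]))

for t in range(T):
    # n-step return: n sampled rewards, then bootstrap from V at step t+n
    G = sum(gamma**k * r[traj[t + k]] for k in range(n))
    G += gamma**n * V[traj[t + n]]
    V[traj[t]] += alpha * (G - V[traj[t]])
```

Sweeping `n` from 1 upward trades the bootstrap bias of TD(0) against the Monte Carlo-style variance of long sampled returns.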
no code implementations • 26 Jan 2021 • Sajad Khodadadian, Thinh T. Doan, Justin Romberg, Siva Theja Maguluri
In this paper, we characterize the \emph{global} convergence of an online natural actor-critic algorithm in the tabular setting using a single trajectory of samples.
no code implementations • 9 Aug 2020 • Sushil Mahavir Varma, Francisco Castro, Siva Theja Maguluri
We then study the system under a large-market regime in which the arrival rates are scaled by $\eta$, and present a probabilistic two-price policy and a max-weight matching policy which result in a net profit loss of at most $O(\eta^{1/3})$.
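A single max-weight matching decision can be sketched as follows: among compatible pairs with work on both sides, pick the one maximizing the sum of queue lengths (the types, compatibility structure, and weight choice below are hypothetical, not the paper's exact model):

```python
# Illustrative max-weight matching step for a two-sided queue
# (all names and the compatibility structure are hypothetical).
def max_weight_match(demand_q, supply_q, compatible):
    """Pick the pair (i, j) maximizing demand_q[i] + supply_q[j]
    over compatible pairs with work available on both sides."""
    best, best_w = None, 0.0
    for i, di in enumerate(demand_q):
        for j, sj in enumerate(supply_q):
            if compatible[i][j] and di > 0 and sj > 0:
                w = di + sj
                if w > best_w:
                    best, best_w = (i, j), w
    return best

demand_q = [3, 0]          # queued demand per type
supply_q = [1, 5]          # queued supply per type
compatible = [[True, True],
              [False, True]]
print(max_weight_match(demand_q, supply_q, compatible))  # → (0, 1)
```

Weighting matches by queue lengths is what pushes the policy toward draining the longest queues, the mechanism behind the max-weight profit-loss bound.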
Optimization and Control • Probability
no code implementations • NeurIPS 2020 • Zaiwei Chen, Siva Theja Maguluri, Sanjay Shakkottai, Karthikeyan Shanmugam
In particular, we use it to establish the first-known convergence rate of the V-trace algorithm for off-policy TD-learning.
no code implementations • 25 Jul 2019 • Thinh T. Doan, Siva Theja Maguluri, Justin Romberg
Our main contribution is to provide a finite-time analysis of the performance of this distributed {\sf TD}$(\lambda)$ algorithm for both constant and time-varying step sizes.
1 code implementation • 27 May 2019 • Zaiwei Chen, Sheng Zhang, Thinh T. Doan, John-Paul Clarke, Siva Theja Maguluri
To demonstrate the generality of our theoretical results on Markovian SA, we use them to derive finite-sample bounds for the popular $Q$-learning with linear function approximation algorithm, under a condition on the behavior policy.
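A minimal sketch of $Q$-learning with linear function approximation on a toy MDP (one-hot features, so it reduces to the tabular special case; the $\epsilon$-greedy behavior policy below merely stands in for the paper's condition on the behavior policy, and all values are illustrative):

```python
import numpy as np

# Illustrative Q-learning with linear function approximation on a toy MDP
# (2 states, 2 actions; all dynamics and features are hypothetical).
P = np.array([[[0.9, 0.1], [0.1, 0.9]],   # P[s, a] = next-state distribution
              [[0.8, 0.2], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],                 # R[s, a] = expected reward
              [0.0, 0.5]])
gamma, alpha, eps = 0.9, 0.05, 0.2

def feat(s, a):
    x = np.zeros(4)        # one-hot (s, a) features -> tabular special case
    x[2 * s + a] = 1.0
    return x

rng = np.random.default_rng(3)
w = np.zeros(4)
s = 0
for _ in range(30000):
    # epsilon-greedy behavior policy (a stand-in for the paper's condition)
    if rng.random() < eps:
        a = int(rng.integers(2))
    else:
        a = int(np.argmax([feat(s, b) @ w for b in range(2)]))
    s_next = rng.choice(2, p=P[s, a])
    target = R[s, a] + gamma * max(feat(s_next, b) @ w for b in range(2))
    w += alpha * (target - feat(s, a) @ w) * feat(s, a)
    s = s_next
```

With richer (non-one-hot) features the same update can diverge, which is why conditions on the behavior policy matter.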
no code implementations • 20 Feb 2019 • Thinh T. Doan, Siva Theja Maguluri, Justin Romberg
In this problem, a group of agents works cooperatively to evaluate the value function for the global discounted cumulative reward, which is composed of local rewards observed by the agents.
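The setup described here can be sketched with a simple distributed TD(0): each agent updates using only its own local reward, and the agents then average their estimates over the communication graph (a complete graph of three agents here; all values are hypothetical):

```python
import numpy as np

# Illustrative distributed TD(0): each agent sees only its local reward;
# the global reward is the average of the local ones. Agents mix their
# estimates via a doubly stochastic consensus matrix (complete graph here).
P = np.array([[0.9, 0.1], [0.2, 0.8]])
r_local = np.array([[1.5, 0.0],   # agent 0's reward per state
                    [0.9, 0.0],   # agent 1's reward per state
                    [0.6, 0.0]])  # agent 2's reward per state
gamma, alpha = 0.9, 0.05
W = np.full((3, 3), 1 / 3)        # consensus weights over the graph

rng = np.random.default_rng(4)
V = np.zeros((3, 2))              # each agent's value estimate
s = 0
for _ in range(20000):
    s_next = rng.choice(2, p=P[s])
    # local TD(0) step with each agent's own reward ...
    td = r_local[:, s] + gamma * V[:, s_next] - V[:, s]
    V[:, s] += alpha * td
    # ... followed by a consensus-averaging step with the neighbors
    V = W @ V
    s = s_next
```

All agents converge near the value function of the averaged reward, even though no single agent observes it.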
Optimization and Control