Search Results for author: Tengyu Xu

Found 20 papers, 1 paper with code

Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward

no code implementations 13 Jun 2022 Tengyu Xu, Yue Wang, Shaofeng Zou, Yingbin Liang

The remarkable success of reinforcement learning (RL) heavily relies on observing the reward of every visited state-action pair.

Offline RL, reinforcement-learning, +1

Model-Based Offline Meta-Reinforcement Learning with Regularization

no code implementations ICLR 2022 Sen Lin, Jialin Wan, Tengyu Xu, Yingbin Liang, Junshan Zhang

In particular, we devise a new meta-Regularized model-based Actor-Critic (RAC) method for within-task policy optimization, as a key building block of MerPO, using conservative policy evaluation and regularized policy improvement; the intrinsic tradeoff is achieved by striking the right balance between two regularizers, one based on the behavior policy and the other on the meta-policy.

Meta Reinforcement Learning, reinforcement-learning, +2
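
The description above centers on a policy-improvement step regularized toward both the behavior policy and the meta-policy. Below is a minimal sketch of such a doubly regularized objective for a single discrete-action state; the `behavior_logits` and `meta_logits` inputs and the mixing weight `alpha` are illustrative assumptions, not the paper's exact RAC formulation.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def kl(p, q):
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

def regularized_improvement_objective(logits, q_values, behavior_logits, meta_logits, alpha=0.5):
    """Hypothetical RAC-style objective: expected critic value minus a convex
    combination of KL penalties toward the behavior policy and the meta-policy."""
    pi = softmax(logits)
    pi_b = softmax(behavior_logits)
    pi_meta = softmax(meta_logits)
    return float(pi @ q_values) - alpha * kl(pi, pi_b) - (1.0 - alpha) * kl(pi, pi_meta)

# toy usage: evaluate the objective for a 3-action state
rng = np.random.default_rng(0)
print(regularized_improvement_objective(rng.normal(size=3), rng.normal(size=3),
                                        rng.normal(size=3), rng.normal(size=3)))
```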

Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process

no code implementations 20 Oct 2021 Tianjiao Li, Ziwei Guan, Shaofeng Zou, Tengyu Xu, Yingbin Liang, Guanghui Lan

Despite the challenge of the nonconcave objective subject to nonconcave constraints, the proposed approach is shown to converge to the global optimum with a complexity of $\tilde{\mathcal O}(1/\epsilon)$ in terms of the optimality gap and the constraint violation, which improves the complexity of the existing primal-dual approach by a factor of $\mathcal O(1/\epsilon)$ \citep{ding2020natural, paternain2019constrained}.

PER-ETD: A Polynomially Efficient Emphatic Temporal Difference Learning Method

no code implementations ICLR 2022 Ziwei Guan, Tengyu Xu, Yingbin Liang

Although ETD has been shown to converge asymptotically to a desirable value function, it is well known that ETD often suffers from large variance, so its sample complexity can grow exponentially with the number of iterations.
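
The variance issue stems from ETD's follow-on (emphasis) trace, a running product of importance ratios that can blow up. Below is a rough sketch of linear ETD(0) with the trace reset every few steps, which is the spirit of the periodic restart in PER-ETD; the restart period, stepsize, and random data are illustrative, and the exact PER-ETD procedure differs in its details.

```python
import numpy as np

def per_etd0(features, rewards, rhos, gamma=0.99, alpha=0.01, restart_period=50):
    """Linear ETD(0) with a follow-on trace F that is reset every `restart_period`
    steps to curb its variance (a sketch of the periodic-restart idea)."""
    theta = np.zeros(features.shape[1])
    F = 1.0
    for t in range(len(rewards)):
        phi, phi_next = features[t], features[t + 1]
        delta = rewards[t] + gamma * theta @ phi_next - theta @ phi   # TD error
        theta += alpha * rhos[t] * F * delta * phi                    # emphatic update
        F = 1.0 + gamma * rhos[t] * F                                 # follow-on trace
        if (t + 1) % restart_period == 0:                             # periodic restart
            F = 1.0
    return theta

# toy usage on random data (T transitions, d features)
rng = np.random.default_rng(1)
T, d = 200, 4
print(per_etd0(rng.normal(size=(T + 1, d)), rng.normal(size=T), np.abs(rng.normal(size=T))))
```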

A Unified Off-Policy Evaluation Approach for General Value Function

no code implementations 6 Jul 2021 Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin Liang

We further show that, unlike GTD, the GVFs learned by GenTD are guaranteed to converge to the ground truth GVFs as long as the function approximation power is sufficiently large.

Anomaly Detection, Off-policy evaluation

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality

no code implementations 23 Feb 2021 Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin Liang

We also show that the overall convergence of DR-Off-PAC is doubly robust to the approximation errors that depend only on the expressive power of approximation functions.
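For context, "doubly robust" refers to off-policy estimators that combine a learned critic with importance-weighted corrections, so the estimate remains accurate if either component is accurate. The sketch below shows the classical doubly robust trajectory value estimator only to illustrate that structure; it is not DR-Off-PAC's actual actor-critic update.

```python
import numpy as np

def doubly_robust_trajectory_value(traj, q_hat, v_hat, rhos, gamma=0.99):
    """Classical doubly robust off-policy value estimate of a single trajectory.

    traj: list of (state, action, reward); q_hat(s, a), v_hat(s): learned critics;
    rhos: per-step importance ratios pi(a|s) / mu(a|s)."""
    estimate = 0.0
    # Build the estimate backward: V_DR = v_hat(s) + rho * (r + gamma * V_DR_next - q_hat(s, a))
    for (s, a, r), rho in zip(reversed(traj), reversed(rhos)):
        estimate = v_hat(s) + rho * (r + gamma * estimate - q_hat(s, a))
    return estimate

# toy usage with constant critics on a 2-step trajectory
traj = [(0, 1, 1.0), (1, 0, 0.5)]
print(doubly_robust_trajectory_value(traj, q_hat=lambda s, a: 0.8, v_hat=lambda s: 0.7,
                                     rhos=[1.2, 0.9]))
```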

Proximal Gradient Descent-Ascent: Variable Convergence under KŁ Geometry

no code implementations ICLR 2021 Ziyi Chen, Yi Zhou, Tengyu Xu, Yingbin Liang

By leveraging this Lyapunov function and the KŁ geometry that parameterizes the local geometries of general nonconvex functions, we formally establish the variable convergence of proximal-GDA to a critical point $x^*$, i.e., $x_t\to x^*, y_t\to y^*(x^*)$.
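
As an illustration of the setting, a minimal proximal-GDA loop for a problem of the form $\min_x \max_y f(x,y) + g(x)$ with $g(x) = \lambda\|x\|_1$ is sketched below; the stepsizes and the toy bilinear, strongly concave objective are illustrative choices, not the paper's analysis or experiments.

```python
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def proximal_gda(grad_x, grad_y, x0, y0, lam=0.1, eta_x=0.05, eta_y=0.5, iters=500):
    """Minimal proximal-GDA sketch for min_x max_y f(x, y) + lam * ||x||_1:
    a proximal (soft-threshold) gradient step on x, then an ascent step on y."""
    x, y = x0.copy(), y0.copy()
    for _ in range(iters):
        x = soft_threshold(x - eta_x * grad_x(x, y), eta_x * lam)  # prox-descent in x
        y = y + eta_y * grad_y(x, y)                               # ascent in y
    return x, y

# toy usage: f(x, y) = x.T A y - 0.5 * ||y||^2 (strongly concave in y)
A = np.array([[1.0, 0.2], [0.0, 0.5]])
gx = lambda x, y: A @ y
gy = lambda x, y: A.T @ x - y
print(proximal_gda(gx, gy, np.ones(2), np.zeros(2)))
```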

CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee

no code implementations 11 Nov 2020 Tengyu Xu, Yingbin Liang, Guanghui Lan

To demonstrate the theoretical performance of CRPO, we adopt natural policy gradient (NPG) for each policy update step and show that CRPO achieves an $\mathcal{O}(1/\sqrt{T})$ convergence rate to the global optimal policy in the constrained policy set and an $\mathcal{O}(1/\sqrt{T})$ error bound on constraint satisfaction.

Reinforcement Learning (RL), +1
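
CRPO alternates between improving the objective and correcting constraint violation. The sketch below shows that decision rule with plain gradient steps standing in for the NPG updates described in the excerpt; the tolerance, stepsize, and toy surrogate functions are illustrative.

```python
import numpy as np

def crpo_step(theta, reward_grad, cost_grads, cost_values, limits, tol=0.05, alpha=0.1):
    """One CRPO-style update (a sketch, not the paper's exact NPG implementation):
    if every estimated constraint cost is within its limit plus a tolerance, ascend
    the reward objective; otherwise descend the first violated constraint cost."""
    violated = [i for i, (c, d) in enumerate(zip(cost_values, limits)) if c > d + tol]
    if not violated:
        return theta + alpha * reward_grad(theta)      # improve the objective
    i = violated[0]
    return theta - alpha * cost_grads[i](theta)        # reduce the violated cost

# toy usage with a scalar "policy parameter" and quadratic surrogates
theta = np.array([2.0])
reward_grad = lambda th: -2.0 * th                     # maximize -(theta^2)
cost_grads = [lambda th: 2.0 * (th - 1.0)]             # cost = (theta - 1)^2, limit 0.5
for _ in range(100):
    cost_val = (theta[0] - 1.0) ** 2
    theta = crpo_step(theta, reward_grad, cost_grads, [cost_val], limits=[0.5])
print(theta)                                           # settles near the constraint boundary
```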

Sample Complexity Bounds for Two Timescale Value-based Reinforcement Learning Algorithms

no code implementations 10 Nov 2020 Tengyu Xu, Yingbin Liang

For linear TDC, we provide a novel non-asymptotic analysis and show that it attains an $\epsilon$-accurate solution with the optimal sample complexity of $\mathcal{O}(\epsilon^{-1}\log(1/\epsilon))$ under a constant stepsize.

Reinforcement Learning (RL), +1
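
For reference, a sketch of the two-timescale linear TDC update with constant stepsizes (the regime the excerpt analyzes) is below; the stepsizes and random data are illustrative.

```python
import numpy as np

def linear_tdc(features, rewards, rhos, gamma=0.99, alpha=0.01, beta=0.05):
    """Two-timescale linear TDC sketch: fast correction weights w, slow value weights theta,
    both with constant stepsizes."""
    d = features.shape[1]
    theta, w = np.zeros(d), np.zeros(d)
    for t in range(len(rewards)):
        phi, phi_next = features[t], features[t + 1]
        delta = rewards[t] + gamma * theta @ phi_next - theta @ phi      # TD error
        theta += alpha * rhos[t] * (delta * phi - gamma * (w @ phi) * phi_next)
        w += beta * rhos[t] * (delta - w @ phi) * phi                     # fast timescale
    return theta

# toy usage on random transitions (T steps, d features)
rng = np.random.default_rng(2)
T, d = 500, 3
print(linear_tdc(rng.normal(size=(T + 1, d)), rng.normal(size=T), np.ones(T)))
```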

A Primal Approach to Constrained Policy Optimization: Global Optimality and Finite-Time Analysis

no code implementations 28 Sep 2020 Tengyu Xu, Yingbin Liang, Guanghui Lan

To demonstrate the theoretical performance of CRPO, we adopt natural policy gradient (NPG) for each policy update step and show that CRPO achieves an $\mathcal{O}(1/\sqrt{T})$ convergence rate to the global optimal policy in the constrained policy set and an $\mathcal{O}(1/\sqrt{T})$ error bound on constraint satisfaction.

Safe Reinforcement Learning

Enhanced First and Zeroth Order Variance Reduced Algorithms for Min-Max Optimization

no code implementations 28 Sep 2020 Tengyu Xu, Zhe Wang, Yingbin Liang, H. Vincent Poor

Specifically, a novel variance reduction algorithm, SREDA, was recently proposed by Luo et al. (2020) to solve such a problem, and was shown to achieve the optimal complexity dependence on the required accuracy level $\epsilon$.

When Will Generative Adversarial Imitation Learning Algorithms Attain Global Convergence

no code implementations 24 Jun 2020 Ziwei Guan, Tengyu Xu, Yingbin Liang

Generative adversarial imitation learning (GAIL) is a popular inverse reinforcement learning approach for jointly optimizing policy and reward from expert trajectories.

Imitation Learning
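
For context, GAIL is typically formulated as the saddle-point problem below (Ho & Ermon, 2016), where $D$ is a discriminator, $\pi_E$ the expert policy, and $H$ an entropy regularizer; the convergence question concerns gradient-based algorithms for this minimax objective.

$\min_{\pi}\ \max_{D}\ \ \mathbb{E}_{\pi}\left[\log D(s,a)\right] + \mathbb{E}_{\pi_E}\left[\log\left(1 - D(s,a)\right)\right] - \lambda H(\pi)$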

Gradient Free Minimax Optimization: Variance Reduction and Faster Convergence

no code implementations 16 Jun 2020 Tengyu Xu, Zhe Wang, Yingbin Liang, H. Vincent Poor

In this paper, we focus on such a gradient-free setting, and consider the nonconvex-strongly-concave minimax stochastic optimization problem.

Stochastic Optimization
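
In the gradient-free setting, gradients are replaced by finite-difference estimates built from function evaluations. A generic two-point zeroth-order gradient estimator of the kind such methods rely on is sketched below; it is not the paper's exact variance-reduced algorithm.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-3, num_dirs=10, rng=None):
    """Two-point zeroth-order gradient estimator: average of directional finite
    differences along random Gaussian directions."""
    rng = rng or np.random.default_rng()
    g = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.normal(size=x.shape)
        g += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return g / num_dirs

# toy usage: estimate the gradient of f(x) = ||x||^2 at x = (1, 2); true gradient is (2, 4)
print(zo_gradient(lambda z: float(z @ z), np.array([1.0, 2.0]), num_dirs=200,
                  rng=np.random.default_rng(3)))
```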

Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms

no code implementations 7 May 2020 Tengyu Xu, Zhe Wang, Yingbin Liang

In the first, nested-loop design, each actor update of the policy is followed by an entire loop of critic updates of the value function, and the finite-sample analysis of such AC and NAC algorithms has recently been well established.
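
A minimal sketch of the nested-loop design mentioned above is below: each actor update is preceded by a full inner loop of critic updates. The callbacks `env_step`, `td_update`, and `policy_grad` are hypothetical placeholders for sampling, critic, and actor-gradient routines.

```python
import numpy as np

def nested_loop_ac(env_step, policy_grad, td_update, theta, w,
                   actor_iters=100, critic_iters=50, alpha=0.01):
    """Nested-loop actor-critic sketch: each actor update is preceded by a full
    inner loop of critic updates."""
    for _ in range(actor_iters):
        for _ in range(critic_iters):                  # critic loop: fit the value function
            w = td_update(w, env_step(theta))
        theta = theta + alpha * policy_grad(theta, w, env_step(theta))  # one actor step
    return theta, w

# toy usage with dummy callbacks (no real environment)
print(nested_loop_ac(env_step=lambda th: None,
                     policy_grad=lambda th, w, tr: -th,    # pushes theta toward 0
                     td_update=lambda w, tr: 0.9 * w,
                     theta=np.ones(2), w=np.ones(2)))
```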

Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms

no code implementations NeurIPS 2020 Tengyu Xu, Zhe Wang, Yingbin Liang

We show that the overall sample complexity for a mini-batch AC to attain an $\epsilon$-accurate stationary point improves the best known sample complexity of AC by an order of $\mathcal{O}(\epsilon^{-1}\log(1/\epsilon))$, and the overall sample complexity for a mini-batch NAC to attain an $\epsilon$-accurate globally optimal point improves the existing sample complexity of NAC by an order of $\mathcal{O}(\epsilon^{-1}/\log(1/\epsilon))$.

Non-asymptotic Convergence of Adam-type Reinforcement Learning Algorithms under Markovian Sampling

no code implementations 15 Feb 2020 Huaqing Xiong, Tengyu Xu, Yingbin Liang, Wei Zhang

Despite the wide applications of Adam in reinforcement learning (RL), the theoretical convergence of Adam-type RL algorithms has not been established.

Reinforcement Learning (RL)
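
The sketch below applies the standard Adam update to a stochastic policy-gradient estimate (an ascent direction); it is a generic illustration of the Adam-type updates the paper analyzes, with illustrative hyperparameters.

```python
import numpy as np

def adam_pg_step(theta, grad_est, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Standard Adam update applied to a stochastic policy-gradient estimate."""
    m = beta1 * m + (1 - beta1) * grad_est
    v = beta2 * v + (1 - beta2) * grad_est ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    theta = theta + alpha * m_hat / (np.sqrt(v_hat) + eps)   # ascent on the objective
    return theta, m, v

# toy usage: noisy gradients of a concave objective -(theta - 1)^2
rng = np.random.default_rng(5)
theta, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
for t in range(1, 2001):
    g = -2.0 * (theta - 1.0) + 0.1 * rng.normal(size=3)
    theta, m, v = adam_pg_step(theta, g, m, v, t, alpha=1e-2)
print(theta)                                                 # approaches [1, 1, 1]
```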

Reanalysis of Variance Reduced Temporal Difference Learning

no code implementations ICLR 2020 Tengyu Xu, Zhe Wang, Yi Zhou, Yingbin Liang

Furthermore, the variance error (for both i.i.d. and Markovian sampling) and the bias error (for Markovian sampling) of VRTD are reduced by a factor of the variance-reduction batch size in comparison to those of vanilla TD.
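
The batch-size effect comes from an SVRG-style correction: each epoch fixes a reference point, averages TD update directions over a batch, and uses that average to reduce the variance of per-sample updates. A rough sketch is below, with illustrative constants rather than the paper's exact algorithm.

```python
import numpy as np

def td_direction(theta, phi, phi_next, r, gamma):
    return (r + gamma * theta @ phi_next - theta @ phi) * phi

def vrtd(features, rewards, gamma=0.99, alpha=0.05, batch=100, epochs=20, rng=None):
    """SVRG-style variance-reduced TD sketch: per-sample updates are corrected by the
    batch-averaged TD direction at a per-epoch reference point."""
    rng = rng or np.random.default_rng()
    T, d = len(rewards), features.shape[1]
    theta = np.zeros(d)
    for _ in range(epochs):
        idx = rng.integers(0, T, size=batch)
        theta_ref = theta.copy()
        g_ref = np.mean([td_direction(theta_ref, features[i], features[i + 1],
                                      rewards[i], gamma) for i in idx], axis=0)
        for i in idx:
            g_i = td_direction(theta, features[i], features[i + 1], rewards[i], gamma)
            g_i_ref = td_direction(theta_ref, features[i], features[i + 1], rewards[i], gamma)
            theta += alpha * (g_i - g_i_ref + g_ref)          # variance-reduced update
    return theta

# toy usage on random transitions
rng = np.random.default_rng(6)
T, d = 1000, 3
print(vrtd(rng.normal(size=(T + 1, d)), rng.normal(size=T), rng=rng))
```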

Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples

no code implementations NeurIPS 2019 Tengyu Xu, Shaofeng Zou, Yingbin Liang

Gradient-based temporal difference (GTD) algorithms are widely used in off-policy learning scenarios.

When Will Gradient Methods Converge to Max-margin Classifier under ReLU Models?

1 code implementation ICLR 2019 Tengyu Xu, Yi Zhou, Kaiyi Ji, Yingbin Liang

We study the implicit bias of gradient descent methods in solving a binary classification problem over a linearly separable dataset.

Binary Classification
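
The phenomenon studied here is implicit bias: on separable data, gradient descent on an exponentially tailed loss drives the normalized weights toward a maximum-margin direction. A minimal linear-logistic illustration is below; the paper itself analyzes ReLU models, so this sketch only conveys the basic effect.

```python
import numpy as np

# Gradient descent on logistic loss over linearly separable data: the normalized
# weight direction stabilizes and the achieved minimum margin grows.
rng = np.random.default_rng(7)
w_star = np.array([1.0, 1.0]) / np.sqrt(2.0)
X = rng.normal(size=(200, 2))
y = np.sign(X @ w_star)
X += 0.5 * y[:, None] * w_star                          # enforce a margin of at least 0.5
w = np.zeros(2)
for _ in range(20000):
    m = np.clip(y * (X @ w), -30.0, 30.0)
    grad = -(y[:, None] * X * (1.0 / (1.0 + np.exp(m)))[:, None]).mean(axis=0)
    w -= 0.1 * grad                                      # gradient descent on logistic loss
print(w / np.linalg.norm(w))                             # direction approaches max-margin
print((y * (X @ w)).min() / np.linalg.norm(w))           # achieved minimum margin
```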
