Search Results for author: Ziniu Li

Found 13 papers, 6 papers with code

Why Transformers Need Adam: A Hessian Perspective

1 code implementation · 26 Feb 2024 · Yushun Zhang, Congliang Chen, Tian Ding, Ziniu Li, Ruoyu Sun, Zhi-Quan Luo

SGD performs worse than Adam by a significant margin on Transformers, but the reason remains unclear.

Policy Optimization in RLHF: The Impact of Out-of-preference Data

1 code implementation · 17 Dec 2023 · Ziniu Li, Tian Xu, Yang Yu

These methods learn a reward model from preference data, either explicitly or implicitly, and differ in the data used for policy optimization to unlock the reward model's generalization ability.

Provably Efficient Adversarial Imitation Learning with Unknown Transitions

1 code implementation · 11 Jun 2023 · Tian Xu, Ziniu Li, Yang Yu, Zhi-Quan Luo

Adversarial imitation learning (AIL), a subset of IL methods, is particularly promising, but its theoretical foundation in the presence of unknown transitions has yet to be fully developed.

Imitation Learning

Deploying Offline Reinforcement Learning with Human Feedback

no code implementations · 13 Mar 2023 · Ziniu Li, Ke Xu, Liu Liu, Lanqing Li, Deheng Ye, Peilin Zhao

To address this issue, we propose an alternative framework that involves a human supervising the RL models and providing additional feedback in the online deployment phase.

Decision Making · Model Selection +3

Theoretical Analysis of Offline Imitation With Supplementary Dataset

1 code implementation · 27 Jan 2023 · Ziniu Li, Tian Xu, Yang Yu, Zhi-Quan Luo

This paper considers a situation where, besides the small amount of expert data, a supplementary dataset is available, which can be collected cheaply from sub-optimal policies.

Imitation Learning

A Note on Target Q-learning For Solving Finite MDPs with A Generative Oracle

no code implementations · 22 Mar 2022 · Ziniu Li, Tian Xu, Yang Yu

In particular, we demonstrate that the sample complexity of the target Q-learning algorithm in [Lee and He, 2020] is $\widetilde{\mathcal O}(|\mathcal S|^2|\mathcal A|^2 (1-\gamma)^{-5}\varepsilon^{-2})$.

Q-Learning
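To make the stated bound concrete, the sketch below evaluates its scaling numerically, ignoring constants and logarithmic factors. The function name and the specific values of $|\mathcal S|$, $|\mathcal A|$, $\gamma$, and $\varepsilon$ are illustrative assumptions, not taken from the paper.

```python
def target_q_sample_bound(num_states, num_actions, gamma, eps):
    """Order-of-magnitude estimate of the target Q-learning bound
    |S|^2 |A|^2 (1 - gamma)^{-5} eps^{-2}, up to constants/log factors.
    All argument values used below are hypothetical."""
    return (num_states ** 2) * (num_actions ** 2) / ((1 - gamma) ** 5 * eps ** 2)

# Halving (1 - gamma), i.e. doubling the effective horizon,
# inflates the bound by a factor of 2^5 = 32.
b1 = target_q_sample_bound(10, 4, gamma=0.9, eps=0.1)
b2 = target_q_sample_bound(10, 4, gamma=0.95, eps=0.1)
print(b2 / b1)  # ≈ 32
```

The fifth-power dependence on the effective horizon $(1-\gamma)^{-1}$ dominates: modest increases in $\gamma$ change the bound far more than growth in the state or action space.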

Rethinking ValueDice: Does It Really Improve Performance?

no code implementations · 5 Feb 2022 · Ziniu Li, Tian Xu, Yang Yu, Zhi-Quan Luo

First, we show that ValueDice can reduce to behavioral cloning (BC) in the offline setting.

Imitation Learning

HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning

1 code implementation · ICLR 2022 · Ziniu Li, Yingru Li, Yushun Zhang, Tong Zhang, Zhi-Quan Luo

However, it is limited to the case where (1) a good feature is known in advance and (2) this feature is fixed during training; otherwise, RLSVI incurs a prohibitive computational burden to obtain posterior samples of the parameters of the $Q$-value function.

Efficient Exploration · Reinforcement Learning +1

On Generalization of Adversarial Imitation Learning and Beyond

no code implementations · 19 Jun 2021 · Tian Xu, Ziniu Li, Yang Yu, Zhi-Quan Luo

For some MDPs, we show that vanilla AIL has a worse sample complexity than BC.

Imitation Learning

Error Bounds of Imitating Policies and Environments

no code implementations · NeurIPS 2020 · Tian Xu, Ziniu Li, Yang Yu

In this paper, we first analyze the value gap between the expert policy and imitated policies under two imitation methods: behavioral cloning and generative adversarial imitation.

Imitation Learning · Model-based Reinforcement Learning +2

On Value Discrepancy of Imitation Learning

no code implementations · 16 Nov 2019 · Tian Xu, Ziniu Li, Yang Yu

We also show that the framework leads to a value discrepancy for GAIL on the order of $\mathcal{O}((1-\gamma)^{-1})$.

Imitation Learning
