no code implementations • 14 Feb 2024 • Chenlu Ye, Jiafan He, Quanquan Gu, Tong Zhang
We also prove a lower bound to show that the additive dependence on $C$ is optimal.
Model-based Reinforcement Learning, Reinforcement Learning (+1)
no code implementations • 11 Feb 2024 • Chenlu Ye, Wei Xiong, Yuheng Zhang, Nan Jiang, Tong Zhang
We study Reinforcement Learning from Human Feedback (RLHF) under a general preference oracle.
no code implementations • 18 Dec 2023 • Wei Xiong, Hanze Dong, Chenlu Ye, Ziqi Wang, Han Zhong, Heng Ji, Nan Jiang, Tong Zhang
This includes an iterative version of the Direct Preference Optimization (DPO) algorithm for online settings, and a multi-step rejection sampling strategy for offline scenarios.
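To make the online objective concrete, here is a minimal sketch of the per-batch DPO loss that the iterative variant re-minimizes on freshly labeled preference pairs (PyTorch, with illustrative function and argument names; not the authors' code):

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss on a batch of preference pairs.

    Each argument is a tensor of summed token log-probabilities of the
    chosen/rejected responses under the trainable policy or the frozen
    reference model; beta sets the strength of the implicit KL penalty.
    """
    # Implicit rewards: scaled log-ratios of policy to reference model.
    chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    # Bradley-Terry negative log-likelihood of the observed preference.
    return -F.logsigmoid(chosen - rejected).mean()
```

In the iterative version, each round samples responses from the current policy, queries the preference signal, and minimizes this objective on the new pairs, rather than training once on a fixed offline dataset.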
no code implementations • 22 Nov 2023 • Jianqing Fan, Zhaoran Wang, Zhuoran Yang, Chenlu Ye
For these settings, we design a provably sample-efficient algorithm which achieves a $\tilde{\mathcal{O}}(s_0^2 \log^2 T)$ regret in the sparse case and a $\tilde{\mathcal{O}}(r^2 \log^2 T)$ regret in the low-rank case, using only $L = \mathcal{O}(\log T)$ batches.
1 code implementation • NeurIPS 2023 • Chenlu Ye, Rui Yang, Quanquan Gu, Tong Zhang
Notably, under the assumption of single policy coverage and the knowledge of $\zeta$, our proposed algorithm achieves a suboptimality bound that is worsened by an additive factor of $\mathcal{O}(\zeta (C(\widehat{\mathcal{F}},\mu)n)^{-1})$ due to the corruption.
no code implementations • 5 Sep 2023 • Yong Lin, Chen Liu, Chenlu Ye, Qing Lian, Yuan Yao, Tong Zhang
Our proposed method, COPS (unCertainty based OPtimal Sub-sampling), is designed to minimize the expected loss of a model trained on subsampled data.
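A minimal sketch of that idea, assuming per-example uncertainty scores have already been computed (NumPy; the function name and the inverse-probability reweighting rule are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def uncertainty_subsample(uncertainty, budget, seed=None):
    """Draw `budget` indices with probability proportional to uncertainty.

    Returns indices plus importance weights so that the weighted sum of
    per-example losses is an unbiased estimate of the full-data loss.
    """
    rng = np.random.default_rng(seed)
    probs = uncertainty / uncertainty.sum()
    idx = rng.choice(len(uncertainty), size=budget, replace=True, p=probs)
    # Inverse-probability (Horvitz-Thompson-style) correction for the
    # non-uniform sampling.
    weights = 1.0 / (budget * probs[idx])
    return idx, weights
```

The weights matter: training on the raw subsample would bias the model toward high-uncertainty regions, while the inverse-probability correction keeps the subsampled loss centered on the full-data loss.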
no code implementations • 12 Dec 2022 • Chenlu Ye, Wei Xiong, Quanquan Gu, Tong Zhang
In this paper, we consider the contextual bandit with general function approximation and propose a computationally efficient algorithm that achieves a regret of $\tilde{\mathcal{O}}(\sqrt{T}+\zeta)$, where $\zeta$ is the corruption level.
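The corruption robustness comes from down-weighting samples on which the learner is still uncertain, since those are where an adversary can do the most damage. A minimal linear-regression sketch of such uncertainty weighting (my own simplification with invented names; the paper handles general function approximation and uses its own weight rule):

```python
import numpy as np

def weighted_ridge(X, y, bonuses, alpha=1.0, lam=1.0):
    """Uncertainty-weighted ridge regression on past (context, reward) pairs.

    `bonuses[t]` is an uncertainty estimate for sample t; samples with
    large uncertainty receive weight below 1, so a corrupted reward there
    has only bounded influence on the fitted parameter.
    """
    w = np.minimum(1.0, alpha / bonuses)        # down-weight uncertain samples
    Xw = X * w[:, None]                         # row-wise reweighting
    A = Xw.T @ X + lam * np.eye(X.shape[1])     # weighted Gram matrix
    return np.linalg.solve(A, Xw.T @ y)         # weighted least-squares fit
```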