Search Results for author: Chenlu Ye

Found 7 papers, 3 papers with code

Online Iterative Reinforcement Learning from Human Feedback with General Preference Model

1 code implementation • 11 Feb 2024 • Chenlu Ye, Wei Xiong, Yuheng Zhang, Nan Jiang, Tong Zhang

We study Reinforcement Learning from Human Feedback (RLHF) under a general preference oracle.

Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint

3 code implementations • 18 Dec 2023 • Wei Xiong, Hanze Dong, Chenlu Ye, Ziqi Wang, Han Zhong, Heng Ji, Nan Jiang, Tong Zhang

We investigate its behavior in three distinct settings -- offline, online, and hybrid -- and propose efficient algorithms with finite-sample theoretical guarantees.

Language Modelling, Large Language Model
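For background, the KL-constrained RLHF objective referenced in the title is standardly written as follows (this is the common textbook formulation, stated here as context rather than quoted from the paper):

$$
\pi^* = \arg\max_{\pi} \; \mathbb{E}_{x \sim d_0,\, a \sim \pi(\cdot \mid x)}\big[r(x, a)\big] \;-\; \eta^{-1}\, \mathbb{E}_{x \sim d_0}\big[\mathrm{KL}\big(\pi(\cdot \mid x) \,\|\, \pi_0(\cdot \mid x)\big)\big],
$$

where $\pi_0$ is the reference (SFT) policy and $\eta > 0$ controls the strength of the KL penalty that keeps the learned policy close to $\pi_0$.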

Provably Efficient High-Dimensional Bandit Learning with Batched Feedbacks

no code implementations • 22 Nov 2023 • Jianqing Fan, Zhaoran Wang, Zhuoran Yang, Chenlu Ye

For these settings, we design a provably sample-efficient algorithm that achieves $\tilde{\mathcal{O}}(s_0^2 \log^2 T)$ regret in the sparse case and $\tilde{\mathcal{O}}(r^2 \log^2 T)$ regret in the low-rank case, using only $L = \mathcal{O}(\log T)$ batches.

Multi-Armed Bandits

Corruption-Robust Offline Reinforcement Learning with General Function Approximation

1 code implementation • NeurIPS 2023 • Chenlu Ye, Rui Yang, Quanquan Gu, Tong Zhang

Notably, under the assumption of single policy coverage and the knowledge of $\zeta$, our proposed algorithm achieves a suboptimality bound that is worsened by an additive factor of $\mathcal{O}(\zeta (C(\widehat{\mathcal{F}},\mu)n)^{-1})$ due to the corruption.

Offline RL, Reinforcement Learning, +2

Optimal Sample Selection Through Uncertainty Estimation and Its Application in Deep Learning

no code implementations • 5 Sep 2023 • Yong Lin, Chen Liu, Chenlu Ye, Qing Lian, Yuan Yao, Tong Zhang

Our proposed method, COPS (unCertainty based OPtimal Sub-sampling), is designed to minimize the expected loss of a model trained on subsampled data.

Active Learning
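The COPS idea of uncertainty-driven sub-sampling can be illustrated with a short sketch. This is not the paper's algorithm: it assumes, purely for illustration, that uncertainty is scored by the across-ensemble variance of per-sample losses, and that inverse-probability weights keep the subsampled loss an unbiased estimate of the full-data loss. The function name and parameters are hypothetical.

```python
import numpy as np

def uncertainty_subsample(losses_per_model, budget, rng=None):
    """Sketch: sample points with probability proportional to uncertainty.

    losses_per_model: (n_models, n_samples) array of per-sample losses from
    an ensemble; across-model variance is used as a crude uncertainty score.
    Returns selected indices and inverse-probability weights.
    """
    rng = rng or np.random.default_rng(0)
    uncertainty = losses_per_model.var(axis=0) + 1e-12   # avoid zero probs
    probs = uncertainty / uncertainty.sum()
    idx = rng.choice(len(probs), size=budget, replace=False, p=probs)
    # Reweight so the weighted subsample loss is unbiased for the full loss.
    weights = 1.0 / (probs[idx] * len(probs))
    return idx, weights
```

Sampling proportionally to uncertainty concentrates the labeling budget on informative points, while the inverse-probability weights correct the bias that this preferential selection would otherwise introduce into training.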

Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes

no code implementations • 12 Dec 2022 • Chenlu Ye, Wei Xiong, Quanquan Gu, Tong Zhang

In this paper, we consider the contextual bandit with general function approximation and propose a computationally efficient algorithm that achieves a regret of $\tilde{O}(\sqrt{T}+\zeta)$, where $\zeta$ denotes the cumulative corruption level.

Multi-Armed Bandits, Reinforcement Learning (RL)
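The uncertainty-weighting principle can be sketched in the simplest linear-bandit setting: down-weight each observation by its elliptical (bonus-style) uncertainty so that a handful of corrupted rewards cannot dominate the least-squares estimate. This is a hedged illustration under linear rewards, not the paper's algorithm (which handles general function approximation); the weighting rule and parameter names (`alpha`, `lam`) are assumptions for this example.

```python
import numpy as np

def uncertainty_weighted_ridge(X, y, alpha=1.0, lam=1.0):
    """Sketch: sequential ridge regression with uncertainty weighting.

    Each sample is down-weighted by its uncertainty x^T Sigma^{-1} x,
    capping the influence any single (possibly corrupted) reward can have.
    """
    d = X.shape[1]
    Sigma = lam * np.eye(d)          # regularized design matrix
    b = np.zeros(d)
    for x, r in zip(X, y):           # sequential updates, as in online regression
        u = float(x @ np.linalg.solve(Sigma, x))   # elliptical uncertainty
        w = min(1.0, alpha / np.sqrt(u))           # shrink weight where uncertain
        Sigma += w * np.outer(x, x)
        b += w * r * x
    return np.linalg.solve(Sigma, b)
```

Intuitively, an adversary gains the most by corrupting rewards in poorly explored directions, where a naive estimator is most sensitive; the weight $\min(1, \alpha/\sqrt{u})$ caps exactly that sensitivity, which is how the corruption term enters the bound only additively.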
