no code implementations • 9 Apr 2024 • Xuheng Li, Heyang Zhao, Quanquan Gu
In this paper, we propose a Thompson sampling algorithm, named FGTS.CDB, for linear contextual dueling bandits.
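The internals of FGTS.CDB are not given in this snippet. As a rough illustration only, a generic Thompson-sampling loop for linear dueling bandits might keep a regularized least-squares estimate over feature differences, draw two independent posterior samples so that each nominates one arm of the duel, and update on the observed preference. All constants, the logistic preference model, and the least-squares update below are assumptions for the sketch, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 10, 2000                      # dimension, arms, rounds (illustrative)
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)   # unknown preference parameter
arms = rng.normal(size=(K, d))             # fixed feature vectors

lam = 1.0
V = lam * np.eye(d)                        # regularized design matrix of feature diffs
b = np.zeros(d)

for t in range(T):
    theta_hat = np.linalg.solve(V, b)
    Sigma = np.linalg.inv(V)
    # Two independent posterior samples; each nominates one arm of the duel.
    s1 = rng.multivariate_normal(theta_hat, Sigma)
    s2 = rng.multivariate_normal(theta_hat, Sigma)
    i, j = int(np.argmax(arms @ s1)), int(np.argmax(arms @ s2))
    # Preference feedback: P(i beats j) = sigmoid(<theta*, x_i - x_j>).
    z = arms[i] - arms[j]
    y = rng.random() < 1.0 / (1.0 + np.exp(-z @ theta_star))
    # Online least-squares update on the feature difference.
    V += np.outer(z, z)
    b += z * (1.0 if y else 0.0)
```

Drawing two independent samples is one common way to extend Thompson sampling to pairwise feedback: each sample induces its own greedy arm, and the resulting pair is dueled.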
no code implementations • 26 Nov 2023 • Heyang Zhao, Jiafan He, Quanquan Gu
The exploration-exploitation dilemma has been a central challenge in reinforcement learning (RL) with complex model classes.
no code implementations • 2 Oct 2023 • Qiwei Di, Heyang Zhao, Jiafan He, Quanquan Gu
However, few existing works on offline RL with non-linear function approximation provide instance-dependent regret guarantees.
no code implementations • 2 Oct 2023 • Qiwei Di, Tao Jin, Yue Wu, Heyang Zhao, Farzad Farnoud, Quanquan Gu
Dueling bandits is a prominent framework for decision-making from preferential feedback, which naturally suits applications involving human interaction, such as ranking, information retrieval, and recommendation systems.
no code implementations • 21 Feb 2023 • Heyang Zhao, Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu
We propose a variance-adaptive algorithm for linear mixture MDPs, which achieves a problem-dependent horizon-free regret bound that can gracefully reduce to a nearly constant regret for deterministic MDPs.
no code implementations • 12 Dec 2022 • Jiafan He, Heyang Zhao, Dongruo Zhou, Quanquan Gu
We study reinforcement learning (RL) with linear function approximation.
no code implementations • 28 Feb 2022 • Heyang Zhao, Dongruo Zhou, Jiafan He, Quanquan Gu
We study the problem of online generalized linear regression in the stochastic setting, where the label is generated from a generalized linear model with possibly unbounded additive noise.
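The algorithm in that paper is not shown here. As a minimal sketch of the problem setting, one could fit an online generalized linear model with stochastic gradient descent on the squared loss; the tanh link, step size, noise scale, and round count below are all placeholder assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
d, T = 4, 5000
theta_star = rng.normal(size=d)   # unknown parameter
mu = np.tanh                      # placeholder link function (assumption)

theta = np.zeros(d)
eta = 0.05                        # step size (assumption)
for t in range(T):
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)
    # Label from a GLM with additive Gaussian noise (which is unbounded).
    y = mu(x @ theta_star) + rng.normal(scale=0.1)
    pred = mu(x @ theta)
    # SGD on the squared loss; for tanh, mu'(z) = 1 - tanh(z)^2.
    grad = (pred - y) * (1.0 - pred**2) * x
    theta -= eta * grad
```

The unbounded-noise aspect highlighted in the abstract is what makes naive bounded-noise analyses inapplicable; this sketch only illustrates the data-generating model and a baseline estimator.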
no code implementations • NeurIPS 2021 • Heyang Zhao, Dongruo Zhou, Quanquan Gu
We study the linear contextual bandit problem in the presence of adversarial corruption, where the interaction between the player and a possibly infinite decision set is contaminated by an adversary that can corrupt the rewards up to a corruption level $C$, defined as the sum over rounds of the largest reward alteration in each round.
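The corruption model can be made concrete with a toy simulation of the budget accounting: each round the adversary may alter rewards, and the per-round maximum alterations must sum to at most $C$. Every constant and the adversary's strategy below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(2)
d, K, T, C = 3, 5, 100, 10.0          # dimension, arms, rounds, corruption level
theta_star = rng.normal(size=d)
arms = rng.normal(size=(K, d))

budget_used = 0.0
for t in range(T):
    rewards = arms @ theta_star + rng.normal(scale=0.1, size=K)
    # The adversary spends from a total budget C, which bounds the sum
    # over rounds of the largest per-round alteration.
    if budget_used < C:
        delta = min(1.0, C - budget_used)
        rewards[np.argmax(rewards)] -= delta   # suppress the best arm's reward
        budget_used += delta
```

Here the adversary exhausts its budget early by pushing down the best arm; a robust algorithm's regret should degrade gracefully as a function of $C$.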