Search Results for author: Heyang Zhao

Found 8 papers, 0 papers with code

Feel-Good Thompson Sampling for Contextual Dueling Bandits

no code implementations • 9 Apr 2024 • Xuheng Li, Heyang Zhao, Quanquan Gu

In this paper, we propose a Thompson sampling algorithm, named FGTS.CDB, for linear contextual dueling bandits.

Decision Making Multi-Armed Bandits +1

A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation

no code implementations • 26 Nov 2023 • Heyang Zhao, Jiafan He, Quanquan Gu

The exploration-exploitation dilemma has been a central challenge in reinforcement learning (RL) with complex model classes.

Q-Learning Reinforcement Learning (RL)

Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning

no code implementations • 2 Oct 2023 • Qiwei Di, Heyang Zhao, Jiafan He, Quanquan Gu

However, few works on offline RL with non-linear function approximation have instance-dependent regret guarantees.

Offline RL reinforcement-learning +1

Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits

no code implementations • 2 Oct 2023 • Qiwei Di, Tao Jin, Yue Wu, Heyang Zhao, Farzad Farnoud, Quanquan Gu

Dueling bandits is a prominent framework for decision-making involving preferential feedback, a valuable feature that fits various applications involving human interaction, such as ranking, information retrieval, and recommendation systems.

Computational Efficiency Decision Making +2

Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency

no code implementations • 21 Feb 2023 • Heyang Zhao, Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu

We propose a variance-adaptive algorithm for linear mixture MDPs, which achieves a problem-dependent horizon-free regret bound that can gracefully reduce to a nearly constant regret for deterministic MDPs.

Computational Efficiency Decision Making +1

Optimal Online Generalized Linear Regression with Stochastic Noise and Its Application to Heteroscedastic Bandits

no code implementations • 28 Feb 2022 • Heyang Zhao, Dongruo Zhou, Jiafan He, Quanquan Gu

We study the problem of online generalized linear regression in the stochastic setting, where the label is generated from a generalized linear model with possibly unbounded additive noise.

regression

Linear Contextual Bandits with Adversarial Corruptions

no code implementations • NeurIPS 2021 • Heyang Zhao, Dongruo Zhou, Quanquan Gu

We study the linear contextual bandit problem in the presence of adversarial corruption, where the interaction between the player and a possibly infinite decision set is contaminated by an adversary that can corrupt the reward up to a corruption level $C$ measured by the sum of the largest alteration on rewards in each round.

Multi-Armed Bandits
