no code implementations • 9 Apr 2024 • Xuheng Li, Heyang Zhao, Quanquan Gu
In this paper, we propose a Thompson sampling algorithm, named FGTS.CDB, for linear contextual dueling bandits.
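The internals of FGTS.CDB are not given in this snippet. As a rough illustration only, a generic Thompson-sampling loop for linear dueling bandits might keep a regularized least-squares estimate over feature differences, draw two independent posterior samples so that each nominates one arm of the duel, and update on the observed preference. All constants, the logistic preference model, and the least-squares update below are assumptions for the sketch, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 10, 2000                      # dimension, arms, rounds (illustrative)
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)   # unknown preference parameter
arms = rng.normal(size=(K, d))             # fixed feature vectors

lam = 1.0
V = lam * np.eye(d)                        # regularized design matrix of feature diffs
b = np.zeros(d)

for t in range(T):
    theta_hat = np.linalg.solve(V, b)
    Sigma = np.linalg.inv(V)
    # Two independent posterior samples; each nominates one arm of the duel.
    s1 = rng.multivariate_normal(theta_hat, Sigma)
    s2 = rng.multivariate_normal(theta_hat, Sigma)
    i, j = int(np.argmax(arms @ s1)), int(np.argmax(arms @ s2))
    # Preference feedback: P(i beats j) = sigmoid(<theta*, x_i - x_j>).
    z = arms[i] - arms[j]
    y = rng.random() < 1.0 / (1.0 + np.exp(-z @ theta_star))
    # Online least-squares update on the feature difference.
    V += np.outer(z, z)
    b += z * (1.0 if y else 0.0)
```

Drawing two independent samples is one common way to extend Thompson sampling to pairwise feedback: each sample induces its own greedy arm, and the resulting pair is dueled.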
no code implementations • 26 Nov 2023 • Heyang Zhao, Jiafan He, Quanquan Gu
The exploration-exploitation dilemma has been a central challenge in reinforcement learning (RL) with complex model classes.
no code implementations • 2 Oct 2023 • Qiwei Di, Heyang Zhao, Jiafan He, Quanquan Gu
However, few existing works on offline RL with non-linear function approximation provide instance-dependent regret guarantees.
no code implementations • 2 Oct 2023 • Qiwei Di, Tao Jin, Yue Wu, Heyang Zhao, Farzad Farnoud, Quanquan Gu
Dueling bandits is a prominent framework for decision-making from preferential feedback, which naturally suits applications involving human interaction, such as ranking, information retrieval, and recommendation systems.
no code implementations • 21 Feb 2023 • Heyang Zhao, Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu
We propose a variance-adaptive algorithm for linear mixture MDPs, which achieves a problem-dependent horizon-free regret bound that can gracefully reduce to a nearly constant regret for deterministic MDPs.
no code implementations • 12 Dec 2022 • Jiafan He, Heyang Zhao, Dongruo Zhou, Quanquan Gu
We study reinforcement learning (RL) with linear function approximation.
no code implementations • 28 Feb 2022 • Heyang Zhao, Dongruo Zhou, Jiafan He, Quanquan Gu
We study the problem of online generalized linear regression in the stochastic setting, where the label is generated from a generalized linear model with possibly unbounded additive noise.
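The algorithm in that paper is not shown here. As a minimal sketch of the problem setting, one could fit an online generalized linear model with stochastic gradient descent on the squared loss; the tanh link, step size, noise scale, and round count below are all placeholder assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
d, T = 4, 5000
theta_star = rng.normal(size=d)   # unknown parameter
mu = np.tanh                      # placeholder link function (assumption)

theta = np.zeros(d)
eta = 0.05                        # step size (assumption)
for t in range(T):
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)
    # Label from a GLM with additive Gaussian noise (which is unbounded).
    y = mu(x @ theta_star) + rng.normal(scale=0.1)
    pred = mu(x @ theta)
    # SGD on the squared loss; for tanh, mu'(z) = 1 - tanh(z)^2.
    grad = (pred - y) * (1.0 - pred**2) * x
    theta -= eta * grad
```

The unbounded-noise aspect highlighted in the abstract is what makes naive bounded-noise analyses inapplicable; this sketch only illustrates the data-generating model and a baseline estimator.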
no code implementations • NeurIPS 2021 • Heyang Zhao, Dongruo Zhou, Quanquan Gu
We study the linear contextual bandit problem in the presence of adversarial corruption, where the interaction between the player and a possibly infinite decision set is contaminated by an adversary that can corrupt the rewards up to a corruption level $C$, defined as the sum over rounds of the largest reward alteration in each round.
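The corruption model can be made concrete with a toy simulation of the budget accounting: each round the adversary may alter rewards, and the per-round maximum alterations must sum to at most $C$. Every constant and the adversary's strategy below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(2)
d, K, T, C = 3, 5, 100, 10.0          # dimension, arms, rounds, corruption level
theta_star = rng.normal(size=d)
arms = rng.normal(size=(K, d))

budget_used = 0.0
for t in range(T):
    rewards = arms @ theta_star + rng.normal(scale=0.1, size=K)
    # The adversary spends from a total budget C, which bounds the sum
    # over rounds of the largest per-round alteration.
    if budget_used < C:
        delta = min(1.0, C - budget_used)
        rewards[np.argmax(rewards)] -= delta   # suppress the best arm's reward
        budget_used += delta
```

Here the adversary exhausts its budget early by pushing down the best arm; a robust algorithm's regret should degrade gracefully as a function of $C$.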