Search Results for author: Jiafan He

Found 24 papers, 0 papers with code

Settling Constant Regrets in Linear Markov Decision Processes

no code implementations 16 Apr 2024 Weitong Zhang, Zhiyuan Fan, Jiafan He, Quanquan Gu

To the best of our knowledge, Cert-LSVI-UCB is the first algorithm to achieve a constant, instance-dependent, high-probability regret bound in RL with linear function approximation that holds over an infinite number of episodes, without relying on prior distribution assumptions.

Reinforcement Learning (RL)

Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback

no code implementations 16 Apr 2024 Qiwei Di, Jiafan He, Quanquan Gu

Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs).

Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path

no code implementations 14 Feb 2024 Qiwei Di, Jiafan He, Dongruo Zhou, Quanquan Gu

Our algorithm achieves an $\tilde{\mathcal O}(dB_*\sqrt{K})$ regret bound, where $d$ is the dimension of the feature mapping in the linear transition kernel, $B_*$ is the upper bound of the total cumulative cost for the optimal policy, and $K$ is the number of episodes.
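
For a rough sense of scale (illustrative numbers, not taken from the paper), the leading term grows only with the square root of the number of episodes:

$dB_*\sqrt{K} = 20 \times 5 \times \sqrt{10^4} = 10^4 \quad \text{for } d = 20,\ B_* = 5,\ K = 10^4,$

up to constants and logarithmic factors.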

Reinforcement Learning from Human Feedback with Active Queries

no code implementations 14 Feb 2024 Kaixuan Ji, Jiafan He, Quanquan Gu

Aligning large language models (LLMs) with human preferences plays a key role in building modern generative models and can be achieved by reinforcement learning from human feedback (RLHF).

Active Learning, reinforcement-learning

A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation

no code implementations 26 Nov 2023 Heyang Zhao, Jiafan He, Quanquan Gu

The exploration-exploitation dilemma has been a central challenge in reinforcement learning (RL) with complex model classes.

Q-Learning, Reinforcement Learning (RL)

Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning

no code implementations 2 Oct 2023 Qiwei Di, Heyang Zhao, Jiafan He, Quanquan Gu

However, only a limited number of works on offline RL with non-linear function approximation provide instance-dependent regret guarantees.

Offline RL, reinforcement-learning +1

Horizon-free Reinforcement Learning in Adversarial Linear Mixture MDPs

no code implementations 15 May 2023 Kaixuan Ji, Qingyue Zhao, Jiafan He, Weitong Zhang, Quanquan Gu

Recent studies have shown that episodic reinforcement learning (RL) is no harder than bandits when the total reward is bounded by $1$, and proved regret bounds that have a polylogarithmic dependence on the planning horizon $H$.

Open-Ended Question Answering, reinforcement-learning +1

Uniform-PAC Guarantees for Model-Based RL with Bounded Eluder Dimension

no code implementations 15 May 2023 Yue Wu, Jiafan He, Quanquan Gu

Recently, there has been remarkable progress in reinforcement learning (RL) with general function approximation.

Open-Ended Question Answering, Reinforcement Learning (RL)

Cooperative Multi-Agent Reinforcement Learning: Asynchronous Communication and Linear Function Approximation

no code implementations 10 May 2023 Yifei Min, Jiafan He, Tianhao Wang, Quanquan Gu

We study multi-agent reinforcement learning in the setting of episodic Markov decision processes, where multiple agents cooperate via communication through a central server.

Multi-agent Reinforcement Learning, reinforcement-learning

On the Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits

no code implementations 16 Mar 2023 Weitong Zhang, Jiafan He, Zhiyuan Fan, Quanquan Gu

We show that, when the misspecification level $\zeta$ is dominated by $\tilde O (\Delta / \sqrt{d})$ with $\Delta$ being the minimal sub-optimality gap and $d$ being the dimension of the contextual vectors, our algorithm enjoys the same gap-dependent regret bound $\tilde O (d^2/\Delta)$ as in the well-specified setting up to logarithmic factors.
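
Unpacking that sentence as a single implication (a restatement of the quoted claim, not an additional result):

$\zeta \;\le\; \tilde O\!\left(\Delta/\sqrt{d}\right) \;\;\Longrightarrow\;\; \mathrm{Regret} \;\le\; \tilde O\!\left(d^2/\Delta\right),$

i.e. the same gap-dependent rate as in the well-specified setting, up to logarithmic factors.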

Multi-Armed Bandits

Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency

no code implementations 21 Feb 2023 Heyang Zhao, Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu

We propose a variance-adaptive algorithm for linear mixture MDPs, which achieves a problem-dependent horizon-free regret bound that can gracefully reduce to a nearly constant regret for deterministic MDPs.

Computational Efficiency, Decision Making +1

A Simple and Provably Efficient Algorithm for Asynchronous Federated Contextual Linear Bandits

no code implementations 7 Jul 2022 Jiafan He, Tianhao Wang, Yifei Min, Quanquan Gu

To the best of our knowledge, this is the first provably efficient algorithm that allows fully asynchronous communication for federated contextual linear bandits, while achieving the same regret guarantee as in the single-agent setting.

Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions

no code implementations 13 May 2022 Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu

We show that, for both the known-$C$ and unknown-$C$ cases, our algorithm with a proper choice of hyperparameters achieves a regret that nearly matches the lower bounds.

Multi-Armed Bandits

Optimal Online Generalized Linear Regression with Stochastic Noise and Its Application to Heteroscedastic Bandits

no code implementations 28 Feb 2022 Heyang Zhao, Dongruo Zhou, Jiafan He, Quanquan Gu

We study the problem of online generalized linear regression in the stochastic setting, where the label is generated from a generalized linear model with possibly unbounded additive noise.
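
As a toy illustration of this setting (a minimal sketch using the identity link, i.e. plain linear labels with additive noise; this is not the estimator analyzed in the paper, and all variable names below are made up), online ridge regression over a stream of (feature, label) pairs looks like:

import numpy as np

rng = np.random.default_rng(0)
d, T, lam = 5, 1000, 1.0
theta_star = rng.normal(size=d) / np.sqrt(d)       # unknown parameter

A = lam * np.eye(d)                                # regularized Gram matrix
b = np.zeros(d)                                    # accumulated label-weighted features

for t in range(T):
    x = rng.normal(size=d) / np.sqrt(d)            # feature vector at round t
    y = x @ theta_star + rng.normal(scale=0.1)     # label = linear model + additive noise
    A += np.outer(x, x)                            # rank-one update of the Gram matrix
    b += y * x

theta_hat = np.linalg.solve(A, b)                  # ridge estimate after T rounds
print(np.linalg.norm(theta_hat - theta_star))      # estimation error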

regression

Learning Stochastic Shortest Path with Linear Function Approximation

no code implementations 25 Oct 2021 Yifei Min, Jiafan He, Tianhao Wang, Quanquan Gu

To the best of our knowledge, this is the first algorithm with a sublinear regret guarantee for learning linear mixture SSP.

Locally Differentially Private Reinforcement Learning for Linear Mixture Markov Decision Processes

no code implementations 19 Oct 2021 Chonghua Liao, Jiafan He, Quanquan Gu

To the best of our knowledge, this is the first provable privacy-preserving RL algorithm with linear function approximation.

Privacy Preserving, reinforcement-learning +1

Uniform-PAC Bounds for Reinforcement Learning with Linear Function Approximation

no code implementations NeurIPS 2021 Jiafan He, Dongruo Zhou, Quanquan Gu

The uniform-PAC guarantee is the strongest guarantee for reinforcement learning in the literature: it directly implies both PAC and high-probability regret bounds, making our algorithm superior to all existing algorithms with linear function approximation.

reinforcement-learning, Reinforcement Learning (RL)

Provably Efficient Representation Selection in Low-rank Markov Decision Processes: From Online to Offline RL

no code implementations 22 Jun 2021 Weitong Zhang, Jiafan He, Dongruo Zhou, Amy Zhang, Quanquan Gu

For the offline counterpart, ReLEX-LCB, we show that the algorithm can find the optimal policy if the representation class covers the state-action space, and that it achieves a gap-dependent sample complexity.

Offline RL, reinforcement-learning +2

Near-optimal Policy Optimization Algorithms for Learning Adversarial Linear Mixture MDPs

no code implementations 17 Feb 2021 Jiafan He, Dongruo Zhou, Quanquan Gu

In this paper, we study RL in episodic MDPs with adversarial reward and full information feedback, where the unknown transition probability function is a linear function of a given feature mapping, and the reward function can change arbitrarily episode by episode.
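
For context, "linear mixture" here refers to the standard parameterization used in this line of work (paraphrased from the broader literature, not quoted from this abstract): the unknown transition kernel is linear in a known feature map,

$\mathbb{P}(s' \mid s, a) = \langle \phi(s' \mid s, a), \theta^* \rangle,$

where $\phi(s' \mid s, a) \in \mathbb{R}^d$ is known and $\theta^* \in \mathbb{R}^d$ is the unknown parameter to be estimated.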

Reinforcement Learning (RL)

Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping

no code implementations 23 Jun 2020 Dongruo Zhou, Jiafan He, Quanquan Gu

We propose a novel algorithm that makes use of the feature mapping and obtains a $\tilde O(d\sqrt{T}/(1-\gamma)^2)$ regret, where $d$ is the dimension of the feature space, $T$ is the time horizon and $\gamma$ is the discount factor of the MDP.
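
To see how the effective horizon enters (illustrative arithmetic, not from the paper): with a discount factor $\gamma = 0.99$, the factor $1/(1-\gamma)^2 = 10^4$, so the bound reads

$\tilde O\!\left(d\sqrt{T}/(1-\gamma)^2\right) = \tilde O\!\left(10^4\, d\sqrt{T}\right),$

up to logarithmic factors.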

reinforcement-learning, Reinforcement Learning (RL)
