Search Results for author: Jiatai Huang

Found 6 papers, 1 paper with code

Queue Scheduling with Adversarial Bandit Learning

no code implementations • 3 Mar 2023 • Jiatai Huang, Leana Golubchik, Longbo Huang

In this paper, we study scheduling of a queueing system with zero knowledge of instantaneous network conditions.

Banker Online Mirror Descent: A Universal Approach for Delayed Online Bandit Learning

no code implementations • 25 Jan 2023 • Jiatai Huang, Yan Dai, Longbo Huang

\texttt{Banker-OMD} leads to the first delayed scale-free adversarial MAB algorithm achieving $\widetilde{\mathcal O}(\sqrt{K}L(\sqrt T+\sqrt D))$ regret and the first delayed adversarial linear bandit algorithm achieving $\widetilde{\mathcal O}(\text{poly}(n)(\sqrt{T} + \sqrt{D}))$ regret.

Multi-Armed Bandits

RLx2: Training a Sparse Deep Reinforcement Learning Model from Scratch

1 code implementation • 30 May 2022 • Yiqin Tan, Pihe Hu, Ling Pan, Jiatai Huang, Longbo Huang

Training deep reinforcement learning (DRL) models usually requires high computation costs.

Continuous Control, Knowledge Distillation, +3 more

Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits

no code implementations • 28 Jan 2022 • Jiatai Huang, Yan Dai, Longbo Huang

Specifically, we design an algorithm, \texttt{HTINF}. When the heavy-tail parameters $\alpha$ and $\sigma$ are known to the agent, \texttt{HTINF} simultaneously achieves the optimal regret for both stochastic and adversarial environments, without knowing the actual environment type a priori.

Multi-Armed Bandits

Scale-Free Adversarial Multi-Armed Bandit with Arbitrary Feedback Delays

no code implementations • 26 Oct 2021 • Jiatai Huang, Yan Dai, Longbo Huang

We consider the Scale-Free Adversarial Multi-Armed Bandit (MAB) problem with unrestricted feedback delays.

Banker Online Mirror Descent

no code implementations • 16 Jun 2021 • Jiatai Huang, Longbo Huang

In particular, it leads to the first delayed adversarial linear bandit algorithm achieving $\tilde{O}(\text{poly}(n)(\sqrt{T} + \sqrt{D}))$ regret.

Multi-Armed Bandits
