no code implementations • 3 Mar 2023 • Jiatai Huang, Leana Golubchik, Longbo Huang
In this paper, we study scheduling of a queueing system with zero knowledge of instantaneous network conditions.
no code implementations • 25 Jan 2023 • Jiatai Huang, Yan Dai, Longbo Huang
\texttt{Banker-OMD} leads to the first delayed scale-free adversarial MAB algorithm achieving $\widetilde{\mathcal O}(\sqrt{K}L(\sqrt T+\sqrt D))$ regret and the first delayed adversarial linear bandit algorithm achieving $\widetilde{\mathcal O}(\text{poly}(n)(\sqrt{T} + \sqrt{D}))$ regret.
1 code implementation • 30 May 2022 • Yiqin Tan, Pihe Hu, Ling Pan, Jiatai Huang, Longbo Huang
Training deep reinforcement learning (DRL) models usually incurs high computation costs.
no code implementations • 28 Jan 2022 • Jiatai Huang, Yan Dai, Longbo Huang
Specifically, we design an algorithm \texttt{HTINF}: when the heavy-tail parameters $\alpha$ and $\sigma$ are known to the agent, \texttt{HTINF} simultaneously achieves the optimal regret for both stochastic and adversarial environments, without knowing the actual environment type a priori.
no code implementations • 26 Oct 2021 • Jiatai Huang, Yan Dai, Longbo Huang
We consider the Scale-Free Adversarial Multi-Armed Bandit (MAB) problem with unrestricted feedback delays.
no code implementations • 16 Jun 2021 • Jiatai Huang, Longbo Huang
In particular, it leads to the first delayed adversarial linear bandit algorithm achieving $\tilde{O}(\text{poly}(n)(\sqrt{T} + \sqrt{D}))$ regret.