no code implementations • 3 Apr 2025 • Bongsoo Yi, Yue Kang, Yao Li
The Lipschitz bandit is a key variant of stochastic bandit problems in which the expected reward function satisfies a Lipschitz condition with respect to a metric on the arm space.
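Concretely, in illustrative notation (the symbols $\mu$, $d$, and $L$ are not taken from the paper): with expected reward $\mu(\cdot)$ and arm metric $d(\cdot,\cdot)$, the condition states that $|\mu(x) - \mu(y)| \le L \, d(x, y)$ for all arms $x, y$ and some Lipschitz constant $L > 0$.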
no code implementations • 26 Aug 2024 • Bongsoo Yi, Yue Kang, Yao Li
The second algorithm is tailored for situations where the delay distribution is unknown and only the expected value of the delay is available.
no code implementations • 26 Apr 2024 • Yue Kang, Cho-Jui Hsieh, Thomas C. M. Lee
By utilizing truncation of the observed payoffs together with dynamic exploration, we propose a novel algorithm called LOTUS attaining a regret bound of order $\tilde{O}(d^{\frac{3}{2}} r^{\frac{1}{2}} T^{\frac{1}{1+\delta}} / \tilde{D}_{rr})$ without knowing $T$, which matches the state-of-the-art regret bound under sub-Gaussian noises (Lu et al., 2021; Kang et al., 2022) with $\delta = 1$.
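For intuition, payoff truncation under heavy-tailed noise typically replaces an observed payoff $y_t$ by $y_t \mathbb{1}\{|y_t| \le b_t\}$ for a suitably growing threshold $b_t$, so that rare extreme observations cannot dominate the estimator; this generic form and the symbols $y_t$, $b_t$ are illustrative and not the specific estimator used by LOTUS.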
no code implementations • 14 Jan 2024 • Yue Kang, Cho-Jui Hsieh, Thomas C. M. Lee
In the stochastic contextual low-rank matrix bandit problem, the expected reward of an action is given by the inner product between the action's feature matrix and a fixed but initially unknown $d_1 \times d_2$ matrix $\Theta^*$ with rank $r \ll \min\{d_1, d_2\}$, and an agent sequentially takes actions based on past experience to maximize the cumulative reward.
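In this notation, if $X_t$ denotes the feature matrix of the action taken at round $t$, the expected reward is $\mathbb{E}[r_t \mid X_t] = \langle X_t, \Theta^* \rangle = \mathrm{tr}(X_t^\top \Theta^*)$; the symbols $r_t$ and $X_t$ are chosen here for illustration.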
no code implementations • 24 Aug 2023 • Lin Yang, Junjie Chen, Shutao Gao, Zhihao Gong, Hongyu Zhang, Yue Kang, Huaan Li
This addresses the issue of log events unseen in the training data, enhancing the log representation.
no code implementations • 18 Feb 2023 • Yue Kang, Cho-Jui Hsieh, Thomas C. M. Lee
In stochastic contextual bandits, an agent sequentially selects actions from a time-dependent action set based on past experience to minimize the cumulative regret.
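As a standard formalization (notation illustrative rather than taken from the paper): if $\mathcal{X}_t$ is the action set at round $t$, $x_t$ the chosen action, and $\mu(\cdot)$ the expected reward, the cumulative regret over horizon $T$ is $R(T) = \sum_{t=1}^{T} \big( \max_{x \in \mathcal{X}_t} \mu(x) - \mu(x_t) \big)$.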
1 code implementation • 22 Jun 2021 • Yizhou Wang, Yue Kang, Can Qin, Huan Wang, Yi Xu, Yulun Zhang, Yun Fu
The intuition is that the gradient with momentum contains more accurate directional information, and therefore its second-moment estimate is a more favorable option for learning-rate scaling than that of the raw gradient.
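The following Python sketch illustrates the idea in its simplest form: the second-moment estimate used to scale the learning rate is built from the momentum term rather than from the raw gradient. It is a minimal illustration of the intuition, not the paper's exact algorithm; the function name, default hyper-parameters, and the omission of bias correction are all assumptions made here.

```python
import numpy as np

def momentum_second_moment_step(param, grad, state, lr=1e-3,
                                beta1=0.9, beta2=0.999, eps=1e-8):
    # First moment: exponential moving average of the gradients (momentum).
    m = beta1 * state.get("m", np.zeros_like(param)) + (1 - beta1) * grad
    # Second moment computed from the momentum term m, not the raw gradient.
    v = beta2 * state.get("v", np.zeros_like(param)) + (1 - beta2) * m * m
    state["m"], state["v"] = m, v
    # Scale the update by the square root of the momentum's second moment.
    return param - lr * m / (np.sqrt(v) + eps)
```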
no code implementations • 5 Jun 2021 • Qin Ding, Yue Kang, Yi-Wei Liu, Thomas C. M. Lee, Cho-Jui Hsieh, James Sharpnack
To tackle this problem, we first propose a two-layer bandit structure for automatically tuning the exploration parameter, and further generalize it to the Syndicated Bandits framework, which can learn multiple hyper-parameters dynamically in a contextual bandit environment.
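A minimal Python sketch of such a two-layer structure is given below: an outer adversarial bandit (EXP3) picks the exploration parameter for an inner LinUCB-style learner each round and is then updated with the reward the inner learner obtains. The candidate grid, the environment interface (`get_context`, `pull`), and all hyper-parameter values are assumptions for illustration, not the exact Syndicated Bandits algorithm.

```python
import numpy as np

def exp3_probs(weights, gamma):
    # Mix the normalized weights with uniform exploration.
    w = weights / weights.sum()
    return (1 - gamma) * w + gamma / len(weights)

def run_two_layer(env, d, T, alphas=(0.1, 0.5, 1.0, 2.0), gamma=0.1, lam=1.0):
    weights = np.ones(len(alphas))        # outer EXP3 over candidate exploration parameters
    A, b = lam * np.eye(d), np.zeros(d)   # inner ridge-regression statistics
    for t in range(T):
        # Outer layer: sample an exploration parameter alpha.
        probs = exp3_probs(weights, gamma)
        j = np.random.choice(len(alphas), p=probs)
        alpha = alphas[j]

        # Inner layer: LinUCB-style arm selection with the chosen alpha.
        X = env.get_context()             # shape (n_arms, d), assumed interface
        theta = np.linalg.solve(A, b)
        A_inv = np.linalg.inv(A)
        ucb = X @ theta + alpha * np.sqrt(np.einsum("ij,jk,ik->i", X, A_inv, X))
        arm = int(np.argmax(ucb))
        reward = env.pull(arm)            # assumed to lie in [0, 1]

        # Update the inner statistics and the outer EXP3 weight of the chosen alpha.
        A += np.outer(X[arm], X[arm])
        b += reward * X[arm]
        weights[j] *= np.exp(gamma * reward / (probs[j] * len(alphas)))
```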
no code implementations • 21 Feb 2021 • Yue Kang, Dawei Leng, Jinjiang Guo, Lurong Pan
Traditional in vitro approaches use hybridoma or phage display for candidate selection, and surface plasmon resonance (SPR) for evaluation, while in silico computational approaches aim to reduce the high cost and improve efficiency by incorporating mathematical algorithms and computational processing power in the design process.