no code implementations • 5 Feb 2025 • Hamid Eghbalzadeh, Yang Wang, Rui Li, Yuji Mo, Qin Ding, Jiaxiang Fu, Liang Dai, Shuo Gu, Nima Noorshams, Sem Park, Bo Long, Xue Feng
Industrial ads ranking systems conventionally rely on labeled impression data, which leads to challenges such as overfitting, slower incremental gain from model scaling, and biases due to discrepancies between training and serving data.
no code implementations • 5 Jun 2021 • Qin Ding, Yue Kang, Yi-Wei Liu, Thomas C. M. Lee, Cho-Jui Hsieh, James Sharpnack
To tackle this problem, we first propose a two-layer bandit structure for auto tuning the exploration parameter and further generalize it to the Syndicated Bandits framework which can learn multiple hyper-parameters dynamically in contextual bandit environment.
no code implementations • 5 Jun 2021 • Qin Ding, Cho-Jui Hsieh, James Sharpnack
We provide theoretical guarantees for our proposed algorithm and show by experiments that our proposed algorithm improves the robustness against various kinds of popular attacks.
no code implementations • 7 Jun 2020 • Qin Ding, Cho-Jui Hsieh, James Sharpnack
A natural way to resolve this problem is to apply online stochastic gradient descent (SGD) so that the per-step time and memory complexity can be reduced to constant with respect to $t$, but a contextual bandit policy based on online SGD updates that balances exploration and exploitation has remained elusive.
no code implementations • 13 Feb 2020 • Qin Ding, Cho-Jui Hsieh, James Sharpnack
Classic contextual bandit algorithms for linear models, such as LinUCB, assume that the reward distribution for an arm is modeled by a stationary linear regression.