no code implementations • 5 Jun 2021 • Qin Ding, Yue Kang, Yi-Wei Liu, Thomas C. M. Lee, Cho-Jui Hsieh, James Sharpnack
To tackle this problem, we first propose a two-layer bandit structure for auto tuning the exploration parameter and further generalize it to the Syndicated Bandits framework which can learn multiple hyper-parameters dynamically in contextual bandit environment.