no code implementations • 19 Mar 2024 • Joe Suk, Arpit Agarwal
In dueling bandits, the learner receives preference feedback between arms, and the regret of an arm is defined in terms of its suboptimality to a winner arm.
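The regret notion in this snippet can be made concrete with a small sketch. It rests on several assumptions. It uses a preference matrix `P`, where `P[i][j]` is the probability that arm i beats arm j. It takes the "winner arm" to be a Condorcet winner, although the snippet does not say which winner notion is used. It also follows one common convention: an arm's gap is `P[w][arm] - 1/2`, and a duel's regret is the average of its two arms' gaps. The function names `condorcet_winner`, `arm_regret` and `duel_regret` are hypothetical.

```python
from typing import List, Optional

Matrix = List[List[float]]


def condorcet_winner(P: Matrix) -> Optional[int]:
    # Assumed winner notion: an arm beating every other arm with probability > 1/2.
    K = len(P)
    for i in range(K):
        if all(P[i][j] > 0.5 for j in range(K) if j != i):
            return i
    return None  # e.g. cyclic preferences have no Condorcet winner


def arm_regret(P: Matrix, winner: int, arm: int) -> float:
    # Suboptimality of `arm` relative to the winner; zero for the winner itself.
    return 0.0 if arm == winner else P[winner][arm] - 0.5


def duel_regret(P: Matrix, winner: int, i: int, j: int) -> float:
    # Averaged convention (an assumption): mean of the two dueled arms' gaps.
    return (arm_regret(P, winner, i) + arm_regret(P, winner, j)) / 2


P = [[0.5, 0.6, 0.8],
     [0.4, 0.5, 0.7],
     [0.2, 0.3, 0.5]]
w = condorcet_winner(P)
print(w, round(duel_regret(P, w, 1, 2), 3))  # → 0 0.2
```

Under the "weak regret" convention, one would instead take the minimum of the two gaps. That choice rewards a learner for including the winner in at least one side of the duel.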
1 code implementation • NeurIPS 2023 • Joe Suk, Arpit Agarwal
Specifically, we study the recent notion of significant shifts (Suk and Kpotufe, 2022), and ask whether one can design an adaptive algorithm for the dueling problem with $O(\sqrt{K\tilde{L}T})$ dynamic regret, where $\tilde{L}$ is the (unknown) number of significant shifts in preferences.
no code implementations • 27 Dec 2021 • Joe Suk, Samory Kpotufe
In bandits with distribution shifts, one aims to automatically adapt to unknown changes in the reward distribution and to restart exploration when necessary.
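The snippet states only the goal. The sketch below gives a rough illustration of what "restart exploration when necessary" can mean. It is deliberately naive and is not the method of Suk and Kpotufe. A greedy learner wipes all its statistics whenever the played arm's recent mean drifts from its earlier mean by more than a threshold. The names `RestartingGreedy` and `shift_detected`, the parameters `window` and `threshold`, and the toy reward schedule are all hypothetical. Unlike the significant-shift notion mentioned in the listing, this rule reacts to any large enough change in the played arm's mean.

```python
from typing import List


def shift_detected(rewards: List[float], window: int, threshold: float) -> bool:
    # Compare the mean of the last `window` rewards with the mean of all earlier ones.
    if len(rewards) <= window:
        return False
    recent, past = rewards[-window:], rewards[:-window]
    return abs(sum(recent) / len(recent) - sum(past) / len(past)) > threshold


class RestartingGreedy:
    """Hypothetical greedy learner that forgets everything on a detected shift."""

    def __init__(self, n_arms: int, window: int, threshold: float) -> None:
        self.n_arms = n_arms
        self.window = window
        self.threshold = threshold
        self.restarts = 0
        self.history: List[List[float]] = [[] for _ in range(n_arms)]

    def select(self) -> int:
        # Exploration phase: sample every arm `window` times before exploiting.
        for arm, rewards in enumerate(self.history):
            if len(rewards) < self.window:
                return arm
        # Exploitation phase: play the arm with the highest empirical mean.
        return max(range(self.n_arms),
                   key=lambda a: sum(self.history[a]) / len(self.history[a]))

    def update(self, arm: int, reward: float) -> None:
        self.history[arm].append(reward)
        if shift_detected(self.history[arm], self.window, self.threshold):
            # Restart exploration: discard all statistics for all arms.
            self.history = [[] for _ in range(self.n_arms)]
            self.restarts += 1


def reward(arm: int, t: int) -> float:
    # Toy schedule: arm 0 pays 1.0 until round 20, then 0.0; arm 1 always pays 0.5.
    if arm == 0:
        return 1.0 if t < 20 else 0.0
    return 0.5


agent = RestartingGreedy(n_arms=2, window=3, threshold=0.3)
for t in range(50):
    a = agent.select()
    agent.update(a, reward(a, t))
print(agent.restarts, agent.select())  # → 1 1
```

Here the drop in arm 0's reward at round 20 triggers one restart. After fresh exploration, the learner switches to arm 1. A fixed `window` and `threshold` trade detection delay against false alarms. Adaptive procedures avoid having to tune both parameters by hand.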