Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes

5 Sep 2019. Yichun Hu, Nathan Kallus, Xiaojie Mao

We study a nonparametric contextual bandit problem where the expected reward functions belong to a Hölder class with smoothness parameter $\beta$. We show how this interpolates between two extremes that were previously studied in isolation: non-differentiable bandits ($\beta\leq1$), where rate-optimal regret is achieved by running separate non-contextual bandits in different context regions, and parametric-response bandits ($\beta=\infty$), where rate-optimal regret can be achieved with minimal or no exploration due to infinite extrapolatability...
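To illustrate the non-differentiable extreme, here is a minimal sketch of "separate non-contextual bandits in different context regions". It is not the algorithm from the paper. It assumes a one-dimensional context in [0, 1], equal-width bins, and a standard UCB1 rule inside each bin; the class name `BinnedUCB`, the function `simulate`, and the reward means used in the simulation are all hypothetical choices made for illustration.

```python
import math
import random


class BinnedUCB:
    """Runs an independent UCB1 bandit inside each equal-width bin of [0, 1]."""

    def __init__(self, n_bins, n_arms):
        self.n_bins = n_bins
        self.n_arms = n_arms
        # Per-bin pull counts and running reward means; no sharing across bins.
        self.counts = [[0] * n_arms for _ in range(n_bins)]
        self.means = [[0.0] * n_arms for _ in range(n_bins)]

    def _bin(self, x):
        # Map a context in [0, 1] to its bin index; x = 1.0 goes to the last bin.
        return min(int(x * self.n_bins), self.n_bins - 1)

    def select(self, x):
        b = self._bin(x)
        counts = self.counts[b]
        # Pull every arm once in this bin before using confidence bounds.
        for arm in range(self.n_arms):
            if counts[arm] == 0:
                return arm
        total = sum(counts)
        ucb = [
            self.means[b][arm] + math.sqrt(2.0 * math.log(total) / counts[arm])
            for arm in range(self.n_arms)
        ]
        return max(range(self.n_arms), key=ucb.__getitem__)

    def update(self, x, arm, reward):
        b = self._bin(x)
        self.counts[b][arm] += 1
        n = self.counts[b][arm]
        # Incremental update of the sample mean for this (bin, arm) pair.
        self.means[b][arm] += (reward - self.means[b][arm]) / n


def simulate(T=2000, n_bins=4, seed=0):
    """Toy run with assumed reward means: arm 0 pays x, arm 1 pays 1 - x."""
    rng = random.Random(seed)
    policy = BinnedUCB(n_bins, n_arms=2)
    for _ in range(T):
        x = rng.random()
        arm = policy.select(x)
        mean = x if arm == 0 else 1.0 - x
        policy.update(x, arm, mean + rng.gauss(0.0, 0.1))
    return policy
```

Because each bin learns only from its own samples, this approach cannot extrapolate across regions, which is the limitation that higher smoothness is meant to relax.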
