# Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes

5 Sep 2019 · Yichun Hu, Nathan Kallus, Xiaojie Mao

We study a nonparametric contextual bandit problem where the expected reward functions belong to a Hölder class with smoothness parameter $\beta$. We show how this interpolates between two extremes that were previously studied in isolation: non-differentiable bandits ($\beta\leq1$), where rate-optimal regret is achieved by running separate non-contextual bandits in different context regions, and parametric-response bandits ($\beta=\infty$), where rate-optimal regret can be achieved with minimal or no exploration due to infinite extrapolatability...
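
The $\beta\leq1$ strategy mentioned above, running separate non-contextual bandits in different context regions, can be sketched in a few lines. This is a hypothetical illustration, not the authors' algorithm. It assumes a one-dimensional context drawn uniformly from $[0, 1]$, equal-width regions (bins), Gaussian reward noise, and UCB1 as the per-bin bandit; the names `binned_ucb`, `reward_fns`, `n_bins`, and `noise_sd` are ours.

```python
import math
import random


def binned_ucb(reward_fns, horizon, n_bins, noise_sd=0.1, seed=0):
    """Hypothetical sketch of the beta <= 1 strategy: split the context space
    [0, 1] into n_bins equal intervals and run an independent UCB1 bandit in
    each bin, ignoring where the context falls inside its bin.

    reward_fns[k](x) is the (assumed known to the simulator) expected reward
    of arm k at context x. Returns cumulative regret versus the oracle arm.
    """
    rng = random.Random(seed)
    n_arms = len(reward_fns)
    # Per-bin, per-arm pull counts and reward sums; bins share no data.
    counts = [[0] * n_arms for _ in range(n_bins)]
    sums = [[0.0] * n_arms for _ in range(n_bins)]
    regret = 0.0
    for _ in range(horizon):
        x = rng.random()                      # context drawn uniformly on [0, 1)
        b = min(int(x * n_bins), n_bins - 1)  # index of the bin containing x
        untried = [k for k in range(n_arms) if counts[b][k] == 0]
        if untried:
            # UCB1 initialization: pull every arm once within this bin.
            arm = untried[0]
        else:
            n_b = sum(counts[b])
            # UCB1 index: empirical mean plus exploration bonus, per bin.
            arm = max(
                range(n_arms),
                key=lambda k: sums[b][k] / counts[b][k]
                + math.sqrt(2.0 * math.log(n_b) / counts[b][k]),
            )
        means = [f(x) for f in reward_fns]
        reward = means[arm] + rng.gauss(0.0, noise_sd)  # noisy observed reward
        counts[b][arm] += 1
        sums[b][arm] += reward
        regret += max(means) - means[arm]  # instantaneous regret at context x
    return regret


if __name__ == "__main__":
    arms = [lambda x: x, lambda x: 1.0 - x]  # two Lipschitz (beta = 1) reward functions
    for bins in (1, 10):
        print(bins, round(binned_ucb(arms, horizon=5000, n_bins=bins), 1))
```

With these two example arms, a single bin cannot distinguish them (both average 0.5 over $[0, 1]$), so the pooled bandit keeps picking the wrong arm on about half the contexts, while ten bins let each region learn its own best arm. The bin count controls a bias-variance trade-off: wider bins pool more samples but blur differences in the reward functions within a bin.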
