# Nearly Minimax-Optimal Regret for Linearly Parameterized Bandits

30 Mar 2019 · Yingkai Li, Yining Wang, Yuan Zhou

We study the linear contextual bandit problem with finite action sets. When the problem dimension is $d$, the time horizon is $T$, and there are $n \leq 2^{d/2}$ candidate actions per time period, we (1) show that the minimax expected regret is $\Omega(\sqrt{dT \log T \log n})$, so every algorithm incurs at least this much expected regret on some problem instance, and (2) introduce a Variable-Confidence-Level (VCL) SupLinUCB algorithm whose regret matches this lower bound up to iterated logarithmic factors...
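
The rate in (1) can be made concrete with a small calculation. The sketch below evaluates $\sqrt{dT \log T \log n}$ with the unspecified constant set to 1 and natural logarithms (both assumptions for illustration), so only ratios between its outputs are meaningful. The function name `regret_rate` is hypothetical and not from the paper; it also checks the regime $n \le 2^{d/2}$ under which the bound is stated, using the equivalent integer test $n^2 \le 2^d$.

```python
import math


def regret_rate(d: int, T: int, n: int) -> float:
    """Hypothetical helper: evaluate sqrt(d * T * log T * log n).

    The constant factor is assumed to be 1 and logs are natural, so the
    value only indicates how the minimax regret scales with d, T and n.
    """
    if d < 1 or T < 2 or n < 2:
        raise ValueError("need d >= 1, T >= 2, n >= 2 so both logs are positive")
    # n <= 2^(d/2) is equivalent to n^2 <= 2^d; integer arithmetic avoids float overflow.
    if n * n > 2 ** d:
        raise ValueError("the stated bound assumes n <= 2^(d/2)")
    return math.sqrt(d * T * math.log(T) * math.log(n))
```

For example, with $d = 20$ and $n = 100$, quadrupling the horizon from $10^6$ to $4 \cdot 10^6$ multiplies the rate by $2\sqrt{\log(4 \cdot 10^6)/\log 10^6} \approx 2.10$, slightly more than the factor of 2 that a pure $\sqrt{T}$ rate would give; the extra $\sqrt{\log T}$ term is responsible for the difference.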
