# Low-Rank Generalized Linear Bandit Problems

4 Jun 2020 · Yangyi Lu, Amirhossein Meisami, Ambuj Tewari

In a low-rank linear bandit problem, the reward of an action (represented by a matrix of size $d_1 \times d_2$) is the inner product between the action and an unknown low-rank matrix $\Theta^*$. We propose an algorithm based on a novel combination of online-to-confidence-set conversion (Abbasi-Yadkori et al., 2012) and the exponentially weighted average forecaster constructed by a covering of low-rank matrices...
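The reward model in the first sentence can be illustrated with a minimal NumPy sketch. It only shows the matrix inner product $\langle X, \Theta^* \rangle = \mathrm{tr}(X^\top \Theta^*)$ for a low-rank $\Theta^*$; it is not the proposed algorithm. The dimensions, the rank `r`, the Gaussian factors used to build `theta_star`, the Gaussian noise, and the helper names `make_low_rank` and `expected_reward` are all assumptions made for illustration and do not come from the paper.

```python
import numpy as np


def make_low_rank(d1, d2, r, rng):
    """Build a d1 x d2 matrix of rank at most r as U @ V.T (illustrative choice)."""
    U = rng.standard_normal((d1, r))
    V = rng.standard_normal((d2, r))
    return U @ V.T


def expected_reward(action, theta):
    """Matrix inner product <action, theta> = sum_ij action_ij * theta_ij."""
    return float(np.sum(action * theta))


rng = np.random.default_rng(0)
d1, d2, r = 6, 5, 2                          # hypothetical sizes, not from the paper

theta_star = make_low_rank(d1, d2, r, rng)   # unknown to the learner in practice
action = rng.standard_normal((d1, d2))       # one candidate action matrix

mean_reward = expected_reward(action, theta_star)
# Assumed observation model: mean reward plus Gaussian noise.
observed_reward = mean_reward + 0.1 * rng.standard_normal()
```

Because `theta_star` factors as `U @ V.T`, it is described by roughly $(d_1 + d_2) r$ numbers rather than $d_1 d_2$, which is the structure a low-rank bandit algorithm can try to exploit.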

