# Thompson Sampling for Combinatorial Semi-Bandits

We study Thompson sampling (TS) in the stochastic combinatorial multi-armed bandit (CMAB) framework. We analyze the standard TS algorithm for the general CMAB and obtain the first distribution-dependent regret bound of $O(mK_{\max}\log T / \Delta_{\min})$, where $m$ is the number of arms, $K_{\max}$ is the size of the largest super arm, $T$ is the time horizon, and $\Delta_{\min}$ is the minimum gap between the expected reward of the optimal solution and that of any non-optimal solution.
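To make the setting concrete, here is a minimal sketch of Thompson sampling with semi-bandit feedback. It is an illustration under stated assumptions, not the exact algorithm analyzed in the paper: base arms are assumed Bernoulli with Beta(1, 1) priors, the reward is assumed linear, and the super arms are assumed to be all subsets of exactly `k` base arms, so the optimization oracle reduces to picking the `k` largest sampled means. The function name `cts_top_k` and its parameters are hypothetical.

```python
import numpy as np


def cts_top_k(true_means, k, horizon, seed=0):
    """Hypothetical sketch of combinatorial Thompson sampling.

    Assumptions (not taken from the source): Bernoulli base arms,
    Beta(1, 1) priors, linear reward, and super arms equal to all
    subsets of exactly k base arms (so the oracle is a top-k selection).
    """
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    m = true_means.size

    alpha = np.ones(m)  # 1 + number of observed successes per base arm
    beta = np.ones(m)   # 1 + number of observed failures per base arm

    # Expected reward of the optimal super arm under the linear-reward assumption.
    best = np.sort(true_means)[-k:].sum()
    regret = 0.0

    for _ in range(horizon):
        # Draw one posterior sample per base arm.
        theta = rng.beta(alpha, beta)
        # Oracle step: choose the k base arms with the largest samples.
        super_arm = np.argsort(theta)[-k:]
        # Semi-bandit feedback: observe the outcome of every played base arm.
        outcomes = (rng.random(k) < true_means[super_arm]).astype(float)
        # Conjugate Beta-Bernoulli posterior update for the played arms only.
        alpha[super_arm] += outcomes
        beta[super_arm] += 1.0 - outcomes
        # Accumulate expected (pseudo-)regret against the optimal super arm.
        regret += best - true_means[super_arm].sum()

    return regret, alpha, beta
```

For example, `cts_top_k([0.9, 0.8, 0.2, 0.1], k=2, horizon=500)` returns the cumulative pseudo-regret together with the final Beta parameters. In this toy instance $m = 4$, $K_{\max} = 2$, and $\Delta_{\min} = 0.6$ (the gap between $\{0.9, 0.8\}$ and the next-best pair $\{0.9, 0.2\}$), which are the quantities appearing in the regret bound.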

