5 Nov 2021 • Jie Bian, Kwang-Sung Jun
This lesser-known algorithm, which we call Maillard sampling (MS), computes the probability of choosing each arm in closed form, which is not the case for Thompson sampling, a bandit algorithm widely adopted in industry.
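
To illustrate what a closed-form arm probability can look like, here is a minimal Python sketch. It assumes the sub-Gaussian form of the rule, in which arm a is chosen with probability proportional to exp(-N_a * gap_a^2 / (2 * sigma^2)), where N_a is the number of pulls of arm a and gap_a is the difference between the best empirical mean and the empirical mean of arm a. This formula is not stated in the text above and should be checked against the paper. The function name `maillard_probabilities` and the parameter `sigma` are hypothetical names chosen for illustration, and the sketch assumes every arm has already been pulled at least once.

```python
import numpy as np

def maillard_probabilities(means, counts, sigma=1.0):
    """Closed-form arm probabilities under the assumed Maillard sampling rule.

    means  : empirical mean reward of each arm (each arm pulled at least once)
    counts : number of times each arm has been pulled
    sigma  : assumed sub-Gaussian noise parameter (hypothetical default)
    """
    means = np.asarray(means, dtype=float)
    counts = np.asarray(counts, dtype=float)
    # Empirical gap of each arm to the empirically best arm (zero for the best).
    gaps = means.max() - means
    # Unnormalized weights: the empirically best arm always gets weight 1.
    weights = np.exp(-counts * gaps**2 / (2.0 * sigma**2))
    # Normalize so the weights form a probability distribution over arms.
    return weights / weights.sum()

# Usage: two arms, each pulled 10 times, with empirical means 0.5 and 0.3.
probs = maillard_probabilities([0.5, 0.3], [10, 10])
rng = np.random.default_rng(0)
arm = rng.choice(len(probs), p=probs)  # sample the next arm to pull
```

Because the probabilities are available explicitly, they can be logged alongside each pull without any Monte Carlo estimation, which is the practical contrast with Thompson sampling drawn in the abstract.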