23 Dec 2023 • Su Jia, Andrew Li, R. Ravi, Nishant Oli, Paul Duff, Ian Anderson
We aim to minimize the loss due to not knowing the mean rewards, averaged over instances generated from a given prior distribution.
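The objective described here matches what the bandit literature usually calls Bayesian regret. You draw an instance (a vector of mean rewards) from the prior, run a policy on it, and measure the shortfall against an oracle that knows the means. The result is then averaged over many draws. The following is a minimal Monte Carlo sketch that assumes Bernoulli rewards, an independent uniform Beta(1,1) prior on every arm, and a fixed horizon. The function names and the two example policies (Thompson sampling and uniform random play) are hypothetical illustrations. They are not the authors' setting or algorithm.

```python
import random


def bayesian_regret(policy, num_arms, horizon, num_instances, seed=0):
    """Monte Carlo estimate of Bayesian regret.

    Assumptions (illustrative only): Bernoulli rewards and an independent
    Beta(1,1) = Uniform(0,1) prior on each arm's mean reward.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_instances):
        # Draw one instance from the prior: unknown mean reward of each arm.
        means = [rng.random() for _ in range(num_arms)]
        best = max(means)
        successes = [0] * num_arms
        failures = [0] * num_arms
        regret = 0.0
        for _ in range(horizon):
            arm = policy(successes, failures, rng)
            reward = 1 if rng.random() < means[arm] else 0
            successes[arm] += reward
            failures[arm] += 1 - reward
            # Loss from not knowing the means: gap to the best arm's mean.
            regret += best - means[arm]
        total += regret
    # Average over instances generated from the prior.
    return total / num_instances


def thompson_sampling(successes, failures, rng):
    """Hypothetical example policy: sample each arm's Beta posterior, play the max."""
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def uniform_random(successes, failures, rng):
    """Baseline that ignores all feedback."""
    return rng.randrange(len(successes))


if __name__ == "__main__":
    print("Thompson sampling:", bayesian_regret(thompson_sampling, 5, 300, 100))
    print("Uniform random:   ", bayesian_regret(uniform_random, 5, 300, 100))
```

The sketch accumulates the gap between the best mean and the pulled arm's mean, rather than the realized rewards. Both quantities have the same expectation, and the gap version gives a lower-variance estimate. For the uniform baseline with 5 arms, the expected per-step loss is E[max of 5 uniforms] minus 1/2, which equals 5/6 - 1/2 = 1/3. Over 300 steps, that comes to roughly 100.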