no code implementations • 21 Aug 2023 • Yichuan Deng, Michalis Mamakos, Zhao Song
Thus, maximizing the total reward requires learning not only models of the reward and the resource consumption, but also the cluster memberships.
Econometrics • Multi-Armed Bandits