no code implementations • 23 Sep 2020 • Farzana Beente Yusuf, Vitalii Stebliankin, Giuseppe Vietri, Giri Narasimhan
We derive an optimal learning rate for EXP4-DFDC that defines the balance between exploration and exploitation and proves theoretically that the expected regret of our algorithm is a vanishing quantity as a function of time.