no code implementations • NeurIPS 2011 • Tim V. Erven, Wouter M. Koolen, Steven D. Rooij, Peter Grünwald
In most previous analyses the learning rate was carefully tuned to obtain optimal worst-case performance, leading to suboptimal performance on easy instances, for example when there exists an action that is significantly better than all others.