Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction

NeurIPS 2011  ·  Elad Hazan, Satyen Kale ·

We present an efficient algorithm for the problem of online multiclass prediction with bandit feedback in the fully adversarial setting. We measure its regret with respect to the log-loss defined in \cite{AbernethyR09}, which is parameterized by a scalar \(\alpha\). We prove that the regret of \newtron is \(O(\log T)\) when \(\alpha\) is a constant that does not vary with horizon \(T\), and at most \(O(T^{2/3})\) if \(\alpha\) is allowed to increase to infinity with \(T\). For \(\alpha\) = \(O(\log T)\), the regret is bounded by \(O(\sqrt{T})\), thus solving the open problem of \cite{KST08, AbernethyR09}. Our algorithm is based on a novel application of the online Newton method \cite{HAK07}. We test our algorithm and show it to perform well in experiments, even when \(\alpha\) is a small constant.

PDF Abstract
No code implementations yet. Submit your code now



  Add Datasets introduced or used in this paper

Results from the Paper

  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.


No methods listed for this paper. Add relevant methods here