Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction

NeurIPS 2011 · Elad Hazan, Satyen Kale

We present an efficient algorithm for the problem of online multiclass prediction with bandit feedback in the fully adversarial setting. We measure its regret with respect to the log-loss defined in \cite{AbernethyR09}, which is parameterized by a scalar \(\alpha\). We prove that the regret of Newtron is \(O(\log T)\) when \(\alpha\) is a constant that does not vary with the horizon \(T\), and at most \(O(T^{2/3})\) if \(\alpha\) is allowed to increase to infinity with \(T\). For \(\alpha = O(\log T)\), the regret is bounded by \(O(\sqrt{T})\), thus solving the open problem of \cite{KST08, AbernethyR09}. Our algorithm is based on a novel application of the online Newton method \cite{HAK07}. We test our algorithm and show that it performs well in experiments, even when \(\alpha\) is a small constant.
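To make the feedback model concrete, here is a minimal Python sketch of the bandit multiclass protocol the abstract refers to: in each round the learner sees a context, commits to a label, and learns only whether that label was correct, never the true label itself. This is not Newtron and does not use the online Newton method. The names `run_bandit_protocol` and `RuleOutLearner` are hypothetical, and the toy learner assumes that each context always has the same label, an assumption that the fully adversarial setting does not grant.

```python
from typing import Callable, Dict, Hashable, Iterable, Set, Tuple


def run_bandit_protocol(
    examples: Iterable[Tuple[Hashable, int]],
    predict: Callable[[Hashable], int],
    update: Callable[[Hashable, int, bool], None],
) -> int:
    """Play the bandit multiclass game and return the number of mistakes."""
    mistakes = 0
    for x, y in examples:
        y_hat = predict(x)
        # Bandit feedback: only this single bit reaches the learner.
        correct = (y_hat == y)
        update(x, y_hat, correct)
        mistakes += int(not correct)
    return mistakes


class RuleOutLearner:
    """Toy learner (an illustration only, not Newtron).

    For each context it remembers the labels that were revealed to be
    wrong and predicts the smallest label not yet ruled out.
    """

    def __init__(self, num_classes: int) -> None:
        self.num_classes = num_classes
        self.ruled_out: Dict[Hashable, Set[int]] = {}

    def predict(self, x: Hashable) -> int:
        tried = self.ruled_out.get(x, set())
        for label in range(self.num_classes):
            if label not in tried:
                return label
        # Every label was ruled out, so the stationarity assumption failed.
        return 0

    def update(self, x: Hashable, y_hat: int, correct: bool) -> None:
        if not correct:
            self.ruled_out.setdefault(x, set()).add(y_hat)


# Hypothetical data: two contexts whose labels never change.
examples = [((0,), 2), ((1,), 0), ((0,), 2), ((0,), 2), ((1,), 0)]
learner = RuleOutLearner(num_classes=3)
total_mistakes = run_bandit_protocol(examples, learner.predict, learner.update)
```

Because the true label is never shown, a wrong guess removes only one candidate. This is why bandit feedback is harder than full-information feedback, and why the regret bounds above are the quantities of interest.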
