We consider the adversarial convex bandit problem and we build the first $\mathrm{poly}(T)$-time algorithm with $\mathrm{poly}(n) \sqrt{T}$-regret for this problem. To do so we introduce three new ideas in the derivative-free optimization literature: (i) kernel methods, (ii) a generalization of Bernoulli convolutions, and (iii) a new annealing schedule for exponential weights (with increasing learning rate)... (read more)

PDF
Submit
results from this paper
to get state-of-the-art GitHub badges and help the
community compare results to other papers.

METHOD | TYPE | |
---|---|---|

🤖 No Methods Found | Help the community by adding them if they're not listed; e.g. Deep Residual Learning for Image Recognition uses ResNet |