Bandit Online Optimization Over the Permutahedron

5 Dec 2013  ·  Nir Ailon, Kohei Hatano, Eiji Takimoto ·

The permutahedron is the convex polytope with vertex set consisting of the vectors $(\pi(1),\dots, \pi(n))$ for all permutations (bijections) $\pi$ over $\{1,\dots, n\}$. We study a bandit game in which, at each step $t$, an adversary chooses a hidden weight weight vector $s_t$, a player chooses a vertex $\pi_t$ of the permutahedron and suffers an observed loss of $\sum_{i=1}^n \pi(i) s_t(i)$. A previous algorithm CombBand of Cesa-Bianchi et al (2009) guarantees a regret of $O(n\sqrt{T \log n})$ for a time horizon of $T$. Unfortunately, CombBand requires at each step an $n$-by-$n$ matrix permanent approximation to within improved accuracy as $T$ grows, resulting in a total running time that is super linear in $T$, making it impractical for large time horizons. We provide an algorithm of regret $O(n^{3/2}\sqrt{T})$ with total time complexity $O(n^3T)$. The ideas are a combination of CombBand and a recent algorithm by Ailon (2013) for online optimization over the permutahedron in the full information setting. The technical core is a bound on the variance of the Plackett-Luce noisy sorting process's "pseudo loss". The bound is obtained by establishing positive semi-definiteness of a family of 3-by-3 matrices generated from rational functions of exponentials of 3 parameters.

PDF Abstract
No code implementations yet. Submit your code now

Tasks


Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here