no code implementations • 12 Nov 2014 • Nathaniel Korda, L. A. Prashanth
Furthermore, we propose a variant of TD(0) with linear approximators that incorporates a centering sequence, and establish that it exhibits an exponential rate of convergence in expectation.
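The TD(0)-with-linear-approximators setting the entry refers to can be sketched as follows. This is a minimal illustration only: the toy Markov chain, features, and step sizes are assumptions, and the paper's centering sequence is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, d = 5, 3
# Illustrative feature matrix: one d-dimensional feature vector per state.
phi = rng.standard_normal((n_states, d))
# Random transition matrix and expected rewards for a toy Markov chain.
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)
r = rng.random(n_states)
gamma = 0.9

theta = np.zeros(d)
s = 0
for t in range(1, 20001):
    s_next = rng.choice(n_states, p=P[s])
    # TD(0) temporal-difference error with linear value estimates.
    delta = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
    theta += (1.0 / t) * delta * phi[s]  # diminishing step size
    s = s_next
```

The update is the standard on-policy TD(0) rule with a 1/t step size; the variant analyzed in the paper additionally incorporates a centering sequence to sharpen the convergence rate.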
no code implementations • 26 Sep 2013 • Michal Valko, Nathaniel Korda, Remi Munos, Ilias Flaounas, Nelo Cristianini
For contextual bandits, the related algorithm GP-UCB turns out to be a special case of our algorithm, and our finite-time analysis improves the regret bound of GP-UCB for the agnostic case, both in terms of the kernel-dependent quantity and the RKHS norm of the reward function.
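As a rough illustration of the GP-UCB-style acquisition rule mentioned above (posterior mean plus a scaled posterior standard deviation), here is a minimal sketch on a 1-D grid of arms. The kernel, objective function, noise level, and exploration coefficient `beta` are all assumptions made for the example, not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(3)

def rbf(a, b, ell=0.3):
    # Squared-exponential kernel on scalar inputs.
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell ** 2))

X = np.linspace(0, 1, 50)        # candidate arms on a grid
f = np.sin(3 * X)                # illustrative smooth objective
noise = 0.1
beta = 2.0                       # exploration coefficient (assumed constant)

xs, ys = [], []
for t in range(30):
    if not xs:
        i = rng.integers(len(X))              # first pull: arbitrary arm
    else:
        Xo, yo = np.array(xs), np.array(ys)
        K = rbf(Xo, Xo) + noise ** 2 * np.eye(len(xs))
        Ks = rbf(X, Xo)
        # GP posterior mean and variance at every candidate arm.
        mu = Ks @ np.linalg.solve(K, yo)
        var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
        # UCB rule: play the arm maximizing mean + beta * std.
        i = int(np.argmax(mu + beta * np.sqrt(np.maximum(var, 0))))
    xs.append(X[i])
    ys.append(f[i] + noise * rng.standard_normal())
```

The upper-confidence term shrinks as the posterior concentrates, which is the mechanism behind the kernel-dependent regret bounds discussed in the abstract.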
no code implementations • NeurIPS 2013 • Nathaniel Korda, Emilie Kaufmann, Remi Munos
Thompson Sampling has been demonstrated in many complex bandit models; however, the theoretical guarantees available for the parametric multi-armed bandit are still limited to the Bernoulli case.
no code implementations • 11 Jul 2013 • Nathaniel Korda, Prashanth L. A., Rémi Munos
In the case when strong convexity of the regression problem is guaranteed, we provide bounds on the error both in expectation and with high probability (the latter is often needed to provide theoretical guarantees for higher-level algorithms), despite the drifting least squares solution.
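The drifting-least-squares setting can be illustrated with a minimal SGD sketch: the regression target moves slowly while the iterate chases it. The drift model, noise level, and step sizes below are assumptions for illustration, not the paper's analysis.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 3
theta_star = np.zeros(d)   # slowly drifting least squares solution
theta = np.zeros(d)        # SGD iterate tracking it

for t in range(1, 10001):
    # The target drifts by a shrinking random perturbation each round.
    theta_star += 0.001 * rng.standard_normal(d) / t
    x = rng.standard_normal(d)               # E[x x^T] = I gives strong convexity
    y = x @ theta_star + 0.1 * rng.standard_normal()
    grad = (x @ theta - y) * x               # gradient of the squared loss
    theta -= (1.0 / t) * grad
```

With isotropic Gaussian features the expected loss is strongly convex, which is the regime in which the abstract's expectation and high-probability error bounds apply.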
no code implementations • 11 Jun 2013 • L. A. Prashanth, Nathaniel Korda, Rémi Munos
We propose a stochastic approximation (SA) based method with randomization of samples for policy evaluation using the least squares temporal difference (LSTD) algorithm.
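A minimal sketch of the idea above, i.e. running stochastic-approximation updates on uniformly sampled transitions from a fixed batch rather than solving the LSTD system directly. The synthetic transition data and step-size schedule are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 500
# Simulated batch of transitions (phi_s, reward, phi_s'); illustrative data.
phi = rng.standard_normal((n, d))
phi_next = rng.standard_normal((n, d))
rewards = rng.random(n)
gamma = 0.9

theta = np.zeros(d)
for t in range(1, 20001):
    i = rng.integers(n)  # randomization of samples: draw uniformly from the batch
    # Temporal-difference error on the sampled transition.
    delta = rewards[i] + gamma * phi_next[i] @ theta - phi[i] @ theta
    theta += (0.1 / np.sqrt(t)) * delta * phi[i]
```

Each iteration costs O(d) instead of the O(d^2) per-sample cost of maintaining the LSTD matrices, which is the computational motivation for the SA-based scheme.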
1 code implementation • 18 May 2012 • Emilie Kaufmann, Nathaniel Korda, Rémi Munos
The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933.
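For the Bernoulli case the entry refers to, Thompson Sampling maintains a Beta posterior per arm and plays the arm whose sampled mean is largest. A minimal sketch with hypothetical arm means:

```python
import numpy as np

rng = np.random.default_rng(2)
true_means = np.array([0.2, 0.5, 0.8])   # hypothetical Bernoulli arm means
K = len(true_means)
successes = np.zeros(K)
failures = np.zeros(K)

T = 5000
for _ in range(T):
    # Sample a mean for each arm from its Beta(1 + s, 1 + f) posterior.
    samples = rng.beta(successes + 1, failures + 1)
    a = int(np.argmax(samples))          # play the arm with the largest sample
    reward = float(rng.random() < true_means[a])
    successes[a] += reward
    failures[a] += 1 - reward
```

Over time the posterior of the best arm concentrates and it is played almost always, which is the behavior whose asymptotic optimality (matching the Lai–Robbins lower bound) the paper establishes.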