Search Results for author: Nathaniel Korda

Found 6 papers, 1 paper with code

On TD(0) with function approximation: Concentration bounds and a centered variant with exponential convergence

no code implementations • 12 Nov 2014 • Nathaniel Korda, L. A. Prashanth

Furthermore, we propose a variant of TD(0) with linear approximators that incorporates a centering sequence, and establish that it exhibits an exponential rate of convergence in expectation.

Finite-Time Analysis of Kernelised Contextual Bandits

no code implementations • 26 Sep 2013 • Michal Valko, Nathaniel Korda, Remi Munos, Ilias Flaounas, Nello Cristianini

For contextual bandits, the related algorithm GP-UCB turns out to be a special case of our algorithm, and our finite-time analysis improves the regret bound of GP-UCB for the agnostic case, both in terms of the kernel-dependent quantity and the RKHS norm of the reward function.

Multi-Armed Bandits

Thompson Sampling for 1-Dimensional Exponential Family Bandits

no code implementations • NeurIPS 2013 • Nathaniel Korda, Emilie Kaufmann, Remi Munos

Thompson Sampling has been demonstrated in many complex bandit models; however, the theoretical guarantees available for the parametric multi-armed bandit are still limited to the Bernoulli case.

Thompson Sampling

Fast gradient descent for drifting least squares regression, with application to bandits

no code implementations • 11 Jul 2013 • Nathaniel Korda, Prashanth L. A., Rémi Munos

In the case when strong convexity in the regression problem is guaranteed, we provide bounds on the error both in expectation and high probability (the latter is often needed to provide theoretical guarantees for higher level algorithms), despite the drifting least squares solution.

News Recommendation regression

Concentration bounds for temporal difference learning with linear function approximation: The case of batch data and uniform sampling

no code implementations • 11 Jun 2013 • L. A. Prashanth, Nathaniel Korda, Rémi Munos

We propose a stochastic approximation (SA) based method with randomization of samples for policy evaluation using the least squares temporal difference (LSTD) algorithm.

Multi-Armed Bandits News Recommendation

Thompson Sampling: An Asymptotically Optimal Finite Time Analysis

1 code implementation • 18 May 2012 • Emilie Kaufmann, Nathaniel Korda, Rémi Munos

The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933.

3D Reconstruction Thompson Sampling
