Search Results for author: Gilles Stoltz

Found 18 papers, 1 paper with code

Symphony of experts: orchestration with adversarial insights in reinforcement learning

no code implementations • 25 Oct 2023 • Matthieu Jonckheere, Chiara Mignacco, Gilles Stoltz

Structured reinforcement learning leverages policies with advantageous properties to reach better performance, particularly in scenarios where exploration poses challenges.

Decision Making, reinforcement-learning

Parameter-free projected gradient descent

no code implementations • 31 May 2023 • Evgenii Chzhen, Christophe Giraud, Gilles Stoltz

We consider the problem of minimizing a convex function over a closed convex set, with Projected Gradient Descent (PGD).

Stochastic Optimization

On Best-Arm Identification with a Fixed Budget in Non-Parametric Multi-Armed Bandits

no code implementations • 30 Sep 2022 • Antoine Barrier, Aurélien Garivier, Gilles Stoltz

All these new upper and lower bounds generalize existing bounds based, e.g., on gaps between distributions.

Multi-Armed Bandits

Contextual Bandits with Knapsacks for a Conversion Model

no code implementations • 1 Jun 2022 • Zhen Li, Gilles Stoltz

At each round, given the stochastic i.i.d. context $\mathbf{x}_t$ and the arm picked $a_t$ (corresponding, e.g., to a discount level), a customer conversion may be obtained, in which case a reward $r(a,\mathbf{x}_t)$ is gained and vector costs $c(a_t,\mathbf{x}_t)$ are suffered (corresponding, e.g., to losses of earnings).

Multi-Armed Bandits

A Unified Approach to Fair Online Learning via Blackwell Approachability

no code implementations • NeurIPS 2021 • Evgenii Chzhen, Christophe Giraud, Gilles Stoltz

We provide a setting and a general approach to fair online learning with stochastic sensitive and non-sensitive contexts.

Fairness

Diversity-Preserving K-Armed Bandits, Revisited

no code implementations • 5 Oct 2020 • Hédi Hadiji, Sébastien Gerchinovitz, Jean-Michel Loubes, Gilles Stoltz

We consider the bandit-based framework for diversity-preserving recommendations introduced by Celis et al. (2019), who approached it in the case of a polytope mainly by a reduction to the setting of linear bandits.

Adaptation to the Range in $K$-Armed Bandits

no code implementations • 5 Jun 2020 • Hédi Hadiji, Gilles Stoltz

We consider stochastic bandit problems with $K$ arms, each associated with a bounded distribution supported on the range $[m, M]$.

Hierarchical robust aggregation of sales forecasts at aggregated levels in e-commerce, based on exponential smoothing and Holt's linear trend method

no code implementations • 5 Jun 2020 • Malo Huard, Rémy Garnier, Gilles Stoltz

We revisit the interest of classical statistical techniques for sales forecasting like exponential smoothing and extensions thereof (as Holt's linear trend method).

Learning Theory

Sequential model aggregation for production forecasting

no code implementations • 30 Nov 2018 • Raphaël Deswarte, Véronique Gervais, Gilles Stoltz, Sébastien da Veiga

An extension of the deterministic aggregation approach is thus proposed in this paper to provide such multi-step-ahead forecasts.

regression

Uniform regret bounds over $\mathbb{R}^d$ for the sequential linear regression problem with the square loss

no code implementations • 29 May 2018 • Pierre Gaillard, Sébastien Gerchinovitz, Malo Huard, Gilles Stoltz

In the case of sequentially revealed features, we also derive an asymptotic regret bound of $d B^2 \ln T$ for any individual sequence of features and bounded observations.

regression

KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints

1 code implementation • 14 May 2018 • Aurélien Garivier, Hédi Hadiji, Pierre Menard, Gilles Stoltz

We were able to obtain this non-parametric bi-optimality result while working hard to streamline the proofs (of previously known regret bounds and thus of the new analyses carried out); a second merit of the present contribution is therefore to provide a review of proofs of classical regret bounds for index-based strategies for $K$-armed stochastic bandits.

Explore First, Exploit Next: The True Shape of Regret in Bandit Problems

no code implementations • 23 Feb 2016 • Aurélien Garivier, Pierre Ménard, Gilles Stoltz

We revisit lower bounds on the regret in the case of multi-armed bandit problems.

Approachability in unknown games: Online learning meets multi-objective optimization

no code implementations • 10 Feb 2014 • Shie Mannor, Vianney Perchet, Gilles Stoltz

We show that it is impossible, in general, to approach the best target set in hindsight and propose achievable though ambitious alternative goals.

A Second-order Bound with Excess Losses

no code implementations • 10 Feb 2014 • Pierre Gaillard, Gilles Stoltz, Tim van Erven

We study online aggregation of the predictions of experts, and first show new second-order regret bounds in the standard setting, which are obtained via a version of the Prod algorithm (and also a version of the polynomially weighted average algorithm) with multiple learning rates.

A Primal Condition for Approachability with Partial Monitoring

no code implementations • 23 May 2013 • Shie Mannor, Vianney Perchet, Gilles Stoltz

In this paper we provide primal conditions on a convex set to be approachable with partial monitoring.

Mirror Descent Meets Fixed Share (and feels no regret)

no code implementations • NeurIPS 2012 • Nicolò Cesa-Bianchi, Pierre Gaillard, Gabor Lugosi, Gilles Stoltz

Mirror descent with an entropic regularizer is known to achieve shifting regret bounds that are logarithmic in the dimension.

Online Optimization in X-Armed Bandits

no code implementations • NeurIPS 2008 • Sébastien Bubeck, Gilles Stoltz, Csaba Szepesvári, Rémi Munos

We consider a generalization of stochastic bandit problems where the set of arms, X, is allowed to be a generic topological space.
