no code implementations • 18 Mar 2024 • Nadav Merlis, Dorian Baudry, Vianney Perchet
In particular, we measure the ratio between the value of standard RL agents and that of agents with partial future-reward lookahead.
no code implementations • 20 Feb 2024 • Charles Arnal, Vivien Cabannes, Vianney Perchet
The combination of lightly supervised pre-training and online fine-tuning has played a key role in recent AI developments.
no code implementations • 1 Sep 2023 • Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko
We study how to learn $\epsilon$-optimal strategies in zero-sum imperfect information games (IIG) with trajectory feedback.
no code implementations • NeurIPS 2023 • Mathieu Molina, Nicolas Gast, Patrick Loiseau, Vianney Perchet
We consider the problem of online allocation subject to a long-term fairness penalty.
no code implementations • 3 Jun 2023 • Felipe Garrido-Lucero, Benjamin Heymann, Maxime Vono, Patrick Loiseau, Vianney Perchet
The Shapley value has recently been proposed as a principled tool to achieve this goal due to its formal axiomatic justification.
no code implementations • 31 May 2023 • Hugo Richard, Etienne Boursier, Vianney Perchet
This motivates the harder, asynchronous multiplayer bandits problem, which was first tackled with an explore-then-commit (ETC) algorithm (see Dakdouk, 2022), with a regret upper-bound in $\mathcal{O}(T^{\frac{2}{3}})$.
1 code implementation • 23 Dec 2022 • Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko
Imperfect information games (IIG) are games in which each player only partially observes the current game state.
no code implementations • 29 Nov 2022 • Etienne Boursier, Vianney Perchet
Due mostly to their application to cognitive radio networks, multiplayer bandits have gained a lot of interest in the last decade.
no code implementations • 23 Oct 2022 • Sasila Ilandarideva, Yannis Bekri, Anatoli Juditsky, Vianney Perchet
In this paper we discuss an application of Stochastic Approximation to statistical estimation of high-dimensional sparse parameters.
1 code implementation • 31 May 2022 • Nadav Merlis, Hugo Richard, Flore Sentenac, Corentin Odic, Mathieu Molina, Vianney Perchet
We study single-machine scheduling of jobs, each belonging to a job type that determines its duration distribution.
1 code implementation • 26 May 2022 • Vivien Cabannes, Francis Bach, Vianney Perchet, Alessandro Rudi
The workhorse of machine learning is stochastic gradient descent.
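As a minimal illustration of the stochastic gradient descent routine mentioned above (a generic sketch, not the paper's specific algorithm), each step takes a gradient computed on a single randomly drawn sample; the toy objective and learning rate below are illustrative assumptions:

```python
import random

def sgd(grad, x0, lr=0.01, n_steps=5000, seed=0):
    """Minimal stochastic gradient descent: at each step, take a
    step along a noisy gradient estimate from one random sample."""
    rng = random.Random(seed)
    x = x0
    for _ in range(n_steps):
        x = x - lr * grad(x, rng)
    return x

# Toy objective: average of (x - a_i)^2 over samples a_i, whose
# minimizer is the sample mean 2.5. The stochastic gradient uses
# one uniformly drawn sample per step.
samples = [1.0, 2.0, 3.0, 4.0]
grad_one = lambda x, rng: 2 * (x - rng.choice(samples))
x_star = sgd(grad_one, x0=0.0)
```

With a small constant step size, the iterate hovers in a neighborhood of the minimizer whose radius scales with the step size and the gradient noise.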
no code implementations • 15 Feb 2022 • Vianney Perchet, Philippe Rigollet, Thibaut Le Gouic
In the case of asymmetric values where optimal solutions need not exist but Nash equilibria do, our algorithm samples from an $\varepsilon$-Nash equilibrium with similar complexity but where implicit constants depend on various parameters of the game such as battlefield values.
no code implementations • 11 Dec 2021 • Evrard Garcelon, Kamalika Chaudhuri, Vianney Perchet, Matteo Pirotta
Contextual bandit algorithms are widely used in domains where it is desirable to provide a personalized service by leveraging contextual information, which may contain sensitive data that needs to be protected.
no code implementations • NeurIPS 2021 • Reda Ouhamma, Odalric Maillard, Vianney Perchet
We consider the problem of online linear regression in the stochastic setting.
no code implementations • NeurIPS 2021 • Reda Ouhamma, Rémy Degenne, Pierre Gaillard, Vianney Perchet
In the fixed budget thresholding bandit problem, an algorithm sequentially allocates a budgeted number of samples to different distributions.
no code implementations • 31 Jul 2021 • Flore Sentenac, Jialin Yi, Clément Calauzènes, Vianney Perchet, Milan Vojnovic
Finding an optimal matching in a weighted graph is a standard combinatorial problem.
no code implementations • NeurIPS 2021 • Nathan Noiry, Flore Sentenac, Vianney Perchet
Motivated by sequential budgeted allocation problems, we investigate online matching problems where connections between vertices are not i.i.d., but instead follow fixed degree distributions -- the so-called configuration model.
no code implementations • 10 Jun 2021 • Firas Jarboui, Vianney Perchet
We introduce a new procedure to neuralize unsupervised Hidden Markov Models in the continuous case.
no code implementations • 9 Jun 2021 • Firas Jarboui, Vianney Perchet
Current solutions either solve a behaviour cloning problem (which does not leverage the exploratory data) or a reinforced imitation learning problem (using a fixed cost function that discriminates available exploratory trajectories from expert ones).
no code implementations • NeurIPS 2021 • Flore Sentenac, Etienne Boursier, Vianney Perchet
In the centralized case, the number of accumulated packets remains bounded (i.e., the system is \textit{stable}) as long as the ratio between service rates and arrival rates is larger than $1$.
no code implementations • 25 May 2021 • Firas Jarboui, Vianney Perchet
The global objective of inverse Reinforcement Learning (IRL) is to estimate the unknown cost function of some MDP based on observed trajectories generated by (approximately) optimal policies.
no code implementations • 17 Mar 2021 • Evrard Garcelon, Vianney Perchet, Matteo Pirotta
A critical aspect of bandit methods is that they require observing the contexts -- i.e., individual or group-level data -- and rewards in order to solve the sequential problem.
1 code implementation • NeurIPS 2021 • Etienne Boursier, Tristan Garrec, Vianney Perchet, Marco Scarsini
If she accepts the proposal, she is busy for the duration of the task and obtains a reward that depends on the task duration.
no code implementations • 4 Jan 2021 • Matthieu Jedor, Jonathan Louëdec, Vianney Perchet
On the other hand, this heuristic performs reasonably well in practice, and it even enjoys sublinear, and sometimes near-optimal, regret bounds in some very specific linear contextual and Bayesian bandit models.
no code implementations • 1 Jan 2021 • Firas Jarboui, Vianney Perchet
We consider the quickest change detection problem where the parameters of both the pre- and post-change distributions are unknown, which prevents the use of classical simple hypothesis testing.
no code implementations • 28 Dec 2020 • Matthieu Jedor, Jonathan Louëdec, Vianney Perchet
Continuously learning and leveraging the knowledge accumulated from prior tasks in order to improve future performance is a long-standing machine learning problem.
no code implementations • NeurIPS 2020 • Sandrine Peche, Vianney Perchet
We consider the stochastic block model where connections between vertices are perturbed by some latent (and unobserved) random geometric graph.
no code implementations • NeurIPS 2021 • Evrard Garcelon, Vianney Perchet, Ciara Pike-Burke, Matteo Pirotta
Motivated by this, we study privacy in the context of finite-horizon Markov Decision Processes (MDPs) by requiring information to be obfuscated on the user side.
no code implementations • 20 Jul 2020 • Etienne Boursier, Vianney Perchet, Marco Scarsini
In the simple uni-dimensional and static setting, beliefs about the quality are known to converge to its true value.
no code implementations • NeurIPS 2020 • Pierre Perrault, Etienne Boursier, Vianney Perchet, Michal Valko
In CMAB, the question of the existence of an efficient policy with an optimal asymptotic regret (up to a poly-logarithmic factor in the action size) is still open for many families of distributions, including mutually independent outcomes, and more generally the multivariate sub-Gaussian family.
1 code implementation • NeurIPS 2019 • Matthieu Jedor, Jonathan Louedec, Vianney Perchet
We introduce a new stochastic multi-armed bandit setting where arms are grouped inside ``ordered'' categories.
no code implementations • 4 Feb 2020 • Etienne Boursier, Vianney Perchet
We provide the first algorithm robust to selfish players (a.k.a.
no code implementations • 25 Sep 2019 • Firas Jarboui, Vianney Perchet, Roman EGGER
Expanding Non-Markovian Reward Decision Processes (NMRDP) into Markov Decision Processes (MDP) enables the use of state-of-the-art Reinforcement Learning (RL) techniques to identify optimal policies.
no code implementations • 10 Jul 2019 • Firas Jarboui, Célya Gruson-daniel, Pierre Chanial, Alain Durmus, Vincent Rocchisani, Sophie-helene Goulet Ebongue, Anneliese Depoux, Wilfried Kirschenmann, Vianney Perchet
Studies on massive open online courses (MOOCs) users discuss the existence of typical profiles and their impact on the learning process of the students.
no code implementations • 20 Jun 2019 • Xavier Fontaine, Pierre Perrault, Michal Valko, Vianney Perchet
By trying to minimize the $\ell^2$-loss $\mathbb{E} [\lVert\hat{\beta}-\beta^{\star}\rVert^2]$ the decision maker is actually minimizing the trace of the covariance matrix of the problem, which corresponds then to online A-optimal design.
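The A-optimality criterion mentioned above can be sketched numerically: for a design matrix $X$, it is the trace of the inverse information matrix $(X^\top X)^{-1}$, i.e. the sum of per-coordinate variances of the least-squares estimator (up to the noise level). The designs below are illustrative, not from the paper:

```python
import numpy as np

def a_optimality(X):
    """A-optimal design criterion: trace of (X^T X)^{-1}."""
    info = X.T @ X
    return np.trace(np.linalg.inv(info))

# Two designs with the same sample budget: spreading measurements
# over both coordinates yields a smaller criterion (better design)
# than concentrating almost all of them on one axis.
balanced = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
lopsided = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.1, 0.1]])
```

Here the balanced design gives a criterion of $1.0$ (information matrix $\mathrm{diag}(2,2)$), while the lopsided one is two orders of magnitude worse, which is what an online A-optimal design procedure seeks to avoid.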
no code implementations • NeurIPS 2021 • Nicolò Cesa-Bianchi, Tommaso Cesari, Yishay Mansour, Vianney Perchet
We introduce a novel theoretical framework for Return On Investment (ROI) maximization in repeated decision-making.
1 code implementation • 27 May 2019 • Etienne Boursier, Vianney Perchet
Strategic information is valuable either by remaining private (for instance if it is sensitive) or, on the other hand, by being used publicly to increase some utility.
no code implementations • 12 Feb 2019 • Xavier Fontaine, Shie Mannor, Vianney Perchet
This can be recast as a specific stochastic optimization problem where the objective is to maximize the cumulative reward, or equivalently to minimize the regret.
no code implementations • 11 Feb 2019 • Pierre Perrault, Vianney Perchet, Michal Valko
We improve the efficiency of algorithms for stochastic \emph{combinatorial semi-bandits}.
no code implementations • 4 Feb 2019 • Etienne Boursier, Emilie Kaufmann, Abbas Mehrabian, Vianney Perchet
We study a multiplayer stochastic multi-armed bandit problem in which players cannot communicate, and if two or more players pull the same arm, a collision occurs and the involved players receive zero reward.
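The collision model described above can be sketched in a few lines; the Bernoulli rewards and arm means here are illustrative assumptions, not the paper's experimental setup:

```python
import random
from collections import Counter

def play_round(choices, means, rng):
    """One round of the collision model: each player picks an arm;
    players choosing the same arm collide and all receive zero
    reward, while a lone player on an arm draws a Bernoulli reward
    with that arm's mean."""
    counts = Counter(choices)
    rewards = []
    for arm in choices:
        if counts[arm] > 1:
            rewards.append(0.0)  # collision: involved players get zero
        else:
            rewards.append(float(rng.random() < means[arm]))
    return rewards

rng = random.Random(1)
means = [0.9, 0.6, 0.3]
# players 0 and 1 collide on arm 0; player 2 is alone on arm 2
rewards = play_round([0, 0, 2], means, rng)
```

The difficulty studied in the paper is that players cannot communicate, so they must coordinate onto distinct arms implicitly, using only their own reward observations.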
no code implementations • 11 Oct 2018 • Xavier Fontaine, Quentin Berthet, Vianney Perchet
We consider the stochastic contextual bandit problem with additional regularization.
no code implementations • 9 Oct 2018 • Rémy Degenne, Thomas Nedelec, Clément Calauzènes, Vianney Perchet
State-of-the-art online learning procedures focus either on selecting the best alternative ("best arm identification") or on minimizing the cost (the "regret").
1 code implementation • NeurIPS 2019 • Etienne Boursier, Vianney Perchet
Motivated by cognitive radio networks, we consider the stochastic multiplayer multi-armed bandit problem, where several players pull arms simultaneously and a collision occurs whenever the same arm is pulled by several players at the same stage.
no code implementations • 10 Jul 2018 • Rémy Degenne, Evrard Garcelon, Vianney Perchet
We consider the classical stochastic multi-armed bandit but where, from time to time and roughly with frequency $\epsilon$, an extra observation is gathered by the agent for free.
no code implementations • 9 Jul 2018 • Nicolò Cesa-Bianchi, Tommaso Cesari, Vianney Perchet
When $K=2$ in the distribution-dependent case, the hardness of our setting reduces to that of a stochastic $2$-armed bandit: we prove that an upper bound of order $(\log T)/\Delta$ (up to $\log\log$ factors) on the regret can be achieved with no information on the demand curve.
no code implementations • 6 Jun 2018 • Pierre Perrault, Vianney Perchet, Michal Valko
We consider the problem where an agent wants to find a hidden object that is randomly located in some vertex of a directed acyclic graph (DAG) according to a fixed but possibly unknown distribution.
no code implementations • 28 Jun 2017 • Claire Vernade, Olivier Cappé, Vianney Perchet
We assume that the probability of conversion associated with each action is unknown while the distribution of the conversion delay is known, distinguishing between the (idealized) case where the conversion events may be observed whatever their delay and the more realistic setting in which late conversions are censored.
no code implementations • 5 Jun 2017 • Joon Kwon, Vianney Perchet, Claire Vernade
In the classical multi-armed bandit problem, d arms are available to the decision maker who pulls them sequentially in order to maximize his cumulative reward.
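The classical setting described above can be illustrated with a standard index policy; the sketch below implements UCB1 (a textbook baseline, not the algorithm of this paper), with illustrative Bernoulli arm means:

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """UCB1 on Bernoulli arms: after pulling each arm once, pull the
    arm maximizing empirical mean + sqrt(2 log t / n_pulls)."""
    rng = random.Random(seed)
    d = len(arm_means)
    counts = [0] * d
    sums = [0.0] * d
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= d:
            arm = t - 1  # initialization: pull each arm once
        else:
            arm = max(range(d), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = float(rng.random() < arm_means[arm])
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts

total, counts = ucb1([0.2, 0.8], horizon=2000)
```

As the horizon grows, the exploration bonus shrinks on frequently pulled arms, so pulls concentrate on the arm with the highest mean while suboptimal arms are pulled only logarithmically often.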
no code implementations • 3 Apr 2017 • Thomas Nedelec, Nicolas Le Roux, Vianney Perchet
We provide a comparative study of several widely used off-policy estimators (Empirical Average, Basic Importance Sampling and Normalized Importance Sampling), detailing the different regimes where they are individually suboptimal.
no code implementations • NeurIPS 2017 • Quentin Berthet, Vianney Perchet
We consider the problem of bandit optimization, inspired by stochastic optimization and online learning problems with bandit feedback.
no code implementations • NeurIPS 2016 • Rémy Degenne, Vianney Perchet
We introduce a way to quantify the dependency structure of the problem and design an algorithm that adapts to it.
no code implementations • 28 Sep 2016 • János Flesch, Rida Laraki, Vianney Perchet
The third is necessary: if it is not satisfied, the opponent can weakly exclude the target set.
no code implementations • 26 May 2016 • Francis Bach, Vianney Perchet
The minimization of convex functions which are only available through partial and noisy information is a key methodological problem in many disciplines.
no code implementations • 26 Nov 2015 • Joon Kwon, Vianney Perchet
We demonstrate that, in the classical non-stochastic regret minimization problem with $d$ decisions, gains and losses to be respectively maximized or minimized are fundamentally different.
no code implementations • 18 Nov 2015 • Jonathan Weed, Vianney Perchet, Philippe Rigollet
To our knowledge, this is the first complete set of strategies for bidders participating in auctions of this type.
no code implementations • 10 Feb 2014 • Shie Mannor, Vianney Perchet, Gilles Stoltz
We show that it is impossible, in general, to approach the best target set in hindsight and propose achievable though ambitious alternative goals.
no code implementations • 19 Nov 2013 • Emile Contal, Vianney Perchet, Nicolas Vayatis
In this paper, we analyze a generic algorithm scheme for sequential global optimization using Gaussian processes.
no code implementations • 23 May 2013 • Shie Mannor, Vianney Perchet, Gilles Stoltz
In this paper we provide primal conditions on a convex set to be approachable with partial monitoring.
no code implementations • 27 Oct 2011 • Vianney Perchet, Philippe Rigollet
We consider a multi-armed bandit problem in a setting where each arm produces a noisy reward realization which depends on an observable random covariate.