no code implementations • 18 Mar 2024 • Nadav Merlis, Dorian Baudry, Vianney Perchet
In particular, we measure the ratio between the value of standard RL agents and that of agents with partial future-reward lookahead.
no code implementations • 20 Feb 2024 • Charles Arnal, Vivien Cabannes, Vianney Perchet
The combination of lightly supervised pre-training and online fine-tuning has played a key role in recent AI developments.
no code implementations • 1 Sep 2023 • Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko
We study how to learn $\epsilon$-optimal strategies in zero-sum imperfect information games (IIG) with trajectory feedback.
no code implementations • NeurIPS 2023 • Mathieu Molina, Nicolas Gast, Patrick Loiseau, Vianney Perchet
We consider the problem of online allocation subject to a long-term fairness penalty.
no code implementations • 3 Jun 2023 • Felipe Garrido-Lucero, Benjamin Heymann, Maxime Vono, Patrick Loiseau, Vianney Perchet
The Shapley value has recently been proposed as a principled tool to achieve this goal due to its formal axiomatic justification.
no code implementations • 31 May 2023 • Hugo Richard, Etienne Boursier, Vianney Perchet
This motivates the harder, asynchronous multiplayer bandits problem, which was first tackled with an explore-then-commit (ETC) algorithm (see Dakdouk, 2022), with a regret upper-bound in $\mathcal{O}(T^{\frac{2}{3}})$.
1 code implementation • 23 Dec 2022 • Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko
Imperfect information games (IIG) are games in which each player only partially observes the current game state.
no code implementations • 29 Nov 2022 • Etienne Boursier, Vianney Perchet
Due mostly to their application to cognitive radio networks, multiplayer bandits have gained a lot of interest in the last decade.
no code implementations • 23 Oct 2022 • Sasila Ilandarideva, Yannis Bekri, Anatoli Juditsky, Vianney Perchet
In this paper we discuss an application of Stochastic Approximation to statistical estimation of high-dimensional sparse parameters.
1 code implementation • 31 May 2022 • Nadav Merlis, Hugo Richard, Flore Sentenac, Corentin Odic, Mathieu Molina, Vianney Perchet
We study single-machine scheduling of jobs, each belonging to a job type that determines its duration distribution.
1 code implementation • 26 May 2022 • Vivien Cabannes, Francis Bach, Vianney Perchet, Alessandro Rudi
The workhorse of machine learning is stochastic gradient descent.
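As a minimal illustration of the stochastic gradient descent routine mentioned above (a generic sketch, not the paper's specific algorithm), each step takes a gradient computed on a single randomly drawn sample; the toy objective and learning rate below are illustrative assumptions:

```python
import random

def sgd(grad, x0, lr=0.01, n_steps=5000, seed=0):
    """Minimal stochastic gradient descent: at each step, take a
    step along a noisy gradient estimate from one random sample."""
    rng = random.Random(seed)
    x = x0
    for _ in range(n_steps):
        x = x - lr * grad(x, rng)
    return x

# Toy objective: average of (x - a_i)^2 over samples a_i, whose
# minimizer is the sample mean 2.5. The stochastic gradient uses
# one uniformly drawn sample per step.
samples = [1.0, 2.0, 3.0, 4.0]
grad_one = lambda x, rng: 2 * (x - rng.choice(samples))
x_star = sgd(grad_one, x0=0.0)
```

With a small constant step size, the iterate hovers in a neighborhood of the minimizer whose radius scales with the step size and the gradient noise.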
no code implementations • 15 Feb 2022 • Vianney Perchet, Philippe Rigollet, Thibaut Le Gouic
In the case of asymmetric values where optimal solutions need not exist but Nash equilibria do, our algorithm samples from an $\varepsilon$-Nash equilibrium with similar complexity but where implicit constants depend on various parameters of the game such as battlefield values.
no code implementations • 11 Dec 2021 • Evrard Garcelon, Kamalika Chaudhuri, Vianney Perchet, Matteo Pirotta
Contextual bandit algorithms are widely used in domains where it is desirable to provide a personalized service by leveraging contextual information, which may contain sensitive data that needs to be protected.
no code implementations • NeurIPS 2021 • Reda Ouhamma, Odalric Maillard, Vianney Perchet
We consider the problem of online linear regression in the stochastic setting.
no code implementations • NeurIPS 2021 • Reda Ouhamma, Rémy Degenne, Pierre Gaillard, Vianney Perchet
In the fixed budget thresholding bandit problem, an algorithm sequentially allocates a budgeted number of samples to different distributions.
no code implementations • 31 Jul 2021 • Flore Sentenac, Jialin Yi, Clément Calauzènes, Vianney Perchet, Milan Vojnovic
Finding an optimal matching in a weighted graph is a standard combinatorial problem.
no code implementations • NeurIPS 2021 • Nathan Noiry, Flore Sentenac, Vianney Perchet
Motivated by sequential budgeted allocation problems, we investigate online matching problems where connections between vertices are not i.i.d., but instead follow fixed degree distributions -- the so-called configuration model.
no code implementations • 10 Jun 2021 • Firas Jarboui, Vianney Perchet
We introduce a new procedure to neuralize unsupervised Hidden Markov Models in the continuous case.
no code implementations • 9 Jun 2021 • Firas Jarboui, Vianney Perchet
Current solutions either solve a behaviour cloning problem (which does not leverage the exploratory data) or a reinforced imitation learning problem (using a fixed cost function that discriminates available exploratory trajectories from expert ones).
no code implementations • NeurIPS 2021 • Flore Sentenac, Etienne Boursier, Vianney Perchet
In the centralized case, the number of accumulated packets remains bounded (i.e., the system is \textit{stable}) as long as the ratio between service rates and arrival rates is larger than $1$.
no code implementations • 25 May 2021 • Firas Jarboui, Vianney Perchet
The global objective of inverse Reinforcement Learning (IRL) is to estimate the unknown cost function of some MDP based on observed trajectories generated by (approximately) optimal policies.
no code implementations • 17 Mar 2021 • Evrard Garcelon, Vianney Perchet, Matteo Pirotta
A critical aspect of bandit methods is that they require observing the contexts -- i.e., individual or group-level data -- and rewards in order to solve the sequential problem.
1 code implementation • NeurIPS 2021 • Etienne Boursier, Tristan Garrec, Vianney Perchet, Marco Scarsini
If she accepts the proposal, she is busy for the duration of the task and obtains a reward that depends on the task duration.
no code implementations • 4 Jan 2021 • Matthieu Jedor, Jonathan Louëdec, Vianney Perchet
On the other hand, this heuristic performs reasonably well in practice, and it even enjoys sublinear, and sometimes near-optimal, regret bounds in some very specific linear contextual and Bayesian bandit models.
no code implementations • 1 Jan 2021 • Firas Jarboui, Vianney Perchet
We consider the quickest change detection problem where the parameters of both the pre- and post-change distributions are unknown, which prevents the use of classical simple hypothesis testing.
no code implementations • 28 Dec 2020 • Matthieu Jedor, Jonathan Louëdec, Vianney Perchet
Continuously learning and leveraging the knowledge accumulated from prior tasks in order to improve future performance is a long-standing machine learning problem.
no code implementations • NeurIPS 2020 • Sandrine Peche, Vianney Perchet
We consider the stochastic block model where connections between vertices are perturbed by some latent (and unobserved) random geometric graph.
no code implementations • NeurIPS 2021 • Evrard Garcelon, Vianney Perchet, Ciara Pike-Burke, Matteo Pirotta
Motivated by this, we study privacy in the context of finite-horizon Markov Decision Processes (MDPs) by requiring information to be obfuscated on the user side.
no code implementations • 20 Jul 2020 • Etienne Boursier, Vianney Perchet, Marco Scarsini
In the simple uni-dimensional and static setting, beliefs about the quality are known to converge to its true value.
no code implementations • NeurIPS 2020 • Pierre Perrault, Etienne Boursier, Vianney Perchet, Michal Valko
In CMAB, the question of the existence of an efficient policy with an optimal asymptotic regret (up to a poly-logarithmic factor in the action size) is still open for many families of distributions, including mutually independent outcomes, and more generally the multivariate sub-Gaussian family.
1 code implementation • NeurIPS 2019 • Matthieu Jedor, Jonathan Louedec, Vianney Perchet
We introduce a new stochastic multi-armed bandit setting where arms are grouped inside ``ordered'' categories.
no code implementations • 4 Feb 2020 • Etienne Boursier, Vianney Perchet
We provide the first algorithm robust to selfish players (a.k.a.
no code implementations • 25 Sep 2019 • Firas Jarboui, Vianney Perchet, Roman EGGER
Expanding Non-Markovian Reward Decision Processes (NMRDP) into Markov Decision Processes (MDP) enables the use of state-of-the-art Reinforcement Learning (RL) techniques to identify optimal policies.
no code implementations • 10 Jul 2019 • Firas Jarboui, Célya Gruson-daniel, Pierre Chanial, Alain Durmus, Vincent Rocchisani, Sophie-helene Goulet Ebongue, Anneliese Depoux, Wilfried Kirschenmann, Vianney Perchet
Studies on massive open online courses (MOOCs) users discuss the existence of typical profiles and their impact on the learning process of the students.
no code implementations • 20 Jun 2019 • Xavier Fontaine, Pierre Perrault, Michal Valko, Vianney Perchet
By trying to minimize the $\ell^2$-loss $\mathbb{E} [\lVert\hat{\beta}-\beta^{\star}\rVert^2]$ the decision maker is actually minimizing the trace of the covariance matrix of the problem, which corresponds then to online A-optimal design.
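The A-optimality criterion mentioned above can be sketched numerically: for a design matrix $X$, it is the trace of the inverse information matrix $(X^\top X)^{-1}$, i.e. the sum of per-coordinate variances of the least-squares estimator (up to the noise level). The designs below are illustrative, not from the paper:

```python
import numpy as np

def a_optimality(X):
    """A-optimal design criterion: trace of (X^T X)^{-1}."""
    info = X.T @ X
    return np.trace(np.linalg.inv(info))

# Two designs with the same sample budget: spreading measurements
# over both coordinates yields a smaller criterion (better design)
# than concentrating almost all of them on one axis.
balanced = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
lopsided = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.1, 0.1]])
```

Here the balanced design gives a criterion of $1.0$ (information matrix $\mathrm{diag}(2,2)$), while the lopsided one is two orders of magnitude worse, which is what an online A-optimal design procedure seeks to avoid.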
no code implementations • NeurIPS 2021 • Nicolò Cesa-Bianchi, Tommaso Cesari, Yishay Mansour, Vianney Perchet
We introduce a novel theoretical framework for Return On Investment (ROI) maximization in repeated decision-making.
1 code implementation • 27 May 2019 • Etienne Boursier, Vianney Perchet
Strategic information is valuable either by remaining private (for instance if it is sensitive) or, on the other hand, by being used publicly to increase some utility.
no code implementations • 12 Feb 2019 • Xavier Fontaine, Shie Mannor, Vianney Perchet
This can be recast as a specific stochastic optimization problem where the objective is to maximize the cumulative reward, or equivalently to minimize the regret.
no code implementations • 11 Feb 2019 • Pierre Perrault, Vianney Perchet, Michal Valko
We improve the efficiency of algorithms for stochastic \emph{combinatorial semi-bandits}.
no code implementations • 4 Feb 2019 • Etienne Boursier, Emilie Kaufmann, Abbas Mehrabian, Vianney Perchet
We study a multiplayer stochastic multi-armed bandit problem in which players cannot communicate, and if two or more players pull the same arm, a collision occurs and the involved players receive zero reward.
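The collision model described above can be sketched in a few lines; the Bernoulli rewards and arm means here are illustrative assumptions, not the paper's experimental setup:

```python
import random
from collections import Counter

def play_round(choices, means, rng):
    """One round of the collision model: each player picks an arm;
    players choosing the same arm collide and all receive zero
    reward, while a lone player on an arm draws a Bernoulli reward
    with that arm's mean."""
    counts = Counter(choices)
    rewards = []
    for arm in choices:
        if counts[arm] > 1:
            rewards.append(0.0)  # collision: involved players get zero
        else:
            rewards.append(float(rng.random() < means[arm]))
    return rewards

rng = random.Random(1)
means = [0.9, 0.6, 0.3]
# players 0 and 1 collide on arm 0; player 2 is alone on arm 2
rewards = play_round([0, 0, 2], means, rng)
```

The difficulty studied in the paper is that players cannot communicate, so they must coordinate onto distinct arms implicitly, using only their own reward observations.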
no code implementations • 11 Oct 2018 • Xavier Fontaine, Quentin Berthet, Vianney Perchet
We consider the stochastic contextual bandit problem with additional regularization.
no code implementations • 9 Oct 2018 • Rémy Degenne, Thomas Nedelec, Clément Calauzènes, Vianney Perchet
State-of-the-art online learning procedures focus either on selecting the best alternative ("best arm identification") or on minimizing the cost (the "regret").
1 code implementation • NeurIPS 2019 • Etienne Boursier, Vianney Perchet
Motivated by cognitive radio networks, we consider the stochastic multiplayer multi-armed bandit problem, where several players pull arms simultaneously and a collision occurs whenever the same arm is pulled by several players at the same stage.
no code implementations • 10 Jul 2018 • Rémy Degenne, Evrard Garcelon, Vianney Perchet
We consider the classical stochastic multi-armed bandit but where, from time to time and roughly with frequency $\epsilon$, an extra observation is gathered by the agent for free.
no code implementations • 9 Jul 2018 • Nicolò Cesa-Bianchi, Tommaso Cesari, Vianney Perchet
When $K=2$ in the distribution-dependent case, the hardness of our setting reduces to that of a stochastic $2$-armed bandit: we prove that an upper bound of order $(\log T)/\Delta$ (up to $\log\log$ factors) on the regret can be achieved with no information on the demand curve.
no code implementations • 6 Jun 2018 • Pierre Perrault, Vianney Perchet, Michal Valko
We consider the problem where an agent wants to find a hidden object that is randomly located in some vertex of a directed acyclic graph (DAG) according to a fixed but possibly unknown distribution.
no code implementations • 28 Jun 2017 • Claire Vernade, Olivier Cappé, Vianney Perchet
We assume that the probability of conversion associated with each action is unknown while the distribution of the conversion delay is known, distinguishing between the (idealized) case where the conversion events may be observed whatever their delay and the more realistic setting in which late conversions are censored.
no code implementations • 5 Jun 2017 • Joon Kwon, Vianney Perchet, Claire Vernade
In the classical multi-armed bandit problem, d arms are available to the decision maker who pulls them sequentially in order to maximize his cumulative reward.
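The classical setting described above can be illustrated with a standard index policy; the sketch below implements UCB1 (a textbook baseline, not the algorithm of this paper), with illustrative Bernoulli arm means:

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """UCB1 on Bernoulli arms: after pulling each arm once, pull the
    arm maximizing empirical mean + sqrt(2 log t / n_pulls)."""
    rng = random.Random(seed)
    d = len(arm_means)
    counts = [0] * d
    sums = [0.0] * d
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= d:
            arm = t - 1  # initialization: pull each arm once
        else:
            arm = max(range(d), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = float(rng.random() < arm_means[arm])
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts

total, counts = ucb1([0.2, 0.8], horizon=2000)
```

As the horizon grows, the exploration bonus shrinks on frequently pulled arms, so pulls concentrate on the arm with the highest mean while suboptimal arms are pulled only logarithmically often.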
no code implementations • 3 Apr 2017 • Thomas Nedelec, Nicolas Le Roux, Vianney Perchet
We provide a comparative study of several widely used off-policy estimators (Empirical Average, Basic Importance Sampling and Normalized Importance Sampling), detailing the different regimes where they are individually suboptimal.
no code implementations • NeurIPS 2017 • Quentin Berthet, Vianney Perchet
We consider the problem of bandit optimization, inspired by stochastic optimization and online learning problems with bandit feedback.
no code implementations • NeurIPS 2016 • Rémy Degenne, Vianney Perchet
We introduce a way to quantify the dependency structure of the problem and design an algorithm that adapts to it.
no code implementations • 28 Sep 2016 • János Flesch, Rida Laraki, Vianney Perchet
The third is necessary: if it is not satisfied, the opponent can weakly exclude the target set.
no code implementations • 26 May 2016 • Francis Bach, Vianney Perchet
The minimization of convex functions which are only available through partial and noisy information is a key methodological problem in many disciplines.
no code implementations • 26 Nov 2015 • Joon Kwon, Vianney Perchet
We demonstrate that, in the classical non-stochastic regret minimization problem with $d$ decisions, gains and losses to be respectively maximized or minimized are fundamentally different.
no code implementations • 18 Nov 2015 • Jonathan Weed, Vianney Perchet, Philippe Rigollet
To our knowledge, this is the first complete set of strategies for bidders participating in auctions of this type.
no code implementations • 10 Feb 2014 • Shie Mannor, Vianney Perchet, Gilles Stoltz
We show that it is impossible, in general, to approach the best target set in hindsight and propose achievable though ambitious alternative goals.
no code implementations • 19 Nov 2013 • Emile Contal, Vianney Perchet, Nicolas Vayatis
In this paper, we analyze a generic algorithm scheme for sequential global optimization using Gaussian processes.
no code implementations • 23 May 2013 • Shie Mannor, Vianney Perchet, Gilles Stoltz
In this paper we provide primal conditions on a convex set to be approachable with partial monitoring.
no code implementations • 27 Oct 2011 • Vianney Perchet, Philippe Rigollet
We consider a multi-armed bandit problem in a setting where each arm produces a noisy reward realization which depends on an observable random covariate.