Search Results for author: Vianney Perchet

Found 59 papers, 7 papers with code

The Value of Reward Lookahead in Reinforcement Learning

no code implementations18 Mar 2024 Nadav Merlis, Dorian Baudry, Vianney Perchet

In particular, we measure the ratio between the value of standard RL agents and that of agents with partial future-reward lookahead.

Offline RL reinforcement-learning +1

Mode Estimation with Partial Feedback

no code implementations20 Feb 2024 Charles Arnal, Vivien Cabannes, Vianney Perchet

The combination of lightly supervised pre-training and online fine-tuning has played a key role in recent AI developments.

Active Learning

Local and adaptive mirror descents in extensive-form games

no code implementations1 Sep 2023 Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko

We study how to learn $\epsilon$-optimal strategies in zero-sum imperfect information games (IIG) with trajectory feedback.

DU-Shapley: A Shapley Value Proxy for Efficient Dataset Valuation

no code implementations3 Jun 2023 Felipe Garrido-Lucero, Benjamin Heymann, Maxime Vono, Patrick Loiseau, Vianney Perchet

The Shapley value has recently been proposed as a principled tool to achieve this goal due to its formal axiomatic justification.

Federated Learning

Constant or logarithmic regret in asynchronous multiplayer bandits

no code implementations31 May 2023 Hugo Richard, Etienne Boursier, Vianney Perchet

This motivates the harder, asynchronous multiplayer bandits problem, which was first tackled with an explore-then-commit (ETC) algorithm (see Dakdouk, 2022), with a regret upper-bound in $\mathcal{O}(T^{\frac{2}{3}})$.
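The explore-then-commit template mentioned in the snippet is easy to state in code. Below is a minimal single-player ETC sketch (not the paper's asynchronous multiplayer algorithm), assuming Bernoulli rewards; the function name and parameters are illustrative:

```python
import numpy as np

def explore_then_commit(means, horizon, explore_rounds, rng):
    """Explore-then-commit (ETC) sketch: pull each arm `explore_rounds`
    times in round-robin, then commit to the empirically best arm."""
    k = len(means)
    sums = np.zeros(k)
    total = 0.0
    pulls = 0
    # Exploration phase: uniform round-robin over all arms.
    for _ in range(explore_rounds):
        for arm in range(k):
            r = float(rng.random() < means[arm])  # Bernoulli reward
            sums[arm] += r
            total += r
            pulls += 1
    # Commit phase: play the empirical best arm for the remaining rounds.
    best = int(np.argmax(sums / explore_rounds))
    for _ in range(horizon - pulls):
        total += float(rng.random() < means[best])
    return best, total

rng = np.random.default_rng(0)
best, total = explore_then_commit([0.2, 0.5, 0.9], horizon=10_000,
                                  explore_rounds=100, rng=rng)
```

Tuning `explore_rounds` against the horizon is exactly what drives the $\mathcal{O}(T^{\frac{2}{3}})$ regret of ETC-style schemes.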

Adapting to game trees in zero-sum imperfect information games

1 code implementation23 Dec 2022 Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko

Imperfect information games (IIG) are games in which each player only partially observes the current game state.

A survey on multi-player bandits

no code implementations29 Nov 2022 Etienne Boursier, Vianney Perchet

Due mostly to their application to cognitive radio networks, multiplayer bandits have gained a lot of interest in the last decade.

Stochastic Mirror Descent for Large-Scale Sparse Recovery

no code implementations23 Oct 2022 Sasila Ilandarideva, Yannis Bekri, Anatoli Juditsky, Vianney Perchet

In this paper we discuss an application of Stochastic Approximation to statistical estimation of high-dimensional sparse parameters.

Stochastic Optimization

On Preemption and Learning in Stochastic Scheduling

1 code implementation31 May 2022 Nadav Merlis, Hugo Richard, Flore Sentenac, Corentin Odic, Mathieu Molina, Vianney Perchet

We study single-machine scheduling of jobs, each belonging to a job type that determines its duration distribution.

Efficient Exploration Scheduling

An algorithmic solution to the Blotto game using multi-marginal couplings

no code implementations15 Feb 2022 Vianney Perchet, Philippe Rigollet, Thibaut Le Gouic

In the case of asymmetric values where optimal solutions need not exist but Nash equilibria do, our algorithm samples from an $\varepsilon$-Nash equilibrium with similar complexity but where implicit constants depend on various parameters of the game such as battlefield values.

Privacy Amplification via Shuffling for Linear Contextual Bandits

no code implementations11 Dec 2021 Evrard Garcelon, Kamalika Chaudhuri, Vianney Perchet, Matteo Pirotta

Contextual bandit algorithms are widely used in domains where it is desirable to provide a personalized service by leveraging contextual information, which may contain sensitive data that needs to be protected.

Multi-Armed Bandits

Online Sign Identification: Minimization of the Number of Errors in Thresholding Bandits

no code implementations NeurIPS 2021 Reda Ouhamma, Rémy Degenne, Pierre Gaillard, Vianney Perchet

In the fixed budget thresholding bandit problem, an algorithm sequentially allocates a budgeted number of samples to different distributions.

Pure Exploration and Regret Minimization in Matching Bandits

no code implementations31 Jul 2021 Flore Sentenac, Jialin Yi, Clément Calauzènes, Vianney Perchet, Milan Vojnovic

Finding an optimal matching in a weighted graph is a standard combinatorial problem.

Online Matching in Sparse Random Graphs: Non-Asymptotic Performances of Greedy Algorithm

no code implementations NeurIPS 2021 Nathan Noiry, Flore Sentenac, Vianney Perchet

Motivated by sequential budgeted allocation problems, we investigate online matching problems where connections between vertices are not i.i.d., but they have fixed degree distributions -- the so-called configuration model.
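The greedy algorithm analyzed in the paper is simple to sketch: each arriving vertex is matched to an arbitrary (here, uniformly random) unmatched neighbour. The adjacency lists below are illustrative, not the configuration model itself:

```python
import random

def greedy_online_matching(arrivals, rng):
    """Greedy online matching sketch: match each arriving vertex to a
    uniformly random still-unmatched neighbour, if one exists."""
    matched = set()
    matching = []
    for v, neighbors in enumerate(arrivals):
        free = [u for u in neighbors if u not in matched]
        if free:
            u = rng.choice(free)
            matched.add(u)
            matching.append((v, u))
    return matching

rng = random.Random(0)
# Offline vertices 0..3; each list gives an arriving vertex's neighbours.
arrivals = [[0, 1], [1], [1, 2], [3]]
matching = greedy_online_matching(arrivals, rng)
```

The paper's contribution is a non-asymptotic analysis of how large a matching this kind of greedy rule builds when the graph is drawn from a configuration model.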

Unsupervised Neural Hidden Markov Models with a Continuous latent state space

no code implementations10 Jun 2021 Firas Jarboui, Vianney Perchet

We introduce a new procedure to neuralize unsupervised Hidden Markov Models in the continuous case.

Offline Inverse Reinforcement Learning

no code implementations9 Jun 2021 Firas Jarboui, Vianney Perchet

Current solutions either solve a behaviour cloning problem (which does not leverage the exploratory data) or a reinforced imitation learning problem (using a fixed cost function that discriminates available exploratory trajectories from expert ones).

Data Augmentation Imitation Learning +4

Decentralized Learning in Online Queuing Systems

no code implementations NeurIPS 2021 Flore Sentenac, Etienne Boursier, Vianney Perchet

In the centralized case, the number of accumulated packets remains bounded (i.e., the system is \textit{stable}) as long as the ratio between service rates and arrival rates is larger than $1$.

A Generalised Inverse Reinforcement Learning Framework

no code implementations25 May 2021 Firas Jarboui, Vianney Perchet

The global objective of inverse Reinforcement Learning (IRL) is to estimate the unknown cost function of some MDP based on observed trajectories generated by (approximate) optimal policies.

OpenAI Gym reinforcement-learning +1

Encrypted Linear Contextual Bandit

no code implementations17 Mar 2021 Evrard Garcelon, Vianney Perchet, Matteo Pirotta

A critical aspect of bandit methods is that they must observe the contexts -- i.e., individual or group-level data -- and the rewards in order to solve the sequential problem.

Decision Making Multi-Armed Bandits +2

Making the most of your day: online learning for optimal allocation of time

1 code implementation NeurIPS 2021 Etienne Boursier, Tristan Garrec, Vianney Perchet, Marco Scarsini

If she accepts the proposal, she is busy for the duration of the task and obtains a reward that depends on the task duration.

Scheduling

Be Greedy in Multi-Armed Bandits

no code implementations4 Jan 2021 Matthieu Jedor, Jonathan Louëdec, Vianney Perchet

On the other hand, this heuristic performs reasonably well in practice, and it even achieves sublinear, sometimes near-optimal, regret bounds in some very specific linear contextual and Bayesian bandit models.

Multi-Armed Bandits

Quickest change detection for multi-task problems under unknown parameters

no code implementations1 Jan 2021 Firas Jarboui, Vianney Perchet

We consider the quickest change detection problem where the parameters of both the pre- and post-change distributions are unknown, which prevents the use of classical simple hypothesis testing.

Change Detection Two-sample testing

Lifelong Learning in Multi-Armed Bandits

no code implementations28 Dec 2020 Matthieu Jedor, Jonathan Louëdec, Vianney Perchet

Continuously learning and leveraging the knowledge accumulated from prior tasks in order to improve future performance is a long-standing machine learning problem.

Multi-Armed Bandits

Robustness of Community Detection to Random Geometric Perturbations

no code implementations NeurIPS 2020 Sandrine Peche, Vianney Perchet

We consider the stochastic block model where connection between vertices is perturbed by some latent (and unobserved) random geometric graph.

Community Detection Stochastic Block Model

Local Differential Privacy for Regret Minimization in Reinforcement Learning

no code implementations NeurIPS 2021 Evrard Garcelon, Vianney Perchet, Ciara Pike-Burke, Matteo Pirotta

Motivated by this, we study privacy in the context of finite-horizon Markov Decision Processes (MDPs) by requiring information to be obfuscated on the user side.

reinforcement-learning Reinforcement Learning (RL)

Social Learning in Non-Stationary Environments

no code implementations20 Jul 2020 Etienne Boursier, Vianney Perchet, Marco Scarsini

In the simple uni-dimensional and static setting, beliefs about the quality are known to converge to its true value.

Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits

no code implementations NeurIPS 2020 Pierre Perrault, Etienne Boursier, Vianney Perchet, Michal Valko

In CMAB, the question of the existence of an efficient policy with an optimal asymptotic regret (up to a factor poly-logarithmic with the action size) is still open for many families of distributions, including mutually independent outcomes, and more generally the multivariate sub-Gaussian family.
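The combinatorial setting generalizes plain Thompson sampling. For intuition, a minimal (non-combinatorial) Bernoulli Thompson sampling sketch with Beta(1,1) priors -- names and parameters are illustrative, not the paper's algorithm:

```python
import numpy as np

def thompson_bernoulli(means, horizon, rng):
    """Thompson sampling sketch for Bernoulli arms: draw one sample per
    arm from its Beta posterior and pull the argmax."""
    k = len(means)
    alpha = np.ones(k)  # 1 + posterior successes
    beta = np.ones(k)   # 1 + posterior failures
    counts = np.zeros(k, dtype=int)
    for _ in range(horizon):
        theta = rng.beta(alpha, beta)         # one posterior draw per arm
        arm = int(np.argmax(theta))
        r = float(rng.random() < means[arm])  # Bernoulli reward
        alpha[arm] += r
        beta[arm] += 1.0 - r
        counts[arm] += 1
    return counts

rng = np.random.default_rng(0)
counts = thompson_bernoulli([0.3, 0.6, 0.8], horizon=3000, rng=rng)
```

In CMAB the per-round action is a subset of arms rather than a single one, which is where the dependency structure of the outcomes starts to matter.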

Thompson Sampling

Categorized Bandits

1 code implementation NeurIPS 2019 Matthieu Jedor, Jonathan Louedec, Vianney Perchet

We introduce a new stochastic multi-armed bandit setting where arms are grouped inside "ordered" categories.

Trajectory representation learning for Multi-Task NMRDPs planning

no code implementations25 Sep 2019 Firas Jarboui, Vianney Perchet, Roman EGGER

Expanding Non Markovian Reward Decision Processes (NMRDP) into Markov Decision Processes (MDP) enables the use of state of the art Reinforcement Learning (RL) techniques to identify optimal policies.

Reinforcement Learning (RL) Representation Learning

Markov Decision Process for MOOC users behavioral inference

no code implementations10 Jul 2019 Firas Jarboui, Célya Gruson-daniel, Pierre Chanial, Alain Durmus, Vincent Rocchisani, Sophie-helene Goulet Ebongue, Anneliese Depoux, Wilfried Kirschenmann, Vianney Perchet

Studies of massive open online course (MOOC) users discuss the existence of typical profiles and their impact on the students' learning process.

Online A-Optimal Design and Active Linear Regression

no code implementations20 Jun 2019 Xavier Fontaine, Pierre Perrault, Michal Valko, Vianney Perchet

By trying to minimize the $\ell^2$-loss $\mathbb{E} [\lVert\hat{\beta}-\beta^{\star}\rVert^2]$ the decision maker is actually minimizing the trace of the covariance matrix of the problem, which corresponds then to online A-optimal design.
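The identity in the snippet -- that the expected $\ell^2$-loss of the least-squares estimator equals $\sigma^2$ times the trace of $(X^\top X)^{-1}$ -- can be checked numerically. A small Monte Carlo sketch under an assumed Gaussian-noise linear model (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 3
sigma = 0.5
X = rng.normal(size=(n, d))
beta = np.array([1.0, -2.0, 0.5])

# Theory: E||beta_hat - beta||^2 = sigma^2 * tr((X^T X)^{-1}) for fixed X.
theory = sigma**2 * np.trace(np.linalg.inv(X.T @ X))

# Monte Carlo estimate over repeated noise draws.
errors = []
for _ in range(2000):
    y = X @ beta + sigma * rng.normal(size=n)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    errors.append(float(np.sum((beta_hat - beta) ** 2)))
mc = float(np.mean(errors))
```

A-optimal design then amounts to choosing where to sample so that this trace is as small as possible.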

regression

ROI Maximization in Stochastic Online Decision-Making

no code implementations NeurIPS 2021 Nicolò Cesa-Bianchi, Tommaso Cesari, Yishay Mansour, Vianney Perchet

We introduce a novel theoretical framework for Return On Investment (ROI) maximization in repeated decision-making.

Decision Making

Utility/Privacy Trade-off through the lens of Optimal Transport

1 code implementation27 May 2019 Etienne Boursier, Vianney Perchet

Strategic information is valuable either by remaining private (for instance if it is sensitive) or, on the other hand, by being used publicly to increase some utility.

An adaptive stochastic optimization algorithm for resource allocation

no code implementations12 Feb 2019 Xavier Fontaine, Shie Mannor, Vianney Perchet

This can be recast as a specific stochastic optimization problem where the objective is to maximize the cumulative reward, or equivalently to minimize the regret.

Stochastic Optimization

Exploiting Structure of Uncertainty for Efficient Matroid Semi-Bandits

no code implementations11 Feb 2019 Pierre Perrault, Vianney Perchet, Michal Valko

We improve the efficiency of algorithms for stochastic \emph{combinatorial semi-bandits}.

A Practical Algorithm for Multiplayer Bandits when Arm Means Vary Among Players

no code implementations4 Feb 2019 Etienne Boursier, Emilie Kaufmann, Abbas Mehrabian, Vianney Perchet

We study a multiplayer stochastic multi-armed bandit problem in which players cannot communicate, and if two or more players pull the same arm, a collision occurs and the involved players receive zero reward.
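The collision model described in the snippet is easy to make concrete. A one-round sketch, assuming Bernoulli rewards (function and variable names are illustrative):

```python
import random
from collections import Counter

def play_round(choices, means, rng):
    """One round of the no-communication multiplayer bandit: every player
    that collides with another on the same arm receives zero reward."""
    pulls = Counter(choices)
    rewards = []
    for arm in choices:
        if pulls[arm] > 1:  # collision: all players on this arm get 0
            rewards.append(0.0)
        else:               # lone player: Bernoulli reward from the arm
            rewards.append(float(rng.random() < means[arm]))
    return rewards

rng = random.Random(0)
means = [0.9, 0.8, 0.1]
rewards = play_round([0, 0, 2], means, rng)  # players 1 and 2 collide on arm 0
```

Since collisions destroy reward, the players must implicitly coordinate on distinct arms without exchanging messages -- the core difficulty of the setting.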

Open-Ended Question Answering

Regularized Contextual Bandits

no code implementations11 Oct 2018 Xavier Fontaine, Quentin Berthet, Vianney Perchet

We consider the stochastic contextual bandit problem with additional regularization.

Multi-Armed Bandits

Bridging the gap between regret minimization and best arm identification, with application to A/B tests

no code implementations9 Oct 2018 Rémy Degenne, Thomas Nedelec, Clément Calauzènes, Vianney Perchet

State of the art online learning procedures focus either on selecting the best alternative ("best arm identification") or on minimizing the cost (the "regret").

SIC-MMAB: Synchronisation Involves Communication in Multiplayer Multi-Armed Bandits

1 code implementation NeurIPS 2019 Etienne Boursier, Vianney Perchet

Motivated by cognitive radio networks, we consider the stochastic multiplayer multi-armed bandit problem, where several players pull arms simultaneously and collisions occur if one of them is pulled by several players at the same stage.

Multi-Armed Bandits

Bandits with Side Observations: Bounded vs. Logarithmic Regret

no code implementations10 Jul 2018 Rémy Degenne, Evrard Garcelon, Vianney Perchet

We consider the classical stochastic multi-armed bandit but where, from time to time and roughly with frequency $\epsilon$, an extra observation is gathered by the agent for free.

Dynamic Pricing with Finitely Many Unknown Valuations

no code implementations9 Jul 2018 Nicolò Cesa-Bianchi, Tommaso Cesari, Vianney Perchet

When $K=2$ in the distribution-dependent case, the hardness of our setting reduces to that of a stochastic $2$-armed bandit: we prove that an upper bound of order $(\log T)/\Delta$ (up to $\log\log$ factors) on the regret can be achieved with no information on the demand curve.

Finding the bandit in a graph: Sequential search-and-stop

no code implementations6 Jun 2018 Pierre Perrault, Vianney Perchet, Michal Valko

We consider the problem where an agent wants to find a hidden object that is randomly located in some vertex of a directed acyclic graph (DAG) according to a fixed but possibly unknown distribution.

Multi-Armed Bandits

Stochastic Bandit Models for Delayed Conversions

no code implementations28 Jun 2017 Claire Vernade, Olivier Cappé, Vianney Perchet

We assume that the probability of conversion associated with each action is unknown while the distribution of the conversion delay is known, distinguishing between the (idealized) case where the conversion events may be observed whatever their delay and the more realistic setting in which late conversions are censored.

Product Recommendation

Sparse Stochastic Bandits

no code implementations5 Jun 2017 Joon Kwon, Vianney Perchet, Claire Vernade

In the classical multi-armed bandit problem, d arms are available to the decision maker who pulls them sequentially in order to maximize his cumulative reward.
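The classical problem described above has a standard baseline in UCB1: pull the arm maximizing empirical mean plus an exploration bonus. A self-contained sketch with Bernoulli rewards (not the sparse algorithm of the paper; names are illustrative):

```python
import math
import random

def ucb1(means, horizon, rng):
    """UCB1 sketch: after pulling each arm once, pull the arm maximizing
    empirical mean + sqrt(2 log t / n_pulls)."""
    d = len(means)
    counts = [0] * d
    sums = [0.0] * d
    for t in range(1, horizon + 1):
        if t <= d:  # initialisation: pull each arm once
            arm = t - 1
        else:
            arm = max(range(d),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        r = float(rng.random() < means[arm])  # Bernoulli reward
        counts[arm] += 1
        sums[arm] += r
    return counts

rng = random.Random(0)
counts = ucb1([0.2, 0.5, 0.8], horizon=5000, rng=rng)
```

The sparse variant studied in the paper exploits the assumption that only a few of the d arms have non-negligible means.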

A comparative study of counterfactual estimators

no code implementations3 Apr 2017 Thomas Nedelec, Nicolas Le Roux, Vianney Perchet

We provide a comparative study of several widely used off-policy estimators (Empirical Average, Basic Importance Sampling and Normalized Importance Sampling), detailing the different regimes where they are individually suboptimal.

counterfactual

Fast Rates for Bandit Optimization with Upper-Confidence Frank-Wolfe

no code implementations NeurIPS 2017 Quentin Berthet, Vianney Perchet

We consider the problem of bandit optimization, inspired by stochastic optimization and online learning problems with bandit feedback.

BIG-bench Machine Learning Stochastic Optimization

Combinatorial semi-bandit with known covariance

no code implementations NeurIPS 2016 Rémy Degenne, Vianney Perchet

We introduce a way to quantify the dependency structure of the problem and design an algorithm that adapts to it.

Approachability of convex sets in generalized quitting games

no code implementations28 Sep 2016 János Flesch, Rida Laraki, Vianney Perchet

The third is necessary: if it is not satisfied, the opponent can weakly exclude the target set.

Highly-Smooth Zero-th Order Online Optimization

no code implementations26 May 2016 Francis Bach, Vianney Perchet

The minimization of convex functions which are only available through partial and noisy information is a key methodological problem in many disciplines.

Gains and Losses are Fundamentally Different in Regret Minimization: The Sparse Case

no code implementations26 Nov 2015 Joon Kwon, Vianney Perchet

We demonstrate that, in the classical non-stochastic regret minimization problem with $d$ decisions, gains and losses to be respectively maximized or minimized are fundamentally different.

Online learning in repeated auctions

no code implementations18 Nov 2015 Jonathan Weed, Vianney Perchet, Philippe Rigollet

To our knowledge, this is the first complete set of strategies for bidders participating in auctions of this type.

Approachability in unknown games: Online learning meets multi-objective optimization

no code implementations10 Feb 2014 Shie Mannor, Vianney Perchet, Gilles Stoltz

We show that it is impossible, in general, to approach the best target set in hindsight and propose achievable though ambitious alternative goals.

Gaussian Process Optimization with Mutual Information

no code implementations19 Nov 2013 Emile Contal, Vianney Perchet, Nicolas Vayatis

In this paper, we analyze a generic algorithm scheme for sequential global optimization using Gaussian processes.

Gaussian Processes

A Primal Condition for Approachability with Partial Monitoring

no code implementations23 May 2013 Shie Mannor, Vianney Perchet, Gilles Stoltz

In this paper we provide primal conditions on a convex set to be approachable with partial monitoring.

The multi-armed bandit problem with covariates

no code implementations27 Oct 2011 Vianney Perchet, Philippe Rigollet

We consider a multi-armed bandit problem in a setting where each arm produces a noisy reward realization which depends on an observable random covariate.
