Search Results for author: Evrard Garcelon

Found 13 papers, 0 papers with code

Conservative Exploration for Policy Optimization via Off-Policy Policy Evaluation

no code implementations • 24 Dec 2023 • Paul Daoudi, Mathias Formoso, Othman Gaizi, Achraf Azize, Evrard Garcelon

A precondition for the deployment of a Reinforcement Learning agent to a real-world system is to provide guarantees on the learning process.
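
A minimal sketch of the general idea, under assumptions not spelled out in the abstract: before deploying a candidate policy, estimate its value off-policy from data collected by the current baseline (here with plain importance sampling) and switch only if the estimate clears the baseline value. The function names and the margin parameter are illustrative, not the paper's method.

```python
import numpy as np

def importance_sampling_value(trajectories, pi_new, pi_baseline, gamma=0.99):
    """Estimate the value of pi_new from trajectories collected under pi_baseline,
    using ordinary (per-trajectory) importance sampling."""
    estimates = []
    for traj in trajectories:                 # traj: list of (state, action, reward)
        weight, ret, discount = 1.0, 0.0, 1.0
        for state, action, reward in traj:
            weight *= pi_new(action, state) / pi_baseline(action, state)
            ret += discount * reward
            discount *= gamma
        estimates.append(weight * ret)
    return float(np.mean(estimates))

def conservative_switch(trajectories, pi_new, pi_baseline, baseline_value, margin=0.0):
    """Deploy pi_new only if its off-policy estimate is no worse than the baseline."""
    estimate = importance_sampling_value(trajectories, pi_new, pi_baseline)
    return estimate >= baseline_value - margin
```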

SALSA PICANTE: a machine learning attack on LWE with binary secrets

no code implementations • 7 Mar 2023 • Cathy Li, Jana Sotáková, Emily Wenger, Mohamed Malhou, Evrard Garcelon, Francois Charton, Kristin Lauter

However, this attack assumes access to millions of eavesdropped LWE samples and fails at higher Hamming weights or dimensions.

Tasks: Math
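
For context, a toy NumPy sketch of the object being attacked: LWE samples $(A, b = A s + e \bmod q)$ with a binary secret $s$ of small Hamming weight. Parameters are illustrative toy values, not the ones studied in the paper.

```python
import numpy as np

def lwe_samples_binary_secret(m, n, q, sigma, hamming_weight, rng=None):
    """Generate m LWE samples (A, b = A @ s + e mod q) with a binary secret s
    of the given Hamming weight. Toy parameters, for illustration only."""
    rng = np.random.default_rng(rng)
    s = np.zeros(n, dtype=np.int64)
    s[rng.choice(n, size=hamming_weight, replace=False)] = 1    # binary secret
    A = rng.integers(0, q, size=(m, n), dtype=np.int64)         # uniform public matrix
    e = np.rint(rng.normal(0, sigma, size=m)).astype(np.int64)  # small Gaussian error
    b = (A @ s + e) % q
    return A, b, s

A, b, s = lwe_samples_binary_secret(m=1000, n=64, q=3329, sigma=3.0, hamming_weight=8)
```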

Top $K$ Ranking for Multi-Armed Bandit with Noisy Evaluations

no code implementations • 13 Dec 2021 • Evrard Garcelon, Vashist Avadhanula, Alessandro Lazaric, Matteo Pirotta

We consider a multi-armed bandit setting where, at the beginning of each round, the learner receives noisy, independent, and possibly biased evaluations of the true reward of each arm, and selects $K$ arms with the objective of accumulating as much reward as possible over $T$ rounds.
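
A minimal simulation of this interaction protocol, with a greedy placeholder strategy rather than the paper's algorithm: each round the learner sees noisy, possibly biased evaluations of every arm, selects the $K$ largest, and collects the corresponding true rewards.

```python
import numpy as np

def run_topk_bandit(true_rewards, K, T, bias, noise_std, rng=None):
    """Simulate the setting: at each round the learner receives noisy, possibly
    biased evaluations of all arms, selects K of them, and accumulates reward.
    Selecting greedily on the evaluations is just a placeholder strategy."""
    rng = np.random.default_rng(rng)
    true_rewards = np.asarray(true_rewards, dtype=float)
    total = 0.0
    for _ in range(T):
        evaluations = true_rewards + bias + rng.normal(0.0, noise_std, true_rewards.size)
        chosen = np.argsort(evaluations)[-K:]   # pick the K largest evaluations
        total += true_rewards[chosen].sum()     # reward actually earned by those arms
    return total

total = run_topk_bandit(true_rewards=[0.1, 0.5, 0.3, 0.9], K=2, T=1000,
                        bias=0.05, noise_std=0.2)
```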

Privacy Amplification via Shuffling for Linear Contextual Bandits

no code implementations • 11 Dec 2021 • Evrard Garcelon, Kamalika Chaudhuri, Vianney Perchet, Matteo Pirotta

Contextual bandit algorithms are widely used in domains where it is desirable to provide a personalized service by leveraging contextual information, which may contain sensitive data that needs to be protected.

Tasks: Multi-Armed Bandits
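
A rough sketch of the shuffle model the title refers to, with an illustrative Gaussian local randomizer rather than the paper's mechanism: each user perturbs their own report before it leaves the device, and a shuffler forwards the batch in random order so reports can no longer be linked to users.

```python
import numpy as np

def local_randomizer(context, reward, noise_std, rng):
    """Each user perturbs their own context and reward before sending them."""
    return (context + rng.normal(0.0, noise_std, context.shape),
            reward + rng.normal(0.0, noise_std))

def shuffle_batch(reports, rng):
    """The shuffler forwards reports in a random order, unlinking user and report."""
    order = rng.permutation(len(reports))
    return [reports[i] for i in order]

rng = np.random.default_rng(0)
reports = [local_randomizer(np.array([0.2, 1.0]), 0.7, noise_std=0.5, rng=rng)
           for _ in range(100)]
anonymized = shuffle_batch(reports, rng)   # what the bandit learner receives
```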

Differentially Private Exploration in Reinforcement Learning with Linear Representation

no code implementations • 2 Dec 2021 • Paul Luyo, Evrard Garcelon, Alessandro Lazaric, Matteo Pirotta

We first consider the setting of linear-mixture MDPs (Ayoub et al., 2020), a.k.a. the model-based setting, and provide a unified framework for analyzing joint and local differentially private (DP) exploration.

Tasks: Privacy Preserving, Reinforcement Learning, +1
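
As a reminder of the model assumed above, a linear-mixture MDP writes the transition kernel as a combination of known basis kernels, $P_\theta(s' \mid s, a) = \sum_i \theta_i P_i(s' \mid s, a)$, with unknown weights $\theta$. The toy numbers below are purely illustrative.

```python
import numpy as np

# Two known basis kernels over 3 next states for a fixed (s, a) pair (each sums to 1).
P1 = np.array([0.8, 0.1, 0.1])
P2 = np.array([0.2, 0.3, 0.5])
theta = np.array([0.6, 0.4])     # unknown mixture weights the agent must learn

# Linear-mixture transition probabilities for that (s, a): P_theta = sum_i theta_i * P_i
P_theta = theta[0] * P1 + theta[1] * P2
assert np.isclose(P_theta.sum(), 1.0)
```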

Encrypted Linear Contextual Bandit

no code implementations • 17 Mar 2021 • Evrard Garcelon, Vianney Perchet, Matteo Pirotta

A critical aspect of bandit methods is that they require observing the contexts (i.e., individual or group-level data) and rewards in order to solve the sequential problem.

Tasks: Decision Making, Multi-Armed Bandits, +2
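
To make the privacy issue concrete, here is a bare-bones LinUCB-style learner in the standard, unencrypted setting that motivates the paper (not its encrypted protocol): the update step needs the raw context vector and reward in the clear.

```python
import numpy as np

class LinUCB:
    """Plain ridge-regression contextual bandit; it needs the raw context and
    reward in the clear, which is exactly the privacy issue discussed above."""
    def __init__(self, dim, reg=1.0, beta=1.0):
        self.A = reg * np.eye(dim)       # regularized design matrix
        self.b = np.zeros(dim)           # reward-weighted sum of contexts
        self.beta = beta                 # width of the confidence bonus

    def choose(self, contexts):          # contexts: (n_arms, dim) array
        theta = np.linalg.solve(self.A, self.b)
        A_inv = np.linalg.inv(self.A)
        ucb = contexts @ theta + self.beta * np.sqrt(
            np.einsum('ij,jk,ik->i', contexts, A_inv, contexts))
        return int(np.argmax(ucb))

    def update(self, context, reward):   # raw individual data is required here
        self.A += np.outer(context, context)
        self.b += reward * context
```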

Local Differential Privacy for Regret Minimization in Reinforcement Learning

no code implementations • NeurIPS 2021 • Evrard Garcelon, Vianney Perchet, Ciara Pike-Burke, Matteo Pirotta

Motivated by this, we study privacy in the context of finite-horizon Markov Decision Processes (MDPs) by requiring information to be obfuscated on the user side.

Tasks: Reinforcement Learning (RL)
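
A minimal sketch of what user-side obfuscation can look like, using a standard Laplace mechanism as an illustrative stand-in for the paper's construction: the user adds noise locally, so the learner never observes the true reward.

```python
import numpy as np

def ldp_reward(true_reward, epsilon, sensitivity=1.0, rng=None):
    """Laplace mechanism applied on the user's device: only the noisy reward
    leaves the user, giving epsilon-local differential privacy for rewards
    whose range is bounded by `sensitivity`."""
    rng = np.random.default_rng(rng)
    return true_reward + rng.laplace(0.0, sensitivity / epsilon)

# The learner only ever observes private_reward, never true_reward.
private_reward = ldp_reward(true_reward=0.8, epsilon=1.0)
```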

Improved Algorithms for Conservative Exploration in Bandits

no code implementations • 8 Feb 2020 • Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric, Matteo Pirotta

In this case, it is desirable to deploy online learning algorithms (e.g., a multi-armed bandit algorithm) that interact with the system to learn a better or optimal policy, under the constraint that during the learning process the performance is almost never worse than that of the baseline itself.

Tasks: Marketing, Recommendation Systems
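
The conservative constraint can be made concrete with a small budget check (a generic illustration, not the paper's improved algorithms): an exploratory arm is allowed only if, under a pessimistic estimate of its reward, the cumulative reward still stays above a $(1-\alpha)$ fraction of what the baseline would have earned.

```python
def conservative_play(cum_reward, lower_bound_candidate, baseline_mean, t, alpha=0.1):
    """Return True if playing the candidate arm cannot violate the conservative
    constraint: cumulative reward >= (1 - alpha) * rounds * baseline_mean, judged
    with a pessimistic (lower-bound) estimate of the candidate arm's reward."""
    worst_case = cum_reward + lower_bound_candidate
    return worst_case >= (1.0 - alpha) * (t + 1) * baseline_mean

# Example: after t = 10 rounds with cumulative reward 6.0, a candidate arm whose
# pessimistic reward estimate is 0.2 is allowed only if the budget permits it.
ok = conservative_play(cum_reward=6.0, lower_bound_candidate=0.2,
                       baseline_mean=0.5, t=10)
```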

Conservative Exploration in Reinforcement Learning

no code implementations • 8 Feb 2020 • Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric, Matteo Pirotta

While learning in an unknown Markov Decision Process (MDP), an agent should trade off exploration to discover new information about the MDP, and exploitation of the current knowledge to maximize the reward.

Tasks: Reinforcement Learning (RL)
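
As a classic illustration of this trade-off (not the paper's conservative algorithm), an epsilon-greedy rule explores a random action with small probability and otherwise exploits the current estimates:

```python
import numpy as np

def epsilon_greedy_action(q_values, epsilon, rng=None):
    """With probability epsilon explore a random action (gather new information),
    otherwise exploit the action with the highest current value estimate."""
    rng = np.random.default_rng(rng)
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))   # explore
    return int(np.argmax(q_values))               # exploit
```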

No-Regret Exploration in Goal-Oriented Reinforcement Learning

no code implementations • ICML 2020 • Jean Tarbouriech, Evrard Garcelon, Michal Valko, Matteo Pirotta, Alessandro Lazaric

Many popular reinforcement learning problems (e.g., navigation in a maze, some Atari games, mountain car) are instances of the episodic setting under its stochastic shortest path (SSP) formulation, where an agent has to achieve a goal state while minimizing the cumulative cost.

Tasks: Atari Games, Reinforcement Learning, +1
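
A generic template of the episodic SSP interaction, not the paper's algorithm: an episode ends only when the goal state is reached, and the agent pays the running cost until then. The environment interface is a gym-like assumption.

```python
def run_ssp_episode(env, policy, goal_state, max_steps=10_000):
    """Generic stochastic-shortest-path episode: act until the goal is reached,
    accumulating cost along the way. `env` is assumed to expose reset() and
    step(action) -> (next_state, cost), in the spirit of gym-like interfaces."""
    state = env.reset()
    total_cost, steps = 0.0, 0
    while state != goal_state and steps < max_steps:
        action = policy(state)
        state, cost = env.step(action)
        total_cost += cost
        steps += 1
    return total_cost
```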

Bandits with Side Observations: Bounded vs. Logarithmic Regret

no code implementations • 10 Jul 2018 • Rémy Degenne, Evrard Garcelon, Vianney Perchet

We consider the classical stochastic multi-armed bandit problem where, from time to time and roughly with frequency $\epsilon$, an extra observation is gathered by the agent for free.
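
A tiny simulation of this setting, with a naive empirical-leader strategy as a placeholder rather than the paper's algorithms: with probability roughly $\epsilon$ the agent additionally receives a free observation of a uniformly random arm.

```python
import numpy as np

def bandit_with_side_observations(means, T, epsilon, rng=None):
    """Simulate the setting described above: each round the agent pulls one arm
    and, with probability epsilon, also gets one free observation of a uniformly
    random arm. The pulling rule (empirical leader) is only a placeholder."""
    rng = np.random.default_rng(rng)
    n_arms = len(means)
    counts = np.ones(n_arms)                 # start with one sample of each arm
    sums = rng.normal(means, 1.0)
    for _ in range(T):
        arm = int(np.argmax(sums / counts))  # exploit the current empirical leader
        sums[arm] += rng.normal(means[arm], 1.0)
        counts[arm] += 1
        if rng.random() < epsilon:           # free side observation
            side = int(rng.integers(n_arms))
            sums[side] += rng.normal(means[side], 1.0)
            counts[side] += 1
    return sums / counts                     # empirical means after T rounds

empirical_means = bandit_with_side_observations(means=[0.2, 0.5, 0.8], T=1000, epsilon=0.1)
```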
