Search Results for author: Claire Vernade

Found 23 papers, 5 papers with code

Non-Stationary Bandits with Intermediate Observations

no code implementations ICML 2020 Claire Vernade, András György, Timothy Mann

In fact, if the timescale of the change is comparable to the delay, it is impossible to learn about the environment, since the available observations are already obsolete.

Recommendation Systems

Beyond Average Return in Markov Decision Processes

no code implementations NeurIPS 2023 Alexandre Marthe, Aurélien Garivier, Claire Vernade

What are the functionals of the reward that can be computed and optimized exactly in Markov Decision Processes? In the finite-horizon, undiscounted setting, Dynamic Programming (DP) can only handle these operations efficiently for certain classes of statistics.

Distributional Reinforcement Learning
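For context only: the abstract contrasts general reward functionals with the classical case that finite-horizon, undiscounted Dynamic Programming does handle exactly, namely the expected return. Below is a minimal sketch of that baseline (standard backward induction on an illustrative toy MDP); it is not the paper's method and does not cover the other statistics the paper studies.

```python
# Minimal sketch (not from the paper): backward induction for the *expected*
# return in a finite-horizon, undiscounted tabular MDP -- the classical
# functional that Dynamic Programming handles exactly. The toy MDP is illustrative.
import numpy as np

n_states, n_actions, horizon = 3, 2, 5
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(size=(n_states, n_actions))                       # expected reward r(s, a)

V = np.zeros(n_states)                     # value at the end of the horizon
for t in reversed(range(horizon)):
    Q = R + P @ V                          # Q[s, a] = r(s, a) + E[V(s')]
    V = Q.max(axis=1)                      # greedy backup

print("Optimal expected return from each start state:", V)
```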

POMRL: No-Regret Learning-to-Plan with Increasing Horizons

no code implementations 30 Dec 2022 Khimya Khetarpal, Claire Vernade, Brendan O'Donoghue, Satinder Singh, Tom Zahavy


We study the problem of planning under model uncertainty in an online meta-reinforcement learning (RL) setting where an agent is presented with a sequence of related tasks with limited interactions per task.

Meta Reinforcement Learning, Reinforcement Learning (RL)

Asymptotically Optimal Information-Directed Sampling

no code implementations 11 Nov 2020 Johannes Kirschner, Tor Lattimore, Claire Vernade, Csaba Szepesvári

We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time.
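As background, information-directed sampling generally selects the action distribution that trades squared regret against information gain. A common form of the selection rule is sketched below in assumed notation; the paper's specific gap and information-gain estimates for the linear bandit are not reproduced here.

```latex
% General IDS rule (assumed notation): at round $t$, sample the action from
% \[
%   \pi_t \in \arg\min_{\pi \in \Delta(\mathcal{A})}
%   \frac{\big(\sum_{a} \pi(a)\,\hat\Delta_t(a)\big)^2}{\sum_{a} \pi(a)\, I_t(a)},
% \]
% where $\hat\Delta_t(a)$ is an estimated suboptimality gap and $I_t(a)$ an
% information-gain term for action $a$.
```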

The Elliptical Potential Lemma Revisited

no code implementations 20 Oct 2020 Alexandra Carpentier, Claire Vernade, Yasin Abbasi-Yadkori

This note proposes a new proof and new perspectives on the so-called Elliptical Potential Lemma.

LEMMA
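For reference, one standard form of the Elliptical Potential Lemma is stated below (as in the linear bandit literature); the note's new proof and its generalizations are not reproduced here.

```latex
% One standard statement. Let $\lambda > 0$, let $x_1,\dots,x_n \in \mathbb{R}^d$
% with $\|x_t\|_2 \le L$, and let $V_t = \lambda I + \sum_{s=1}^{t} x_s x_s^\top$. Then
% \[
%   \sum_{t=1}^{n} \min\!\big(1, \|x_t\|_{V_{t-1}^{-1}}^2\big)
%   \;\le\; 2\log\frac{\det V_n}{\det(\lambda I)}
%   \;\le\; 2d\log\!\Big(1 + \frac{nL^2}{d\lambda}\Big).
% \]
```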

EigenGame: PCA as a Nash Equilibrium

2 code implementations ICLR 2021 Ian Gemp, Brian McWilliams, Claire Vernade, Thore Graepel

We present a novel view on principal component analysis (PCA) as a competitive game in which each approximate eigenvector is controlled by a player whose goal is to maximize their own utility function.
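A rough numpy sketch of the game-theoretic framing in the abstract: each player controls a unit vector and ascends a utility rewarding captured variance while penalizing alignment with the preceding players. This is an illustration only, not the authors' implementation (two official code implementations are linked for this paper).

```python
# Illustrative sketch of PCA as a game: simultaneous Riemannian ascent of
# per-player utilities on the unit sphere. Step size, iteration count, and the
# exact utility details are assumptions; prefer the official implementations.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
M = X.T @ X / X.shape[0]                     # covariance-like matrix
k, d = 3, M.shape[0]
V = rng.normal(size=(d, k))
V /= np.linalg.norm(V, axis=0)               # each player keeps a unit vector

lr = 0.1
for _ in range(2000):
    for i in range(k):
        v = V[:, i]
        grad = M @ v                          # variance-capture term
        for j in range(i):                    # penalty from "parent" players j < i
            u = V[:, j]
            grad -= (v @ M @ u) / (u @ M @ u) * (M @ u)
        grad -= (v @ grad) * v                # project onto the sphere's tangent space
        v = v + lr * grad
        V[:, i] = v / np.linalg.norm(v)

print(np.round(V.T @ M @ V, 3))               # should be close to diagonal
```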

Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting

1 code implementation 18 Jun 2020 Ilja Kuzborskij, Claire Vernade, András György, Csaba Szepesvári

We consider off-policy evaluation in the contextual bandit setting for the purpose of obtaining a robust off-policy selection strategy, where the selection strategy is evaluated based on the value of the chosen policy in a set of proposal (target) policies.

Multi-Armed Bandits, Off-policy evaluation
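A minimal sketch of self-normalized importance weighting for contextual-bandit off-policy evaluation, in assumed notation (the function names below are illustrative). The paper's contribution is the confidence bounds built around such estimates for policy selection; those are not reproduced here.

```python
# Self-normalized importance weighting (SNIPS-style) estimate of the value of a
# target policy pi from data logged under a behaviour policy mu. Sketch only.
import numpy as np

def snips_value(contexts, actions, rewards, pi_prob, mu_prob):
    """pi_prob(x, a) / mu_prob(x, a): probability of action a in context x under
    the target policy pi and the logging policy mu, respectively."""
    w = np.array([pi_prob(x, a) / mu_prob(x, a)
                  for x, a in zip(contexts, actions)])
    return np.sum(w * np.asarray(rewards)) / np.sum(w)  # self-normalization reduces variance
```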

Stochastic bandits with arm-dependent delays

no code implementations ICML 2020 Anne Gael Manegueu, Claire Vernade, Alexandra Carpentier, Michal Valko

Significant work has been recently dedicated to the stochastic delayed bandit setting because of its relevance in applications.

Non-Stationary Delayed Bandits with Intermediate Observations

no code implementations 3 Jun 2020 Claire Vernade, András György, Timothy Mann

In fact, if the timescale of the change is comparable to the delay, it is impossible to learn about the environment, since the available observations are already obsolete.

Recommendation Systems

Solving Bernoulli Rank-One Bandits with Unimodal Thompson Sampling

no code implementations 6 Dec 2019 Cindy Trinh, Emilie Kaufmann, Claire Vernade, Richard Combes

Stochastic Rank-One Bandits (Katariya et al., 2017a, b) are a simple framework for regret minimization problems over rank-one matrices of arms.

Thompson Sampling
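For context, here is generic Bernoulli Thompson Sampling with Beta posteriors on a flat set of arms. The paper's unimodal variant additionally exploits the rank-one structure to restrict which arms are sampled; that refinement is not shown here, and the arm means below are illustrative.

```python
# Generic Bernoulli Thompson Sampling with Beta(1, 1) priors -- background only,
# not the paper's UTS algorithm.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.7])   # illustrative arm means
alpha = np.ones_like(true_means)
beta = np.ones_like(true_means)

for t in range(10_000):
    theta = rng.beta(alpha, beta)          # one posterior sample per arm
    a = int(np.argmax(theta))              # play the arm with the best sample
    r = rng.random() < true_means[a]       # Bernoulli reward
    alpha[a] += r
    beta[a] += 1 - r

print("posterior means:", np.round(alpha / (alpha + beta), 3))
```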

Weighted Linear Bandits for Non-Stationary Environments

1 code implementation NeurIPS 2019 Yoan Russac, Claire Vernade, Olivier Cappé

To address this problem, we propose D-LinUCB, a novel optimistic algorithm based on discounted linear regression, where exponential weights are used to smoothly forget the past.

regression
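A minimal sketch of the discounted (exponentially weighted) ridge-regression update that the abstract highlights as the basis of D-LinUCB. The optimistic confidence bonus and the exact regularization schedule follow the paper and are not reproduced; gamma and lam below are illustrative values.

```python
# Exponentially weighted least-squares statistics: older observations are
# smoothly forgotten via the discount factor gamma. Sketch under assumed
# parameter values, not the paper's full algorithm.
import numpy as np

d, gamma, lam = 4, 0.99, 1.0
V = lam * np.eye(d)          # discounted design matrix
b = np.zeros(d)              # discounted reward-weighted features

def update(x, r):
    """Discount the past, then add the new observation (x, r)."""
    global V, b
    V = gamma * V + np.outer(x, x) + (1 - gamma) * lam * np.eye(d)
    b = gamma * b + r * x

def theta_hat():
    return np.linalg.solve(V, b)    # weighted least-squares estimate
```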

Linear Bandits with Stochastic Delayed Feedback

no code implementations ICML 2020 Claire Vernade, Alexandra Carpentier, Tor Lattimore, Giovanni Zappella, Beyza Ermis, Michael Brueckner

Stochastic linear bandits are a natural and well-studied model for structured exploration/exploitation problems and are widely used in applications such as online marketing and recommendation.

Marketing, Multi-Armed Bandits

Max K-armed bandit: On the ExtremeHunter algorithm and beyond

no code implementations 27 Jul 2017 Mastane Achab, Stephan Clémençon, Aurélien Garivier, Anne Sabourin, Claire Vernade

This paper is devoted to the study of the max K-armed bandit problem, which consists in sequentially allocating resources in order to detect extreme values.

Stochastic Bandit Models for Delayed Conversions

no code implementations 28 Jun 2017 Claire Vernade, Olivier Cappé, Vianney Perchet

We assume that the probability of conversion associated with each action is unknown while the distribution of the conversion delay is known, distinguishing between the (idealized) case where the conversion events may be observed whatever their delay and the more realistic setting in which late conversions are censored.

Product Recommendation
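A toy simulation of the setting the abstract describes (not the paper's estimator): each action may convert with an unknown probability, the conversion is revealed only after a random delay drawn from a known distribution, and in the censored variant conversions arriving after a window are never observed. All parameters below are illustrative, and the simple delay correction is specific to this toy model.

```python
# Toy delayed/censored conversion simulation with a geometric delay distribution.
import numpy as np

rng = np.random.default_rng(0)
T, p_convert, mean_delay, m = 10_000, 0.3, 20.0, 50

converts = rng.random(T) < p_convert                 # latent conversion indicator
delays = rng.geometric(1.0 / mean_delay, size=T)     # delay distribution is known
observed = converts & (delays <= m)                  # late conversions are censored

print("true conversion rate:", converts.mean())
print("naive censored estimate:", observed.mean())
# correct for P(delay <= m) under the known geometric delay model
print("delay-corrected estimate:",
      observed.mean() / (1 - (1 - 1 / mean_delay) ** m))
```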

Sparse Stochastic Bandits

no code implementations 5 Jun 2017 Joon Kwon, Vianney Perchet, Claire Vernade

In the classical multi-armed bandit problem, d arms are available to the decision maker who pulls them sequentially in order to maximize his cumulative reward.
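The snippet below only illustrates the classical multi-armed bandit baseline the abstract starts from, using the standard UCB1 rule with illustrative arm means; the paper's sparse setting and its algorithm are not reproduced.

```python
# Standard UCB1 on a classical stochastic multi-armed bandit -- background only.
import numpy as np

rng = np.random.default_rng(0)
means = np.array([0.1, 0.4, 0.6, 0.3])   # illustrative arm means
d = len(means)
counts = np.zeros(d)
sums = np.zeros(d)

for t in range(1, 10_001):
    if t <= d:
        a = t - 1                                    # play each arm once
    else:
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        a = int(np.argmax(ucb))
    r = rng.random() < means[a]                      # Bernoulli reward
    counts[a] += 1
    sums[a] += r

print("pulls per arm:", counts.astype(int))
```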

Bernoulli Rank-$1$ Bandits for Click Feedback

no code implementations 19 Mar 2017 Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Claire Vernade, Zheng Wen

The probability that a user will click a search result depends both on its relevance and its position on the results page.

Position
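A sketch of the standard position-based click model that the abstract alludes to: a click happens only if the position is examined and the displayed item is attractive, so the click probability factorizes into a position term and an item term. The numbers below are illustrative, not from the paper.

```python
# Position-based click model: P(click of item i at position k) = kappa_k * theta_i.
import numpy as np

rng = np.random.default_rng(0)
kappa = np.array([1.0, 0.6, 0.3])        # examination probability per position
theta = np.array([0.5, 0.2, 0.1, 0.05])  # attraction probability per item

ranking = [2, 0, 3]                      # items shown in positions 0..2
examined = rng.random(len(ranking)) < kappa
attracted = rng.random(len(ranking)) < theta[ranking]
clicks = examined & attracted
print("clicks:", clicks.astype(int))
```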

Stochastic Rank-1 Bandits

no code implementations 10 Aug 2016 Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Claire Vernade, Zheng Wen

The main challenge of the problem is that the individual values of the row and column are unobserved.
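The rank-one observation model the abstract refers to, as a small sketch under assumed notation: the learner picks a row i and a column j and observes only a noisy product of the two latent parameters, never either factor individually. Values are illustrative.

```python
# Rank-one bandit feedback: pulling (i, j) yields a Bernoulli reward with mean u_i * v_j.
import numpy as np

rng = np.random.default_rng(0)
u = np.array([0.9, 0.4, 0.2])        # latent row parameters (unobserved)
v = np.array([0.8, 0.3])             # latent column parameters (unobserved)

def pull(i, j):
    return rng.random() < u[i] * v[j]

print([int(pull(0, 0)) for _ in range(10)])
```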

Multiple-Play Bandits in the Position-Based Model

no code implementations NeurIPS 2016 Paul Lagrée, Claire Vernade, Olivier Cappé

Sequentially learning to place items in multi-position displays or lists is a task that can be cast into the multiple-play semi-bandit setting.

Position

Sequential ranking under random semi-bandit feedback

no code implementations 4 Mar 2016 Hossein Vahabi, Paul Lagrée, Claire Vernade, Olivier Cappé

In many web applications, a recommendation is not a single item suggested to a user but a list of possibly interesting contents that may be ranked in some contexts.
