Search Results for author: Claire Vernade

Found 23 papers, 5 papers with code

Non-Stationary Bandits with Intermediate Observations

no code implementations ICML 2020 Claire Vernade, András György, Timothy Mann

In fact, if the timescale of the change is comparable to the delay, it is impossible to learn about the environment, since the available observations are already obsolete.

Recommendation Systems

Beyond Average Return in Markov Decision Processes

no code implementations NeurIPS 2023 Alexandre Marthe, Aurélien Garivier, Claire Vernade

What are the functionals of the reward that can be computed and optimized exactly in Markov Decision Processes? In the finite-horizon, undiscounted setting, Dynamic Programming (DP) can only handle these operations efficiently for certain classes of statistics.

Distributional Reinforcement Learning
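For context only: the abstract contrasts general reward functionals with the classical case that finite-horizon, undiscounted Dynamic Programming does handle exactly, namely the expected return. Below is a minimal sketch of that baseline (standard backward induction on an illustrative toy MDP); it is not the paper's method and does not cover the other statistics the paper studies.

```python
# Minimal sketch (not from the paper): backward induction for the *expected*
# return in a finite-horizon, undiscounted tabular MDP -- the classical
# functional that Dynamic Programming handles exactly. The toy MDP is illustrative.
import numpy as np

n_states, n_actions, horizon = 3, 2, 5
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(size=(n_states, n_actions))                       # expected reward r(s, a)

V = np.zeros(n_states)                     # value at the end of the horizon
for t in reversed(range(horizon)):
    Q = R + P @ V                          # Q[s, a] = r(s, a) + E[V(s')]
    V = Q.max(axis=1)                      # greedy backup

print("Optimal expected return from each start state:", V)
```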

POMRL: No-Regret Learning-to-Plan with Increasing Horizons

no code implementations 30 Dec 2022 Khimya Khetarpal, Claire Vernade, Brendan O'Donoghue, Satinder Singh, Tom Zahavy


We study the problem of planning under model uncertainty in an online meta-reinforcement learning (RL) setting where an agent is presented with a sequence of related tasks with limited interactions per task.

Meta Reinforcement Learning, Reinforcement Learning (RL)

Asymptotically Optimal Information-Directed Sampling

no code implementations 11 Nov 2020 Johannes Kirschner, Tor Lattimore, Claire Vernade, Csaba Szepesvári

We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time.
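As background, information-directed sampling generally selects the action distribution that trades squared regret against information gain. A common form of the selection rule is sketched below in assumed notation; the paper's specific gap and information-gain estimates for the linear bandit are not reproduced here.

```latex
% General IDS rule (assumed notation): at round $t$, sample the action from
% \[
%   \pi_t \in \arg\min_{\pi \in \Delta(\mathcal{A})}
%   \frac{\big(\sum_{a} \pi(a)\,\hat\Delta_t(a)\big)^2}{\sum_{a} \pi(a)\, I_t(a)},
% \]
% where $\hat\Delta_t(a)$ is an estimated suboptimality gap and $I_t(a)$ an
% information-gain term for action $a$.
```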

The Elliptical Potential Lemma Revisited

no code implementations 20 Oct 2020 Alexandra Carpentier, Claire Vernade, Yasin Abbasi-Yadkori

This note proposes a new proof and new perspectives on the so-called Elliptical Potential Lemma.

LEMMA
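For reference, one standard form of the Elliptical Potential Lemma is stated below (as in the linear bandit literature); the note's new proof and its generalizations are not reproduced here.

```latex
% One standard statement. Let $\lambda > 0$, let $x_1,\dots,x_n \in \mathbb{R}^d$
% with $\|x_t\|_2 \le L$, and let $V_t = \lambda I + \sum_{s=1}^{t} x_s x_s^\top$. Then
% \[
%   \sum_{t=1}^{n} \min\!\big(1, \|x_t\|_{V_{t-1}^{-1}}^2\big)
%   \;\le\; 2\log\frac{\det V_n}{\det(\lambda I)}
%   \;\le\; 2d\log\!\Big(1 + \frac{nL^2}{d\lambda}\Big).
% \]
```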

EigenGame: PCA as a Nash Equilibrium

2 code implementations ICLR 2021 Ian Gemp, Brian McWilliams, Claire Vernade, Thore Graepel

We present a novel view on principal component analysis (PCA) as a competitive game in which each approximate eigenvector is controlled by a player whose goal is to maximize their own utility function.
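A rough numpy sketch of the game-theoretic framing in the abstract: each player controls a unit vector and ascends a utility rewarding captured variance while penalizing alignment with the preceding players. This is an illustration only, not the authors' implementation (two official code implementations are linked for this paper).

```python
# Illustrative sketch of PCA as a game: simultaneous Riemannian ascent of
# per-player utilities on the unit sphere. Step size, iteration count, and the
# exact utility details are assumptions; prefer the official implementations.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
M = X.T @ X / X.shape[0]                     # covariance-like matrix
k, d = 3, M.shape[0]
V = rng.normal(size=(d, k))
V /= np.linalg.norm(V, axis=0)               # each player keeps a unit vector

lr = 0.1
for _ in range(2000):
    for i in range(k):
        v = V[:, i]
        grad = M @ v                          # variance-capture term
        for j in range(i):                    # penalty from "parent" players j < i
            u = V[:, j]
            grad -= (v @ M @ u) / (u @ M @ u) * (M @ u)
        grad -= (v @ grad) * v                # project onto the sphere's tangent space
        v = v + lr * grad
        V[:, i] = v / np.linalg.norm(v)

print(np.round(V.T @ M @ V, 3))               # should be close to diagonal
```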

Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting

1 code implementation 18 Jun 2020 Ilja Kuzborskij, Claire Vernade, András György, Csaba Szepesvári

We consider off-policy evaluation in the contextual bandit setting for the purpose of obtaining a robust off-policy selection strategy, where the selection strategy is evaluated based on the value of the chosen policy in a set of proposal (target) policies.

Multi-Armed Bandits, Off-policy evaluation
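A minimal sketch of self-normalized importance weighting for contextual-bandit off-policy evaluation, in assumed notation (the function names below are illustrative). The paper's contribution is the confidence bounds built around such estimates for policy selection; those are not reproduced here.

```python
# Self-normalized importance weighting (SNIPS-style) estimate of the value of a
# target policy pi from data logged under a behaviour policy mu. Sketch only.
import numpy as np

def snips_value(contexts, actions, rewards, pi_prob, mu_prob):
    """pi_prob(x, a) / mu_prob(x, a): probability of action a in context x under
    the target policy pi and the logging policy mu, respectively."""
    w = np.array([pi_prob(x, a) / mu_prob(x, a)
                  for x, a in zip(contexts, actions)])
    return np.sum(w * np.asarray(rewards)) / np.sum(w)  # self-normalization reduces variance
```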

Stochastic bandits with arm-dependent delays

no code implementations ICML 2020 Anne Gael Manegueu, Claire Vernade, Alexandra Carpentier, Michal Valko

Significant work has been recently dedicated to the stochastic delayed bandit setting because of its relevance in applications.

Non-Stationary Delayed Bandits with Intermediate Observations

no code implementations 3 Jun 2020 Claire Vernade, András György, Timothy Mann

In fact, if the timescale of the change is comparable to the delay, it is impossible to learn about the environment, since the available observations are already obsolete.

Recommendation Systems

Solving Bernoulli Rank-One Bandits with Unimodal Thompson Sampling

no code implementations 6 Dec 2019 Cindy Trinh, Emilie Kaufmann, Claire Vernade, Richard Combes

Stochastic Rank-One Bandits (Katariya et al., 2017a, b) are a simple framework for regret minimization problems over rank-one matrices of arms.

Thompson Sampling
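For context, here is generic Bernoulli Thompson Sampling with Beta posteriors on a flat set of arms. The paper's unimodal variant additionally exploits the rank-one structure to restrict which arms are sampled; that refinement is not shown here, and the arm means below are illustrative.

```python
# Generic Bernoulli Thompson Sampling with Beta(1, 1) priors -- background only,
# not the paper's UTS algorithm.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.7])   # illustrative arm means
alpha = np.ones_like(true_means)
beta = np.ones_like(true_means)

for t in range(10_000):
    theta = rng.beta(alpha, beta)          # one posterior sample per arm
    a = int(np.argmax(theta))              # play the arm with the best sample
    r = rng.random() < true_means[a]       # Bernoulli reward
    alpha[a] += r
    beta[a] += 1 - r

print("posterior means:", np.round(alpha / (alpha + beta), 3))
```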

Weighted Linear Bandits for Non-Stationary Environments

1 code implementation NeurIPS 2019 Yoan Russac, Claire Vernade, Olivier Cappé

To address this problem, we propose D-LinUCB, a novel optimistic algorithm based on discounted linear regression, where exponential weights are used to smoothly forget the past.

regression
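A minimal sketch of the discounted (exponentially weighted) ridge-regression update that the abstract highlights as the basis of D-LinUCB. The optimistic confidence bonus and the exact regularization schedule follow the paper and are not reproduced; gamma and lam below are illustrative values.

```python
# Exponentially weighted least-squares statistics: older observations are
# smoothly forgotten via the discount factor gamma. Sketch under assumed
# parameter values, not the paper's full algorithm.
import numpy as np

d, gamma, lam = 4, 0.99, 1.0
V = lam * np.eye(d)          # discounted design matrix
b = np.zeros(d)              # discounted reward-weighted features

def update(x, r):
    """Discount the past, then add the new observation (x, r)."""
    global V, b
    V = gamma * V + np.outer(x, x) + (1 - gamma) * lam * np.eye(d)
    b = gamma * b + r * x

def theta_hat():
    return np.linalg.solve(V, b)    # weighted least-squares estimate
```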

Linear Bandits with Stochastic Delayed Feedback

no code implementations ICML 2020 Claire Vernade, Alexandra Carpentier, Tor Lattimore, Giovanni Zappella, Beyza Ermis, Michael Brueckner

Stochastic linear bandits are a natural and well-studied model for structured exploration/exploitation problems and are widely used in applications such as online marketing and recommendation.

Marketing, Multi-Armed Bandits

Max K-armed bandit: On the ExtremeHunter algorithm and beyond

no code implementations 27 Jul 2017 Mastane Achab, Stephan Clémençon, Aurélien Garivier, Anne Sabourin, Claire Vernade

This paper is devoted to the study of the max K-armed bandit problem, which consists in sequentially allocating resources in order to detect extreme values.

Stochastic Bandit Models for Delayed Conversions

no code implementations 28 Jun 2017 Claire Vernade, Olivier Cappé, Vianney Perchet

We assume that the probability of conversion associated with each action is unknown while the distribution of the conversion delay is known, distinguishing between the (idealized) case where the conversion events may be observed whatever their delay and the more realistic setting in which late conversions are censored.

Product Recommendation
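A toy simulation of the setting the abstract describes (not the paper's estimator): each action may convert with an unknown probability, the conversion is revealed only after a random delay drawn from a known distribution, and in the censored variant conversions arriving after a window are never observed. All parameters below are illustrative, and the simple delay correction is specific to this toy model.

```python
# Toy delayed/censored conversion simulation with a geometric delay distribution.
import numpy as np

rng = np.random.default_rng(0)
T, p_convert, mean_delay, m = 10_000, 0.3, 20.0, 50

converts = rng.random(T) < p_convert                 # latent conversion indicator
delays = rng.geometric(1.0 / mean_delay, size=T)     # delay distribution is known
observed = converts & (delays <= m)                  # late conversions are censored

print("true conversion rate:", converts.mean())
print("naive censored estimate:", observed.mean())
# correct for P(delay <= m) under the known geometric delay model
print("delay-corrected estimate:",
      observed.mean() / (1 - (1 - 1 / mean_delay) ** m))
```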

Sparse Stochastic Bandits

no code implementations 5 Jun 2017 Joon Kwon, Vianney Perchet, Claire Vernade

In the classical multi-armed bandit problem, d arms are available to the decision maker who pulls them sequentially in order to maximize his cumulative reward.
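The snippet below only illustrates the classical multi-armed bandit baseline the abstract starts from, using the standard UCB1 rule with illustrative arm means; the paper's sparse setting and its algorithm are not reproduced.

```python
# Standard UCB1 on a classical stochastic multi-armed bandit -- background only.
import numpy as np

rng = np.random.default_rng(0)
means = np.array([0.1, 0.4, 0.6, 0.3])   # illustrative arm means
d = len(means)
counts = np.zeros(d)
sums = np.zeros(d)

for t in range(1, 10_001):
    if t <= d:
        a = t - 1                                    # play each arm once
    else:
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        a = int(np.argmax(ucb))
    r = rng.random() < means[a]                      # Bernoulli reward
    counts[a] += 1
    sums[a] += r

print("pulls per arm:", counts.astype(int))
```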

Bernoulli Rank-$1$ Bandits for Click Feedback

no code implementations 19 Mar 2017 Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Claire Vernade, Zheng Wen

The probability that a user will click a search result depends both on its relevance and its position on the results page.

Position
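A sketch of the standard position-based click model that the abstract alludes to: a click happens only if the position is examined and the displayed item is attractive, so the click probability factorizes into a position term and an item term. The numbers below are illustrative, not from the paper.

```python
# Position-based click model: P(click of item i at position k) = kappa_k * theta_i.
import numpy as np

rng = np.random.default_rng(0)
kappa = np.array([1.0, 0.6, 0.3])        # examination probability per position
theta = np.array([0.5, 0.2, 0.1, 0.05])  # attraction probability per item

ranking = [2, 0, 3]                      # items shown in positions 0..2
examined = rng.random(len(ranking)) < kappa
attracted = rng.random(len(ranking)) < theta[ranking]
clicks = examined & attracted
print("clicks:", clicks.astype(int))
```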

Stochastic Rank-1 Bandits

no code implementations 10 Aug 2016 Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Claire Vernade, Zheng Wen

The main challenge of the problem is that the individual values of the row and column are unobserved.
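The rank-one observation model the abstract refers to, as a small sketch under assumed notation: the learner picks a row i and a column j and observes only a noisy product of the two latent parameters, never either factor individually. Values are illustrative.

```python
# Rank-one bandit feedback: pulling (i, j) yields a Bernoulli reward with mean u_i * v_j.
import numpy as np

rng = np.random.default_rng(0)
u = np.array([0.9, 0.4, 0.2])        # latent row parameters (unobserved)
v = np.array([0.8, 0.3])             # latent column parameters (unobserved)

def pull(i, j):
    return rng.random() < u[i] * v[j]

print([int(pull(0, 0)) for _ in range(10)])
```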

Multiple-Play Bandits in the Position-Based Model

no code implementations NeurIPS 2016 Paul Lagrée, Claire Vernade, Olivier Cappé

Sequentially learning to place items in multi-position displays or lists is a task that can be cast into the multiple-play semi-bandit setting.

Position

Sequential ranking under random semi-bandit feedback

no code implementations 4 Mar 2016 Hossein Vahabi, Paul Lagrée, Claire Vernade, Olivier Cappé

In many web applications, a recommendation is not a single item suggested to a user but a list of possibly interesting contents that may be ranked in some contexts.
