no code implementations • 1 Sep 2023 • Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko
We study how to learn $\epsilon$-optimal strategies in zero-sum imperfect information games (IIG) with trajectory feedback.
1 code implementation • 22 May 2023 • Toshinori Kitamura, Tadashi Kozuno, Yunhao Tang, Nino Vieillard, Michal Valko, Wenhao Yang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári, Wataru Kumagai, Yutaka Matsuo
Mirror descent value iteration (MDVI), an abstraction of Kullback-Leibler (KL) and entropy-regularized reinforcement learning (RL), has served as the basis for recent high-performing practical RL algorithms.
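As a rough illustration of that abstraction, here is a minimal tabular sketch of one KL- and entropy-regularized value-iteration step in the spirit of MDVI. The parameter names (`alpha`, `tau`) and the specific weighting of the KL and entropy terms are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def mdvi_step(Q, pi_prev, r, P, gamma=0.99, alpha=0.9, tau=0.1):
    """One tabular KL/entropy-regularized value-iteration step (illustrative).

    Q:       (S, A) action-value table
    pi_prev: (S, A) previous policy, used as the KL anchor
    r:       (S, A) rewards
    P:       (S, A, S) transition kernel
    alpha:   weight of the KL term relative to the entropy term (assumed)
    tau:     regularization temperature (assumed)
    """
    # Mirror-descent policy improvement: the regularized greedy policy is a
    # softmax of Q tilted towards the previous policy, pi ~ pi_prev^alpha * exp(Q / tau).
    logits = alpha * np.log(pi_prev + 1e-12) + Q / tau
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    pi = np.exp(logits)
    pi /= pi.sum(axis=1, keepdims=True)

    # Regularized state value: expected Q with a KL penalty and an entropy bonus.
    kl = np.sum(pi * (np.log(pi + 1e-12) - np.log(pi_prev + 1e-12)), axis=1)
    ent = -np.sum(pi * np.log(pi + 1e-12), axis=1)
    V = np.sum(pi * Q, axis=1) - tau * (alpha * kl - (1 - alpha) * ent)

    # Evaluation step: one application of the Bellman operator.
    Q_next = r + gamma * (P @ V)
    return Q_next, pi

# Tiny demo on a random 2-state, 2-action MDP (illustrative only).
S, A = 2, 2
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))   # (S, A, S) transition kernel
r = rng.random((S, A))
Q = np.zeros((S, A))
pi = np.full((S, A), 1.0 / A)
for _ in range(50):
    Q, pi = mdvi_step(Q, pi, r, P)
```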
no code implementations • 26 Mar 2023 • Mariana Vargas Vieyra, Pierre Ménard
We present a novel, alternative framework for learning generative models with goal-conditioned reinforcement learning.
1 code implementation • 23 Dec 2022 • Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko
Imperfect information games (IIG) are games in which each player only partially observes the current game state.
no code implementations • 27 May 2022 • Tadashi Kozuno, Wenhao Yang, Nino Vieillard, Toshinori Kitamura, Yunhao Tang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Michal Valko, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári
In this work, we consider and analyze the sample complexity of model-free reinforcement learning with a generative model.
no code implementations • NeurIPS 2021 • Hassan Saber, Pierre Ménard, Odalric-Ambrym Maillard
We consider a multi-armed bandit problem specified by a set of one-dimensional exponential family distributions endowed with a unimodal structure.
no code implementations • NeurIPS 2021 • Tadashi Kozuno, Pierre Ménard, Rémi Munos, Michal Valko
We study the problem of learning a Nash equilibrium (NE) in an extensive game with imperfect information (EGII) through self-play.
no code implementations • 23 Nov 2021 • Jean Tarbouriech, Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Michal Valko, Alessandro Lazaric
We introduce a generic strategy for provably efficient multi-goal exploration.
no code implementations • 18 Jun 2021 • James Cheshire, Pierre Ménard, Alexandra Carpentier
Taking $K$ as the number of arms, we consider (i) the case where the sequence of arm means $(\mu_k)_{k=1}^K$ is monotonically increasing (MTBP) and (ii) the case where it is concave (CTBP).
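For intuition, the toy instance below (hypothetical means `mu`, threshold `tau`, and a naive uniform-allocation baseline) illustrates the monotone thresholding bandit setup; structured algorithms for MTBP/CTBP would instead exploit the monotone or concave shape of the means, for example by searching for the crossing point.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy MTBP instance: monotonically increasing means and a threshold tau.
mu = np.array([0.1, 0.2, 0.35, 0.55, 0.7, 0.9])   # (mu_k) is increasing
tau = 0.5                                          # the last three arms are above tau

def uniform_allocation_estimate(mu, tau, budget):
    """Naive baseline: spread the budget evenly over the arms, then threshold
    the empirical means. Purely illustrative; not the paper's algorithm."""
    K = len(mu)
    pulls = budget // K
    means_hat = rng.binomial(pulls, mu) / pulls     # Bernoulli rewards
    return means_hat >= tau                          # estimated set of arms above tau

print(uniform_allocation_estimate(mu, tau, budget=600))
```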
no code implementations • 11 Jun 2021 • Tadashi Kozuno, Pierre Ménard, Rémi Munos, Michal Valko
We study the problem of learning a Nash equilibrium (NE) in an imperfect information game (IIG) through self-play.
no code implementations • NeurIPS 2021 • Rianne de Heide, James Cheshire, Pierre Ménard, Alexandra Carpentier
We characterize the optimal learning rates in both the cumulative regret setting and the best-arm identification setting, in terms of the problem parameters $T$ (the budget), $p^*$ and $\Delta$.
no code implementations • 7 Oct 2020 • Omar Darwiche Domingues, Pierre Ménard, Emilie Kaufmann, Michal Valko
In this paper, we propose new problem-independent lower bounds on the sample complexity and regret in episodic MDPs, with a particular focus on the non-stationary case in which the transition kernel is allowed to change in each stage of the episode.
no code implementations • 27 Jul 2020 • Pierre Ménard, Omar Darwiche Domingues, Anders Jonsson, Emilie Kaufmann, Edouard Leurent, Michal Valko
Realistic environments often provide agents with very limited feedback.
no code implementations • 9 Jul 2020 • Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko
In this work, we propose KeRNS: an algorithm for episodic reinforcement learning in non-stationary Markov Decision Processes (MDPs) whose state-action set is endowed with a metric.
no code implementations • 7 Jul 2020 • Hassan Saber, Pierre Ménard, Odalric-Ambrym Maillard
We study a structured variant of the multi-armed bandit problem, specified by a set of Bernoulli distributions with means $(\mu_{a,b})_{a\in\mathcal{A}, b\in\mathcal{B}} \in [0, 1]^{\mathcal{A}\times\mathcal{B}}$ and by a given weight matrix $\omega$.
no code implementations • ICML 2020 • Rémy Degenne, Pierre Ménard, Xuedong Shang, Michal Valko
We investigate an active pure-exploration setting, which includes best-arm identification, in the context of linear stochastic bandits.
no code implementations • 30 Jun 2020 • Hassan Saber, Pierre Ménard, Odalric-Ambrym Maillard
The proposed strategy is proven to be optimal.
no code implementations • 11 Jun 2020 • Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Anders Jonsson, Edouard Leurent, Michal Valko
Reward-free exploration is a reinforcement learning setting studied by Jin et al. (2020), who address it by running several algorithms with regret guarantees in parallel.
no code implementations • NeurIPS 2020 • Anders Jonsson, Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Edouard Leurent, Michal Valko
We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process in which transitions have a finite support.
1 code implementation • 12 Apr 2020 • Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko
We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning problems whose state-action space is endowed with a metric.
no code implementations • 24 Oct 2019 • Xuedong Shang, Rianne de Heide, Emilie Kaufmann, Pierre Ménard, Michal Valko
We investigate and provide new insights on the sampling rule called Top-Two Thompson Sampling (TTTS).
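As a reference point, here is a minimal sketch of the Top-Two Thompson Sampling rule for Bernoulli arms with Beta(1, 1) priors; the resampling cap and the default `beta = 0.5` are implementation conveniences, not prescriptions from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def ttts_choose_arm(successes, failures, beta=0.5, max_resample=100):
    """One round of the Top-Two Thompson Sampling rule for Bernoulli arms.

    successes, failures: per-arm counts (Beta(1, 1) priors assumed here).
    beta: probability of playing the Thompson sample's leader rather than
          the challenger (0.5 is a common default).
    """
    # First posterior sample: the "leader" is the best arm under this sample.
    theta = rng.beta(successes + 1, failures + 1)
    leader = int(np.argmax(theta))
    if rng.random() < beta:
        return leader
    # Otherwise resample until a different arm (the "challenger") comes out on top.
    for _ in range(max_resample):
        theta = rng.beta(successes + 1, failures + 1)
        challenger = int(np.argmax(theta))
        if challenger != leader:
            return challenger
    return leader  # fallback if the posteriors are too concentrated

# Example: 3 Bernoulli arms with a few observed counts.
print(ttts_choose_arm(np.array([5, 2, 0]), np.array([3, 4, 1])))
```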
no code implementations • NeurIPS 2019 • Rémy Degenne, Wouter M. Koolen, Pierre Ménard
Pure exploration (aka active testing) is the fundamental task of sequentially gathering information to answer a query about a stochastic environment.
no code implementations • 20 May 2019 • Pierre Ménard
We present a new algorithm based on gradient ascent for a general active exploration bandit problem in the fixed confidence setting.
no code implementations • 13 Nov 2017 • Aurélien Garivier, Pierre Ménard, Laurent Rossi
We analyze the sample complexity of the thresholding bandit problem, with and without the assumption that the mean values of the arms are increasing.
no code implementations • 23 Feb 2017 • Pierre Ménard, Aurélien Garivier
We propose the kl-UCB++ algorithm for regret minimization in stochastic bandit models with exponential families of distributions.
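For context, the sketch below computes a kl-UCB-style index for a Bernoulli arm by bisection; kl-UCB++ uses a specific horizon-dependent exploration function, which is abstracted here into a generic `exploration` argument.

```python
import numpy as np

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, exploration, tol=1e-6):
    """Largest q in [mean, 1] with pulls * kl(mean, q) <= exploration.

    Solved by bisection, since q -> kl(mean, q) is increasing on [mean, 1].
    """
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pulls * bernoulli_kl(mean, mid) <= exploration:
            lo = mid
        else:
            hi = mid
    return lo

# Example: empirical mean 0.4 after 20 pulls, with a generic exploration level.
print(kl_ucb_index(0.4, 20, np.log(100)))
```

In a kl-UCB++-style algorithm, `exploration` would depend on the horizon $T$, the number of arms $K$ and the arm's pull count (roughly $\log_+(T / (K \cdot \text{pulls}))$ up to lower-order corrections); the exact exploration function should be taken from the paper itself.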
no code implementations • 23 Feb 2016 • Aurélien Garivier, Pierre Ménard, Gilles Stoltz
We revisit lower bounds on the regret in the case of multi-armed bandit problems.