Search Results for author: Pierre Ménard

Found 26 papers, 3 papers with code

Local and adaptive mirror descents in extensive-form games

no code implementations 1 Sep 2023 Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko

We study how to learn $\epsilon$-optimal strategies in zero-sum imperfect information games (IIG) with trajectory feedback.

Learning Generative Models with Goal-conditioned Reinforcement Learning

no code implementations 26 Mar 2023 Mariana Vargas Vieyra, Pierre Ménard

We present a novel, alternative framework for learning generative models with goal-conditioned reinforcement learning.

Image Generation, Reinforcement Learning (RL)

Adapting to game trees in zero-sum imperfect information games

1 code implementation 23 Dec 2022 Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko

Imperfect information games (IIG) are games in which each player only partially observes the current game state.

Indexed Minimum Empirical Divergence for Unimodal Bandits

no code implementations NeurIPS 2021 Hassan Saber, Pierre Ménard, Odalric-Ambrym Maillard

We consider a multi-armed bandit problem specified by a set of one-dimensional exponential family distributions endowed with a unimodal structure.
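
For orientation, a minimal sketch of an IMED-style rule in the unimodal case, assuming Bernoulli rewards and a line-graph unimodal structure (both illustrative assumptions, not the paper's exact setting): minimize the index $N_k \, \mathrm{KL}(\hat\mu_k, \hat\mu^*) + \log N_k$ over the empirical leader and its neighbours.

```python
import math

def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def imed_unimodal_choice(means, counts):
    """Next arm to pull: minimize N_k * KL(mu_k, mu_best) + log(N_k),
    restricted to the empirical leader and its neighbours on the line
    graph (each arm is assumed to have been pulled at least once)."""
    best = max(range(len(means)), key=lambda k: means[k])
    neighbours = [k for k in (best - 1, best, best + 1) if 0 <= k < len(means)]
    def index(k):
        return counts[k] * kl_bernoulli(means[k], means[best]) + math.log(counts[k])
    return min(neighbours, key=index)
```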

Learning in two-player zero-sum partially observable Markov games with perfect recall

no code implementations NeurIPS 2021 Tadashi Kozuno, Pierre Ménard, Rémi Munos, Michal Valko

We study the problem of learning a Nash equilibrium (NE) in an extensive game with imperfect information (EGII) through self-play.

Adaptive Multi-Goal Exploration

no code implementations 23 Nov 2021 Jean Tarbouriech, Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Michal Valko, Alessandro Lazaric

We introduce a generic strategy for provably efficient multi-goal exploration.

Problem Dependent View on Structured Thresholding Bandit Problems

no code implementations 18 Jun 2021 James Cheshire, Pierre Ménard, Alexandra Carpentier

Taking $K$ as the number of arms, we consider (i) the case where the sequence of arm means $(\mu_k)_{k=1}^K$ is monotonically increasing (MTBP) and (ii) the case where it is concave (CTBP).
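
For reference, the standard thresholding bandit goal these structured variants refine, stated with the usual (assumed) threshold $\tau$ and budget $T$, neither of which appears in the snippet above:

```latex
% Standard TBP objective (notation assumed, not from the snippet above):
\[
  S_\tau = \{\, k \in \{1, \dots, K\} : \mu_k \ge \tau \,\}, \qquad
  \text{minimize } \mathbb{P}\big( \hat{S} \neq S_\tau \big),
\]
% where \hat{S} is the set of arms returned after the T pulls.
```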

Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall

no code implementations 11 Jun 2021 Tadashi Kozuno, Pierre Ménard, Rémi Munos, Michal Valko

We study the problem of learning a Nash equilibrium (NE) in an imperfect information game (IIG) through self-play.

Bandits with many optimal arms

no code implementations NeurIPS 2021 Rianne de Heide, James Cheshire, Pierre Ménard, Alexandra Carpentier

We characterize the optimal learning rates both in the cumulative regret setting and in the best-arm identification setting, in terms of the problem parameters $T$ (the budget), $p^*$, and $\Delta$.

Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited

no code implementations 7 Oct 2020 Omar Darwiche Domingues, Pierre Ménard, Emilie Kaufmann, Michal Valko

In this paper, we propose new problem-independent lower bounds on the sample complexity and regret in episodic MDPs, with a particular focus on the non-stationary case in which the transition kernel is allowed to change in each stage of the episode.

Reinforcement Learning (RL)

A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces

no code implementations 9 Jul 2020 Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko

In this work, we propose KeRNS: an algorithm for episodic reinforcement learning in non-stationary Markov Decision Processes (MDPs) whose state-action set is endowed with a metric.

Reinforcement Learning (RL)

Optimal Strategies for Graph-Structured Bandits

no code implementations 7 Jul 2020 Hassan Saber, Pierre Ménard, Odalric-Ambrym Maillard

We consider a structured variant of the multi-armed bandit problem specified by a set of Bernoulli distributions $\nu = (\nu_{a,b})_{a \in \mathcal{A}, b \in \mathcal{B}}$ with means $(\mu_{a,b})_{a \in \mathcal{A}, b \in \mathcal{B}} \in [0, 1]^{\mathcal{A}\times\mathcal{B}}$ and by a given weight matrix $\omega = (\omega_{b,b'})_{b,b' \in \mathcal{B}}$, where $\mathcal{A}$ is a finite set of arms and $\mathcal{B}$ a finite set of users.

Gamification of Pure Exploration for Linear Bandits

no code implementations ICML 2020 Rémy Degenne, Pierre Ménard, Xuedong Shang, Michal Valko

We investigate an active pure-exploration setting, which includes best-arm identification, in the context of linear stochastic bandits.

Experimental Design

Adaptive Reward-Free Exploration

no code implementations 11 Jun 2020 Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Anders Jonsson, Edouard Leurent, Michal Valko

Reward-free exploration is a reinforcement learning setting studied by Jin et al. (2020), who address it by running several algorithms with regret guarantees in parallel.
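
The reward-free objective formalized by Jin et al. (2020), stated here for orientation: after exploring with no reward signal, the agent must, with probability at least $1-\delta$, return an $\epsilon$-optimal policy for every reward function revealed afterwards.

```latex
% Reward-free PAC objective (Jin et al., 2020):
\[
  \forall r : \quad V^{\star}_{r}(s_1) - V^{\hat{\pi}_r}_{r}(s_1) \le \epsilon,
  \quad \text{with probability at least } 1 - \delta,
\]
% where \hat{\pi}_r is the policy computed for reward function r
% after the exploration phase ends.
```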

Planning in Markov Decision Processes with Gap-Dependent Sample Complexity

no code implementations NeurIPS 2020 Anders Jonsson, Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Edouard Leurent, Michal Valko

We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process in which transitions have a finite support.

Kernel-Based Reinforcement Learning: A Finite-Time Analysis

1 code implementation 12 Apr 2020 Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko

We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning problems whose state-action space is endowed with a metric.

Reinforcement Learning (RL)
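
As a rough illustration of the kernel-smoothing idea behind such methods (the Gaussian kernel, bandwidth, regularizer `beta`, and function name below are assumptions, not the paper's exact construction), a reward estimate at a query state-action pair might look like:

```python
import numpy as np

def kernel_reward_estimate(query, past_sa, past_r, bandwidth=0.5, beta=0.01):
    """Kernel-smoothed reward estimate at a query state-action pair.

    past_sa : (n, d) array of previously visited state-action features
    past_r  : (n,) array of observed rewards
    Weights decay with the metric distance to past visits; beta
    regularizes the estimate when the query is far from all data.
    """
    dists = np.linalg.norm(past_sa - query, axis=1)
    weights = np.exp(-(dists / bandwidth) ** 2)
    return weights @ past_r / (beta + weights.sum())
```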

Fixed-Confidence Guarantees for Bayesian Best-Arm Identification

no code implementations 24 Oct 2019 Xuedong Shang, Rianne de Heide, Emilie Kaufmann, Pierre Ménard, Michal Valko

We investigate and provide new insights on the sampling rule called Top-Two Thompson Sampling (TTTS).

Thompson Sampling
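
A minimal sketch of the TTTS sampling rule for Bernoulli arms, assuming uniform Beta(1, 1) priors and the common default $\beta = 1/2$ (illustrative choices, not the paper's analysis): keep the best arm of a posterior draw with probability $\beta$, otherwise resample until a different arm comes out on top.

```python
import numpy as np

def ttts_choice(successes, failures, beta=0.5, rng=None):
    """One round of Top-Two Thompson Sampling with Beta(1, 1) priors:
    sample the posterior, keep its best arm (the leader) with
    probability beta, otherwise resample until a different arm
    (the challenger) is best, and play that one."""
    rng = rng if rng is not None else np.random.default_rng()
    s, f = np.asarray(successes), np.asarray(failures)
    leader = int(np.argmax(rng.beta(s + 1, f + 1)))
    if rng.random() < beta:
        return leader
    while True:
        challenger = int(np.argmax(rng.beta(s + 1, f + 1)))
        if challenger != leader:
            return challenger
```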

Non-Asymptotic Pure Exploration by Solving Games

no code implementations NeurIPS 2019 Rémy Degenne, Wouter M. Koolen, Pierre Ménard

Pure exploration (aka active testing) is the fundamental task of sequentially gathering information to answer a query about a stochastic environment.

Gradient Ascent for Active Exploration in Bandit Problems

no code implementations 20 May 2019 Pierre Ménard

We present a new algorithm based on gradient ascent for a general active exploration bandit problem in the fixed confidence setting.

Thresholding Bandit for Dose-ranging: The Impact of Monotonicity

no code implementations 13 Nov 2017 Aurélien Garivier, Pierre Ménard, Laurent Rossi

We analyze the sample complexity of the thresholding bandit problem, with and without the assumption that the mean values of the arms are increasing.

A minimax and asymptotically optimal algorithm for stochastic bandits

no code implementations 23 Feb 2017 Pierre Ménard, Aurélien Garivier

We propose the kl-UCB++ algorithm for regret minimization in stochastic bandit models with exponential families of distributions.
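
A minimal sketch of a kl-UCB-style index for Bernoulli arms, computed by bisection. kl-UCB++ sharpens the exploration rate, so the plain $\log t$ threshold below is an illustrative simplification, not the paper's exact index.

```python
import math

def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t, iters=30):
    """Upper confidence index: the largest q in [mean, 1] such that
    pulls * kl(mean, q) <= log(t), found by bisection."""
    level = math.log(max(t, 2)) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= level:
            lo = mid
        else:
            hi = mid
    return lo
```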

Explore First, Exploit Next: The True Shape of Regret in Bandit Problems

no code implementations 23 Feb 2016 Aurélien Garivier, Pierre Ménard, Gilles Stoltz

We revisit lower bounds on the regret in the case of multi-armed bandit problems.
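
For orientation, the classical asymptotic lower bound of Lai and Robbins (1985) that such revisits start from, where $\Delta_k$ is the gap of arm $k$, $\nu_k$ its reward distribution, and $\nu^{\star}$ that of an optimal arm:

```latex
% Lai-Robbins (1985): any uniformly efficient strategy satisfies
\[
  \liminf_{T \to \infty} \frac{\mathbb{E}[R_T]}{\log T}
  \;\ge\; \sum_{k : \Delta_k > 0} \frac{\Delta_k}{\mathrm{KL}(\nu_k, \nu^{\star})}.
\]
```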
