Search Results for author: Pierre Menard

Found 10 papers, 5 papers with code

Demonstration-Regularized RL

no code implementations 26 Oct 2023 Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Menard

In particular, we study demonstration-regularized reinforcement learning, which leverages expert demonstrations through KL-regularization toward a policy learned by behavior cloning.
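As a rough illustration only (not the paper's method), KL-regularization toward a behavior-cloned policy can be written as an expected return minus a KL penalty; the function and argument names below are hypothetical.

    import numpy as np

    def kl_regularized_objective(q_values, policy_probs, bc_probs, kl_coef):
        """Expected return under the policy minus a KL penalty toward the BC policy."""
        expected_return = np.sum(policy_probs * q_values)
        kl_to_bc = np.sum(policy_probs * (np.log(policy_probs + 1e-12)
                                          - np.log(bc_probs + 1e-12)))
        return expected_return - kl_coef * kl_to_bc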

Reinforcement Learning (RL)

Fast Rates for Maximum Entropy Exploration

1 code implementation 14 Mar 2023 Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Pierre Perrault, Yunhao Tang, Michal Valko, Pierre Menard

Finally, we apply the developed regularization techniques to reduce the sample complexity of visitation entropy maximization to $\widetilde{\mathcal{O}}(H^2SA/\varepsilon^2)$, yielding a statistical separation between maximum entropy exploration and reward-free exploration.
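For intuition only, the quantity being maximized is the Shannon entropy of the state-visitation distribution; a minimal sketch (hypothetical helper, not the paper's code):

    import numpy as np

    def visitation_entropy(state_counts):
        """Shannon entropy of an empirical state-visitation distribution."""
        p = np.asarray(state_counts, dtype=float)
        p = p / p.sum()
        p = p[p > 0]
        return -np.sum(p * np.log(p))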

Reinforcement Learning (RL)

Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees

1 code implementation 28 Sep 2022 Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Mark Rowland, Michal Valko, Pierre Menard

We consider reinforcement learning in an environment modeled by an episodic, finite, stage-dependent Markov decision process of horizon $H$ with $S$ states and $A$ actions.
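Such an environment can be stored as stage-indexed tables; the representation below is a generic sketch, not tied to the paper's code.

    import numpy as np

    class EpisodicTabularMDP:
        """Finite, stage-dependent episodic MDP with horizon H, S states, A actions."""

        def __init__(self, H, S, A, seed=0):
            rng = np.random.default_rng(seed)
            # transitions[h, s, a] is a probability vector over next states at stage h
            self.transitions = rng.dirichlet(np.ones(S), size=(H, S, A))
            # rewards[h, s, a] lies in [0, 1]
            self.rewards = rng.uniform(0.0, 1.0, size=(H, S, A))
            self.H, self.S, self.A = H, S, A
            self._rng = rng

        def step(self, h, s, a):
            next_s = self._rng.choice(self.S, p=self.transitions[h, s, a])
            return int(next_s), float(self.rewards[h, s, a])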

Reinforcement Learning (RL)

From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses

no code implementations 16 May 2022 Daniil Tiapkin, Denis Belomestny, Eric Moulines, Alexey Naumov, Sergey Samsonov, Yunhao Tang, Michal Valko, Pierre Menard

We propose the Bayes-UCBVI algorithm for reinforcement learning in tabular, stage-dependent, episodic Markov decision processes: a natural extension of the Bayes-UCB algorithm by Kaufmann et al. (2012) for multi-armed bandits.
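For background, a minimal Bayes-UCB sketch for Bernoulli bandits (the bandit-level idea the paper extends to MDPs). The simplified quantile schedule below is an assumption; the original analysis uses quantiles of order $1 - 1/(t(\log n)^c)$.

    import numpy as np
    from scipy.stats import beta

    def bayes_ucb_arm(successes, failures, t):
        """Pull the arm with the largest Beta-posterior quantile (Bayes-UCB index)."""
        level = 1.0 - 1.0 / (t + 1)  # simplified quantile schedule (assumption)
        indices = beta.ppf(level, successes + 1, failures + 1)
        return int(np.argmax(indices))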

Multi-Armed Bandits

UCB Momentum Q-learning: Correcting the bias without forgetting

1 code implementation 1 Mar 2021 Pierre Menard, Omar Darwiche Domingues, Xuedong Shang, Michal Valko

We propose UCBMQ, Upper Confidence Bound Momentum Q-learning, a new algorithm for reinforcement learning in tabular, possibly stage-dependent, episodic Markov decision processes.
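The sketch below shows only the generic optimistic Q-learning template (visit-count learning rate plus an exploration bonus), not UCBMQ's momentum-based bias correction; all names are illustrative.

    import numpy as np

    def optimistic_q_step(Q, counts, h, s, a, reward, next_value, H, bonus_scale=1.0):
        """One optimistic Q-learning update for a tabular, stage-dependent MDP."""
        counts[h, s, a] += 1
        n = counts[h, s, a]
        lr = (H + 1) / (H + n)                      # learning rate common in tabular analyses
        bonus = bonus_scale * H * np.sqrt(1.0 / n)  # Hoeffding-style exploration bonus
        target = reward + next_value + bonus
        Q[h, s, a] = (1 - lr) * Q[h, s, a] + lr * target
        return Q[h, s, a]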

Q-Learning

The Influence of Shape Constraints on the Thresholding Bandit Problem

no code implementations 17 Jun 2020 James Cheshire, Pierre Menard, Alexandra Carpentier

We prove that the minimax rates for the regret are (i) $\sqrt{\log(K)K/T}$ for TBP, (ii) $\sqrt{\log(K)/T}$ for MTBP, (iii) $\sqrt{K/T}$ for UTBP and (iv) $\sqrt{\log\log K/T}$ for CTBP, where $K$ is the number of arms and $T$ is the budget.
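Plugging numbers into these rates makes the ordering concrete; a small helper (hypothetical) evaluating all four for a given $K$ and $T$:

    import numpy as np

    def thresholding_minimax_rates(K, T):
        """Evaluate the four minimax regret rates quoted above."""
        return {
            "TBP":  np.sqrt(np.log(K) * K / T),
            "MTBP": np.sqrt(np.log(K) / T),
            "UTBP": np.sqrt(K / T),
            "CTBP": np.sqrt(np.log(np.log(K)) / T),
        }

    print(thresholding_minimax_rates(K=100, T=10_000))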

Planning in entropy-regularized Markov decision processes and games

1 code implementation NeurIPS 2019 Jean-Bastien Grill, Omar Darwiche Domingues, Pierre Menard, Remi Munos, Michal Valko

We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the MDP.
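For background (this is the standard entropy-regularized value recursion, not SmoothCruiser itself): the regularized value of a state is a temperature-scaled log-sum-exp of the action values, which the planner estimates from generative-model samples.

    import numpy as np

    def soft_value(q_values, temperature):
        """Entropy-regularized state value: temperature * log-sum-exp(Q / temperature)."""
        q = np.asarray(q_values, dtype=float)
        m = q.max()  # subtract the max for numerical stability
        return m + temperature * np.log(np.sum(np.exp((q - m) / temperature)))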

KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints

1 code implementation 14 May 2018 Aurélien Garivier, Hédi Hadiji, Pierre Menard, Gilles Stoltz

We obtain this non-parametric bi-optimality result while also streamlining the proofs (of the previously known regret bounds and thus of the new analyses carried out); a second merit of this contribution is therefore to provide a review of the proofs of classical regret bounds for index-based strategies for $K$-armed stochastic bandits.
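As a reminder of what such index-based strategies compute, here is a minimal KL-UCB index for Bernoulli rewards (illustrative only, not the switch rule itself), obtained by binary search for the largest mean consistent with the observed average under a KL budget.

    import numpy as np

    def bernoulli_kl(p, q):
        """KL divergence between Bernoulli(p) and Bernoulli(q)."""
        eps = 1e-12
        p = min(max(p, eps), 1 - eps)
        q = min(max(q, eps), 1 - eps)
        return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

    def kl_ucb_index(mean, pulls, t, iters=50):
        """Largest q >= mean with pulls * KL(mean, q) <= log(t), via binary search."""
        budget = np.log(t) / pulls
        lo, hi = mean, 1.0
        for _ in range(iters):
            mid = (lo + hi) / 2
            if bernoulli_kl(mean, mid) <= budget:
                lo = mid
            else:
                hi = mid
        return lo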

Thresholding Bandit for Dose-ranging: The Impact of Monotonicity

no code implementations 13 Nov 2017 Aurélien Garivier, Pierre Ménard, Laurent Rossi

We analyze the sample complexity of the thresholding bandit problem, with and without the assumption that the mean values of the arms are increasing.
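As a baseline illustration of the problem setup (not the algorithms analyzed in the paper): split the budget uniformly across arms and classify each arm by comparing its empirical mean to the threshold.

    import numpy as np

    def uniform_thresholding(pull_arm, K, T, threshold):
        """Uniform-allocation baseline for the thresholding bandit problem.

        pull_arm(k) draws one stochastic reward from arm k; T is the sampling budget.
        Returns a boolean array marking arms whose empirical mean exceeds the threshold.
        """
        pulls = T // K
        means = np.array([np.mean([pull_arm(k) for _ in range(pulls)]) for k in range(K)])
        return means > threshold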

