Search Results for author: Rémi Munos

Found 77 papers, 19 papers with code

Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model

no code implementations • 12 Feb 2024 • Mark Rowland, Li Kevin Wenliang, Rémi Munos, Clare Lyle, Yunhao Tang, Will Dabney

We propose a new algorithm for model-based distributional reinforcement learning (RL), and prove that it is minimax-optimal for approximating return distributions with a generative model (up to logarithmic factors), resolving an open question of Zhang et al. (2023).

Distributional Reinforcement Learning reinforcement-learning +1

Paper
Add Code

Generalized Preference Optimization: A Unified Approach to Offline Alignment

no code implementations • 8 Feb 2024 • Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot

Offline preference optimization allows fine-tuning large models directly from offline data, and has proved effective in recent alignment practices.

Paper
Add Code

Off-policy Distributional Q($λ$): Distributional RL without Importance Sampling

no code implementations • 8 Feb 2024 • Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney

We introduce off-policy distributional Q($\lambda$), a new addition to the family of off-policy distributional evaluation algorithms.

Paper
Add Code

Nash Learning from Human Feedback

no code implementations • 1 Dec 2023 • Rémi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J. Mankowitz, Doina Precup, Bilal Piot

We term this approach Nash learning from human feedback (NLHF).

Text Summarization

Paper
Add Code

A General Theoretical Paradigm to Understand Learning from Human Preferences

1 code implementation • 18 Oct 2023 • Mohammad Gheshlaghi Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos

In particular we derive a new general objective called $\Psi$PO for learning from human preferences that is expressed in terms of pairwise preferences and therefore bypasses both approximations.

1,610

Paper
Code

Local and adaptive mirror descents in extensive-form games

no code implementations • 1 Sep 2023 • Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko

We study how to learn $\epsilon$-optimal strategies in zero-sum imperfect information games (IIG) with trajectory feedback.

Paper
Add Code

VA-learning as a more efficient alternative to Q-learning

no code implementations • 29 May 2023 • Yunhao Tang, Rémi Munos, Mark Rowland, Michal Valko

In reinforcement learning, the advantage function is critical for policy improvement, but is often extracted from a learned Q-function.

Q-Learning

Paper
Add Code

Towards a Better Understanding of Representation Dynamics under TD-learning

no code implementations • 29 May 2023 • Yunhao Tang, Rémi Munos

Complementary to prior work, we provide a set of analysis that sheds further light on the representation dynamics under TD-learning.

Reinforcement Learning (RL) Representation Learning +1

Paper
Add Code

DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm

no code implementations • 29 May 2023 • Yunhao Tang, Tadashi Kozuno, Mark Rowland, Anna Harutyunyan, Rémi Munos, Bernardo Ávila Pires, Michal Valko

Multi-step learning applies lookahead over multiple time steps and has proved valuable in policy evaluation settings.

Paper
Add Code

The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation

no code implementations • 28 May 2023 • Mark Rowland, Yunhao Tang, Clare Lyle, Rémi Munos, Marc G. Bellemare, Will Dabney

We study the problem of temporal-difference-based policy evaluation in reinforcement learning.

Distributional Reinforcement Learning reinforcement-learning

Paper
Add Code

Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice

1 code implementation • 22 May 2023 • Toshinori Kitamura, Tadashi Kozuno, Yunhao Tang, Nino Vieillard, Michal Valko, Wenhao Yang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári, Wataru Kumagai, Yutaka Matsuo

Mirror descent value iteration (MDVI), an abstraction of Kullback-Leibler (KL) and entropy-regularized reinforcement learning (RL), has served as the basis for recent high-performing practical RL algorithms.

regression Reinforcement Learning (RL)

Paper
Code

An Analysis of Quantile Temporal-Difference Learning

no code implementations • 11 Jan 2023 • Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney

We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning.

Distributional Reinforcement Learning reinforcement-learning +1

Paper
Add Code

Adapting to game trees in zero-sum imperfect information games

1 code implementation • 23 Dec 2022 • Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko

Imperfect information games (IIG) are games in which each player only partially observes the current game state.

Paper
Code

Understanding Self-Predictive Learning for Reinforcement Learning

no code implementations • 6 Dec 2022 • Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko

We identify that a faster paced optimization of the predictor and semi-gradient updates on the representation, are crucial to preventing the representation collapse.

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Add Code

Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments

no code implementations • 18 Nov 2022 • Daniel Jarrett, Corentin Tallec, Florent Altché, Thomas Mesnard, Rémi Munos, Michal Valko

In this work, we study a natural solution derived from structural causal models of the world: Our key idea is to learn representations of the future that capture precisely the unpredictable aspects of each outcome -- which we use as additional input for predictions, such that intrinsic rewards only reflect the predictable aspects of world dynamics.

Montezuma's Revenge

Paper
Add Code

The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning

no code implementations • 15 Jul 2022 • Yunhao Tang, Mark Rowland, Rémi Munos, Bernardo Ávila Pires, Will Dabney, Marc G. Bellemare

We study the multi-step off-policy learning approach to distributional RL.

Distributional Reinforcement Learning reinforcement-learning +1

Paper
Add Code

Generalised Policy Improvement with Geometric Policy Composition

no code implementations • 17 Jun 2022 • Shantanu Thakoor, Mark Rowland, Diana Borsa, Will Dabney, Rémi Munos, André Barreto

We introduce a method for policy improvement that interpolates between the greedy approach of value-based reinforcement learning (RL) and the full planning approach typical of model-based RL.

Continuous Control Reinforcement Learning (RL)

Paper
Add Code

BYOL-Explore: Exploration by Bootstrapped Prediction

no code implementations • 16 Jun 2022 • Zhaohan Daniel Guo, Shantanu Thakoor, Miruna Pîslar, Bernardo Avila Pires, Florent Altché, Corentin Tallec, Alaa Saade, Daniele Calandriello, Jean-bastien Grill, Yunhao Tang, Michal Valko, Rémi Munos, Mohammad Gheshlaghi Azar, Bilal Piot

We present BYOL-Explore, a conceptually simple yet general approach for curiosity-driven exploration in visually-complex environments.

Paper
Add Code

KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal

no code implementations • 27 May 2022 • Tadashi Kozuno, Wenhao Yang, Nino Vieillard, Toshinori Kitamura, Yunhao Tang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Michal Valko, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári

In this work, we consider and analyze the sample complexity of model-free reinforcement learning with a generative model.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Marginalized Operators for Off-policy Reinforcement Learning

no code implementations • 30 Mar 2022 • Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko

We show that the estimates for marginalized operators can be computed in a scalable way, which also generalizes prior results on marginalized importance sampling as special cases.

Off-policy evaluation reinforcement-learning

Paper
Add Code

Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation

1 code implementation • NeurIPS 2021 • Yunhao Tang, Tadashi Kozuno, Mark Rowland, Rémi Munos, Michal Valko

Model-agnostic meta-reinforcement learning requires estimating the Hessian matrix of value functions.

Meta Reinforcement Learning Off-policy evaluation +1

Paper
Code

Taylor Expansion of Discount Factors

no code implementations • 11 Jun 2021 • Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko

In practical reinforcement learning (RL), the discount factor used for estimating value functions often differs from that used for defining the evaluation objective.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall

no code implementations • 11 Jun 2021 • Tadashi Kozuno, Pierre Ménard, Rémi Munos, Michal Valko

We study the problem of learning a Nash equilibrium (NE) in an imperfect information game (IIG) through self-play.

Paper
Add Code

Concave Utility Reinforcement Learning: the Mean-Field Game Viewpoint

no code implementations • 7 Jun 2021 • Matthieu Geist, Julien Pérolat, Mathieu Laurière, Romuald Elie, Sarah Perrin, Olivier Bachem, Rémi Munos, Olivier Pietquin

Mean-field Games (MFGs) are a continuous approximation of many-agent RL.

Imitation Learning reinforcement-learning +1

Paper
Add Code

Revisiting Peng's Q($λ$) for Modern Reinforcement Learning

no code implementations • 27 Feb 2021 • Tadashi Kozuno, Yunhao Tang, Mark Rowland, Rémi Munos, Steven Kapturowski, Will Dabney, Michal Valko, David Abel

These results indicate that Peng's Q($\lambda$), which was thought to be unsafe, is a theoretically-sound and practically effective algorithm.

Continuous Control reinforcement-learning +1

Paper
Add Code

Large-Scale Representation Learning on Graphs via Bootstrapping

3 code implementations • ICLR 2022 • Shantanu Thakoor, Corentin Tallec, Mohammad Gheshlaghi Azar, Mehdi Azabou, Eva L. Dyer, Rémi Munos, Petar Veličković, Michal Valko

To address these challenges, we introduce Bootstrapped Graph Latents (BGRL) - a graph representation learning method that learns by predicting alternative augmentations of the input.

Contrastive Learning Graph Representation Learning +1

Paper
Code

Geometric Entropic Exploration

no code implementations • 6 Jan 2021 • Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Alaa Saade, Shantanu Thakoor, Bilal Piot, Bernardo Avila Pires, Michal Valko, Thomas Mesnard, Tor Lattimore, Rémi Munos

Exploration is essential for solving complex Reinforcement Learning (RL) tasks.

Reinforcement Learning (RL)

Paper
Add Code

Counterfactual Credit Assignment in Model-Free Reinforcement Learning

no code implementations • 18 Nov 2020 • Thomas Mesnard, Théophane Weber, Fabio Viola, Shantanu Thakoor, Alaa Saade, Anna Harutyunyan, Will Dabney, Tom Stepleton, Nicolas Heess, Arthur Guez, Éric Moulines, Marcus Hutter, Lars Buesing, Rémi Munos

Credit assignment in reinforcement learning is the problem of measuring an action's influence on future rewards.

counterfactual reinforcement-learning +1

Paper
Add Code

The Advantage Regret-Matching Actor-Critic

no code implementations • 27 Aug 2020 • Audrūnas Gruslys, Marc Lanctot, Rémi Munos, Finbarr Timbers, Martin Schmid, Julien Perolat, Dustin Morrill, Vinicius Zambaldi, Jean-Baptiste Lespiau, John Schultz, Mohammad Gheshlaghi Azar, Michael Bowling, Karl Tuyls

In this paper, we describe a general model-free RL method for no-regret learning based on repeated reconsideration of past behavior.

counterfactual Reinforcement Learning (RL)

Paper
Add Code

Monte-Carlo Tree Search as Regularized Policy Optimization

3 code implementations • ICML 2020 • Jean-bastien Grill, Florent Altché, Yunhao Tang, Thomas Hubert, Michal Valko, Ioannis Antonoglou, Rémi Munos

The combination of Monte-Carlo tree search (MCTS) with deep reinforcement learning has led to significant advances in artificial intelligence.

reinforcement-learning Reinforcement Learning (RL)

Paper
Code

Bootstrap your own latent: A new approach to self-supervised Learning

31 code implementations • 13 Jun 2020 • Jean-bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko

From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view.

Ranked #2 on Self-Supervised Person Re-Identification on SYSU-30k

Representation Learning Self-Supervised Image Classification +3

12,788

Paper
Code

Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning

no code implementations • ICML 2020 • Daniel Guo, Bernardo Avila Pires, Bilal Piot, Jean-bastien Grill, Florent Altché, Rémi Munos, Mohammad Gheshlaghi Azar

These latent embeddings are themselves trained to be predictive of the aforementioned representations.

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Add Code

Leverage the Average: an Analysis of KL Regularization in RL

no code implementations • 31 Mar 2020 • Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Rémi Munos, Matthieu Geist

Recent Reinforcement Learning (RL) algorithms making use of Kullback-Leibler (KL) regularization as a core component have shown outstanding performance.

Reinforcement Learning (RL)

Paper
Add Code

Taylor Expansion Policy Optimization

no code implementations • ICML 2020 • Yunhao Tang, Michal Valko, Rémi Munos

In this work, we investigate the application of Taylor expansions in reinforcement learning.

Off-policy evaluation reinforcement-learning

Paper
Add Code

Conditional Importance Sampling for Off-Policy Learning

no code implementations • 16 Oct 2019 • Mark Rowland, Anna Harutyunyan, Hado van Hasselt, Diana Borsa, Tom Schaul, Rémi Munos, Will Dabney

We theoretically analyse this space, and concretely investigate several algorithms that arise from this framework.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Adaptive Trade-Offs in Off-Policy Learning

no code implementations • 16 Oct 2019 • Mark Rowland, Will Dabney, Rémi Munos

A great variety of off-policy learning algorithms exist in the literature, and new breakthroughs in this area continue to be made, improving theoretical understanding and yielding state-of-the-art reinforcement learning algorithms.

Off-policy evaluation reinforcement-learning

Paper
Add Code

Towards Consistent Performance on Atari using Expert Demonstrations

no code implementations • ICLR 2019 • Tobias Pohlen, Bilal Piot, Todd Hester, Mohammad Gheshlaghi Azar, Dan Horgan, David Budden, Gabriel Barth-Maron, Hado van Hasselt, John Quan, Mel Večerík, Matteo Hessel, Rémi Munos, Olivier Pietquin

Despite significant advances in the field of deep Reinforcement Learning (RL), today's algorithms still fail to learn human-level policies consistently over a set of diverse tasks such as Atari 2600 games.

Atari Games Reinforcement Learning (RL)

Paper
Add Code

Statistics and Samples in Distributional Reinforcement Learning

no code implementations • 21 Feb 2019 • Mark Rowland, Robert Dadashi, Saurabh Kumar, Rémi Munos, Marc G. Bellemare, Will Dabney

We present a unifying framework for designing and analysing distributional reinforcement learning (DRL) algorithms in terms of recursively estimating statistics of the return distribution.

Distributional Reinforcement Learning reinforcement-learning +1

Paper
Add Code

World Discovery Models

1 code implementation • 20 Feb 2019 • Mohammad Gheshlaghi Azar, Bilal Piot, Bernardo Avila Pires, Jean-bastien Grill, Florent Altché, Rémi Munos

As humans we are driven by a strong desire for seeking novelty in our world.

Paper
Code

Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement

no code implementations • ICML 2018 • André Barreto, Diana Borsa, John Quan, Tom Schaul, David Silver, Matteo Hessel, Daniel Mankowitz, Augustin Žídek, Rémi Munos

In this paper we extend the SFs & GPI framework in two ways.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Optimistic optimization of a Brownian

no code implementations • NeurIPS 2018 • Jean-bastien Grill, Michal Valko, Rémi Munos

Given $W$, our goal is to return an $\epsilon$-approximation of its maximum using the smallest possible number of function evaluations, the sample complexity of the algorithm.

Paper
Add Code

Universal Successor Features Approximators

1 code implementation • ICLR 2019 • Diana Borsa, André Barreto, John Quan, Daniel Mankowitz, Rémi Munos, Hado van Hasselt, David Silver, Tom Schaul

We focus on one aspect in particular, namely the ability to generalise to unseen tasks.

Navigate Reinforcement Learning (RL)

Paper
Code

Neural Predictive Belief Representations

no code implementations • 15 Nov 2018 • Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Bernardo A. Pires, Rémi Munos

In partially observable domains it is important for the representation to encode a belief state, a sufficient statistic of the observations seen so far.

Decision Making Representation Learning

Paper
Add Code

Implicit Quantile Networks for Distributional Reinforcement Learning

20 code implementations • ICML 2018 • Will Dabney, Georg Ostrovski, David Silver, Rémi Munos

In this work, we build on recent advances in distributional reinforcement learning to give a generally applicable, flexible, and state-of-the-art distributional variant of DQN.

Ranked #1 on Atari Games on Atari 2600 Freeway

Atari Games Distributional Reinforcement Learning +3

10,367

Paper
Code

Autoregressive Quantile Networks for Generative Modeling

1 code implementation • ICML 2018 • Georg Ostrovski, Will Dabney, Rémi Munos

We introduce autoregressive implicit quantile networks (AIQN), a fundamentally different approach to generative modeling than those commonly used, that implicitly captures the distribution using quantile regression.

regression

Paper
Code

Observe and Look Further: Achieving Consistent Performance on Atari

no code implementations • 29 May 2018 • Tobias Pohlen, Bilal Piot, Todd Hester, Mohammad Gheshlaghi Azar, Dan Horgan, David Budden, Gabriel Barth-Maron, Hado van Hasselt, John Quan, Mel Večerík, Matteo Hessel, Rémi Munos, Olivier Pietquin

Montezuma's Revenge Reinforcement Learning (RL)

Paper
Add Code

An Analysis of Categorical Distributional Reinforcement Learning

no code implementations • 22 Feb 2018 • Mark Rowland, Marc G. Bellemare, Will Dabney, Rémi Munos, Yee Whye Teh

Distributional approaches to value-based reinforcement learning model the entire distribution of returns, rather than just their expected values, and have recently been shown to yield state-of-the-art empirical performance.

Distributional Reinforcement Learning reinforcement-learning +1

Paper
Add Code

Learning to Search with MCTSnets

2 code implementations • ICML 2018 • Arthur Guez, Théophane Weber, Ioannis Antonoglou, Karen Simonyan, Oriol Vinyals, Daan Wierstra, Rémi Munos, David Silver

They are most typically solved by tree search algorithms that simulate ahead into the future, evaluate future states, and back-up those evaluations to the root of a search tree.

Paper
Code

Distributional Reinforcement Learning with Quantile Regression

17 code implementations • 27 Oct 2017 • Will Dabney, Mark Rowland, Marc G. Bellemare, Rémi Munos

In this paper, we build on recent work advocating a distributional approach to reinforcement learning in which the distribution over returns is modeled explicitly instead of only estimating the mean.

Ranked #1 on Atari Games on Atari 2600 Pong

Atari Games Distributional Reinforcement Learning +3

7,893

Paper
Code

A Distributional Perspective on Reinforcement Learning

22 code implementations • ICML 2017 • Marc G. Bellemare, Will Dabney, Rémi Munos

We obtain both state-of-the-art results and anecdotal evidence demonstrating the importance of the value distribution in approximate reinforcement learning.

Ranked #4 on Atari Games on Atari 2600 HERO

Atari Games reinforcement-learning +1

3,521

Paper
Code

Observational Learning by Reinforcement Learning

no code implementations • 20 Jun 2017 • Diana Borsa, Bilal Piot, Rémi Munos, Olivier Pietquin

Observational learning is a type of learning that occurs as a function of observing, retaining and possibly replicating or imitating the behaviour of another agent.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

The Cramer Distance as a Solution to Biased Wasserstein Gradients

2 code implementations • ICLR 2018 • Marc G. Bellemare, Ivo Danihelka, Will Dabney, Shakir Mohamed, Balaji Lakshminarayanan, Stephan Hoyer, Rémi Munos

We show that the Cram\'er distance possesses all three desired properties, combining the best of the Wasserstein and Kullback-Leibler divergences.

BIG-bench Machine Learning Generative Adversarial Network

Paper
Code

Minimax Regret Bounds for Reinforcement Learning

1 code implementation • ICML 2017 • Mohammad Gheshlaghi Azar, Ian Osband, Rémi Munos

We consider the problem of provably optimal exploration in reinforcement learning for finite horizon MDPs.

reinforcement-learning Reinforcement Learning (RL)

Paper
Code

Successor Features for Transfer in Reinforcement Learning

no code implementations • NeurIPS 2017 • André Barreto, Will Dabney, Rémi Munos, Jonathan J. Hunt, Tom Schaul, Hado van Hasselt, David Silver

Transfer in reinforcement learning refers to the notion that generalization should occur not only within a task but also across tasks.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Safe and Efficient Off-Policy Reinforcement Learning

3 code implementations • NeurIPS 2016 • Rémi Munos, Tom Stepleton, Anna Harutyunyan, Marc G. Bellemare

In this work, we take a fresh look at some old and new algorithms for off-policy, return-based reinforcement learning.

Atari Games reinforcement-learning +1

Paper
Code

Increasing the Action Gap: New Operators for Reinforcement Learning

2 code implementations • 15 Dec 2015 • Marc G. Bellemare, Georg Ostrovski, Arthur Guez, Philip S. Thomas, Rémi Munos

Extending the idea of a locally consistent operator, we then derive sufficient conditions for an operator to preserve optimality, leading to a family of operators which includes our consistent Bellman operator.

Ranked #1 on Atari Games on Atari 2600 Elevator Action

Atari Games Q-Learning +2

4,398

Paper
Code

Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits

no code implementations • 16 Jul 2015 • Alexandra Carpentier, Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos, Peter Auer, András Antos

If the variance of the distributions were known, one could design an optimal sampling strategy by collecting a number of independent samples per distribution that is proportional to their variance.

Active Learning Multi-Armed Bandits

Paper
Add Code

Best-Arm Identification in Linear Bandits

no code implementations • NeurIPS 2014 • Marta Soare, Alessandro Lazaric, Rémi Munos

We study the best-arm identification problem in linear bandit, where the rewards of the arms depend linearly on an unknown parameter $\theta^*$ and the objective is to return the arm with the largest reward.

Experimental Design

Paper
Add Code

Fast gradient descent for drifting least squares regression, with application to bandits

no code implementations • 11 Jul 2013 • Nathaniel Korda, Prashanth L. A., Rémi Munos

In the case when strong convexity in the regression problem is guaranteed, we provide bounds on the error both in expectation and high probability (the latter is often needed to provide theoretical guarantees for higher level algorithms), despite the drifting least squares solution.

News Recommendation regression

Paper
Add Code

Concentration bounds for temporal difference learning with linear function approximation: The case of batch data and uniform sampling

no code implementations • 11 Jun 2013 • L. A. Prashanth, Nathaniel Korda, Rémi Munos

We propose a stochastic approximation (SA) based method with randomization of samples for policy evaluation using the least squares temporal difference (LSTD) algorithm.

Multi-Armed Bandits News Recommendation +1

Paper
Add Code

Bandit Algorithms boost Brain Computer Interfaces for motor-task selection of a brain-controlled button

no code implementations • NeurIPS 2012 • Joan Fruitet, Alexandra Carpentier, Maureen Clerc, Rémi Munos

A brain-computer interface (BCI) allows users to “communicate” with a computer without using their muscles.

Brain Computer Interface General Classification

Paper
Add Code

Risk-Aversion in Multi-armed Bandits

no code implementations • NeurIPS 2012 • Amir Sani, Alessandro Lazaric, Rémi Munos

In stochastic multi--armed bandits the objective is to solve the exploration--exploitation dilemma and ultimately maximize the expected reward.

Multi-Armed Bandits

Paper
Add Code

Adaptive Stratified Sampling for Monte-Carlo integration of Differentiable functions

no code implementations • NeurIPS 2012 • Alexandra Carpentier, Rémi Munos

We consider the problem of adaptive stratified sampling for Monte Carlo integration of a differentiable function given a finite number of evaluations to the function.

Paper
Add Code

Thompson Sampling: An Asymptotically Optimal Finite Time Analysis

1 code implementation • 18 May 2012 • Emilie Kaufmann, Nathaniel Korda, Rémi Munos

The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933.

3D Reconstruction Thompson Sampling

Paper
Code

Speedy Q-Learning

no code implementations • NeurIPS 2011 • Mohammad Ghavamzadeh, Hilbert J. Kappen, Mohammad G. Azar, Rémi Munos

We introduce a new convergent variant of Q-learning, called speedy Q-learning, to address the problem of slow convergence in the standard form of the Q-learning algorithm.

Q-Learning

Paper
Add Code

Finite Time Analysis of Stratified Sampling for Monte Carlo

no code implementations • NeurIPS 2011 • Alexandra Carpentier, Rémi Munos

We consider the problem of stratified sampling for Monte-Carlo integration.

Paper
Add Code

Selecting the State-Representation in Reinforcement Learning

no code implementations • NeurIPS 2011 • Odalric-Ambrym Maillard, Daniil Ryabko, Rémi Munos

Without knowing neither which of the models is the correct one, nor what are the probabilistic characteristics of the resulting MDP, it is required to obtain as much reward as the optimal policy for the correct model (or for the best of the correct models, if there are several).

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Sparse Recovery with Brownian Sensing

no code implementations • NeurIPS 2011 • Alexandra Carpentier, Odalric-Ambrym Maillard, Rémi Munos

We consider the problem of recovering the parameter alpha in R^K of a sparse function f, i. e. the number of non-zero entries of alpha is small compared to the number K of features, given noisy evaluations of f at a set of well-chosen sampling points.

Paper
Add Code

Error Propagation for Approximate Policy and Value Iteration

no code implementations • NeurIPS 2010 • Amir-Massoud Farahmand, Csaba Szepesvári, Rémi Munos

We address the question of how the approximation error/Bellman residual at each iteration of the Approximate Policy/Value Iteration algorithms influences the quality of the resulted policy.

Paper
Add Code

Scrambled Objects for Least-Squares Regression

no code implementations • NeurIPS 2010 • Odalric Maillard, Rémi Munos

We consider least-squares regression using a randomly generated subspace G_P\subset F of finite dimension P, where F is a function space of infinite dimension, e. g.~L_2([0, 1]^d).

regression

Paper
Add Code

LSTD with Random Projections

no code implementations • NeurIPS 2010 • Mohammad Ghavamzadeh, Alessandro Lazaric, Odalric Maillard, Rémi Munos

We provide a thorough theoretical analysis of the LSTD with random projections and derive performance bounds for the resulting algorithm.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Sensitivity analysis in HMMs with application to likelihood maximization

no code implementations • NeurIPS 2009 • Pierre-Arnaud Coquelin, Romain Deguest, Rémi Munos

We derive an IPA estimator for the gradient of the log-likelihood, which may be used in a gradient method for the purpose of likelihood maximization.

Paper
Add Code

Compressed Least-Squares Regression

no code implementations • NeurIPS 2009 • Odalric Maillard, Rémi Munos

We consider the problem of learning, from K input data, a regression function in a function space of high dimension N using projections onto a random subspace of lower dimension M. From any linear approximation algorithm using empirical risk minimization (possibly penalized), we provide bounds on the excess risk of the estimate computed in the projected subspace (compressed domain) in terms of the excess risk of the estimate built in the high-dimensional space (initial domain).

regression

Paper
Add Code

Online Optimization in X-Armed Bandits

no code implementations • NeurIPS 2008 • Sébastien Bubeck, Gilles Stoltz, Csaba Szepesvári, Rémi Munos

We consider a generalization of stochastic bandit problems where the set of arms, X, is allowed to be a generic topological space.

Paper
Add Code

Particle Filter-based Policy Gradient in POMDPs

no code implementations • NeurIPS 2008 • Pierre-Arnaud Coquelin, Romain Deguest, Rémi Munos

Our setting is a Partially Observable Markov Decision Process with continuous state, observation and action spaces.

Paper
Add Code

Algorithms for Infinitely Many-Armed Bandits

no code implementations • NeurIPS 2008 • Yizao Wang, Jean-Yves Audibert, Rémi Munos

We consider multi-armed bandit problems where the number of arms is larger than the possible number of experiments.

Paper
Add Code

Fitted Q-iteration in continuous action-space MDPs

no code implementations • NeurIPS 2007 • András Antos, Csaba Szepesvári, Rémi Munos

We consider continuous state, continuous action batch reinforcement learning where the goal is to learn a good policy from a sufficiently rich trajectory generated by another policy.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.