Search Results for author: Rémi Munos

Found 58 papers, 17 papers with code

Marginalized Operators for Off-policy Reinforcement Learning

no code implementations 30 Mar 2022 Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko

We show that the estimates for marginalized operators can be computed in a scalable way, which also generalizes prior results on marginalized importance sampling as special cases.

reinforcement-learning

Taylor Expansion of Discount Factors

no code implementations 11 Jun 2021 Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko

In practical reinforcement learning (RL), the discount factor used for estimating value functions often differs from that used for defining the evaluation objective.

reinforcement-learning

Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall

no code implementations 11 Jun 2021 Tadashi Kozuno, Pierre Ménard, Rémi Munos, Michal Valko

We study the problem of learning a Nash equilibrium (NE) in an imperfect information game (IIG) through self-play.

Revisiting Peng's Q($λ$) for Modern Reinforcement Learning

no code implementations 27 Feb 2021 Tadashi Kozuno, Yunhao Tang, Mark Rowland, Rémi Munos, Steven Kapturowski, Will Dabney, Michal Valko, David Abel

These results indicate that Peng's Q($\lambda$), which was thought to be unsafe, is a theoretically-sound and practically effective algorithm.

Continuous Control reinforcement-learning

Large-Scale Representation Learning on Graphs via Bootstrapping

3 code implementations ICLR 2022 Shantanu Thakoor, Corentin Tallec, Mohammad Gheshlaghi Azar, Mehdi Azabou, Eva L. Dyer, Rémi Munos, Petar Veličković, Michal Valko

To address these challenges, we introduce Bootstrapped Graph Latents (BGRL) - a graph representation learning method that learns by predicting alternative augmentations of the input.

Contrastive Learning Graph Representation Learning +1

Monte-Carlo Tree Search as Regularized Policy Optimization

3 code implementations ICML 2020 Jean-bastien Grill, Florent Altché, Yunhao Tang, Thomas Hubert, Michal Valko, Ioannis Antonoglou, Rémi Munos

The combination of Monte-Carlo tree search (MCTS) with deep reinforcement learning has led to significant advances in artificial intelligence.

reinforcement-learning

Leverage the Average: an Analysis of KL Regularization in RL

no code implementations 31 Mar 2020 Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Rémi Munos, Matthieu Geist

Recent Reinforcement Learning (RL) algorithms making use of Kullback-Leibler (KL) regularization as a core component have shown outstanding performance.

reinforcement-learning

Taylor Expansion Policy Optimization

no code implementations ICML 2020 Yunhao Tang, Michal Valko, Rémi Munos

In this work, we investigate the application of Taylor expansions in reinforcement learning.

reinforcement-learning

Adaptive Trade-Offs in Off-Policy Learning

no code implementations 16 Oct 2019 Mark Rowland, Will Dabney, Rémi Munos

A great variety of off-policy learning algorithms exist in the literature, and new breakthroughs in this area continue to be made, improving theoretical understanding and yielding state-of-the-art reinforcement learning algorithms.

reinforcement-learning

Conditional Importance Sampling for Off-Policy Learning

no code implementations 16 Oct 2019 Mark Rowland, Anna Harutyunyan, Hado van Hasselt, Diana Borsa, Tom Schaul, Rémi Munos, Will Dabney

We theoretically analyse this space, and concretely investigate several algorithms that arise from this framework.

reinforcement-learning

Towards Consistent Performance on Atari using Expert Demonstrations

no code implementations ICLR 2019 Tobias Pohlen, Bilal Piot, Todd Hester, Mohammad Gheshlaghi Azar, Dan Horgan, David Budden, Gabriel Barth-Maron, Hado van Hasselt, John Quan, Mel Večerík, Matteo Hessel, Rémi Munos, Olivier Pietquin

Despite significant advances in the field of deep Reinforcement Learning (RL), today's algorithms still fail to learn human-level policies consistently over a set of diverse tasks such as Atari 2600 games.

Atari Games

Statistics and Samples in Distributional Reinforcement Learning

no code implementations 21 Feb 2019 Mark Rowland, Robert Dadashi, Saurabh Kumar, Rémi Munos, Marc G. Bellemare, Will Dabney

We present a unifying framework for designing and analysing distributional reinforcement learning (DRL) algorithms in terms of recursively estimating statistics of the return distribution.

Distributional Reinforcement Learning reinforcement-learning

World Discovery Models

1 code implementation 20 Feb 2019 Mohammad Gheshlaghi Azar, Bilal Piot, Bernardo Avila Pires, Jean-bastien Grill, Florent Altché, Rémi Munos

As humans we are driven by a strong desire for seeking novelty in our world.

Optimistic optimization of a Brownian

no code implementations NeurIPS 2018 Jean-bastien Grill, Michal Valko, Rémi Munos

Given $W$, our goal is to return an $\epsilon$-approximation of its maximum using the smallest possible number of function evaluations, the sample complexity of the algorithm.

Neural Predictive Belief Representations

no code implementations 15 Nov 2018 Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Bernardo A. Pires, Rémi Munos

In partially observable domains it is important for the representation to encode a belief state, a sufficient statistic of the observations seen so far.

Decision Making Representation Learning

Autoregressive Quantile Networks for Generative Modeling

1 code implementation ICML 2018 Georg Ostrovski, Will Dabney, Rémi Munos

We introduce autoregressive implicit quantile networks (AIQN), a fundamentally different approach to generative modeling than those commonly used, that implicitly captures the distribution using quantile regression.

Implicit Quantile Networks for Distributional Reinforcement Learning

19 code implementations ICML 2018 Will Dabney, Georg Ostrovski, David Silver, Rémi Munos

In this work, we build on recent advances in distributional reinforcement learning to give a generally applicable, flexible, and state-of-the-art distributional variant of DQN.

Atari Games Distributional Reinforcement Learning +1

Observe and Look Further: Achieving Consistent Performance on Atari

1 code implementation 29 May 2018 Tobias Pohlen, Bilal Piot, Todd Hester, Mohammad Gheshlaghi Azar, Dan Horgan, David Budden, Gabriel Barth-Maron, Hado van Hasselt, John Quan, Mel Večerík, Matteo Hessel, Rémi Munos, Olivier Pietquin

Despite significant advances in the field of deep Reinforcement Learning (RL), today's algorithms still fail to learn human-level policies consistently over a set of diverse tasks such as Atari 2600 games.

Montezuma's Revenge

An Analysis of Categorical Distributional Reinforcement Learning

no code implementations 22 Feb 2018 Mark Rowland, Marc G. Bellemare, Will Dabney, Rémi Munos, Yee Whye Teh

Distributional approaches to value-based reinforcement learning model the entire distribution of returns, rather than just their expected values, and have recently been shown to yield state-of-the-art empirical performance.

Distributional Reinforcement Learning reinforcement-learning

Learning to Search with MCTSnets

2 code implementations ICML 2018 Arthur Guez, Théophane Weber, Ioannis Antonoglou, Karen Simonyan, Oriol Vinyals, Daan Wierstra, Rémi Munos, David Silver

They are most typically solved by tree search algorithms that simulate ahead into the future, evaluate future states, and back-up those evaluations to the root of a search tree.

Distributional Reinforcement Learning with Quantile Regression

15 code implementations 27 Oct 2017 Will Dabney, Mark Rowland, Marc G. Bellemare, Rémi Munos

In this paper, we build on recent work advocating a distributional approach to reinforcement learning in which the distribution over returns is modeled explicitly instead of only estimating the mean.

Atari Games Distributional Reinforcement Learning +1
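The quantile-regression idea behind this line of work can be sketched with the pinball loss: minimizing it over a scalar pulls the estimate toward the corresponding quantile of the return samples. A minimal numpy illustration (the sample set and brute-force grid search are toy assumptions, not the paper's DQN-based training):

```python
import numpy as np

def pinball_loss(theta, samples, tau):
    """Quantile-regression (pinball) loss: its minimizer over theta is
    the tau-quantile of the empirical sample distribution."""
    u = samples - theta
    return np.mean(np.where(u >= 0, tau * u, (tau - 1) * u))

samples = np.arange(1.0, 101.0)   # toy empirical return distribution 1..100
grid = np.linspace(0.0, 101.0, 1011)
best = grid[np.argmin([pinball_loss(t, samples, 0.25) for t in grid])]
# best lands near the 0.25-quantile of the samples (around 25).
```

A distributional agent maintains one such estimate per quantile level tau and trains all of them jointly with this loss against sampled Bellman targets.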

A Distributional Perspective on Reinforcement Learning

21 code implementations ICML 2017 Marc G. Bellemare, Will Dabney, Rémi Munos

We obtain both state-of-the-art results and anecdotal evidence demonstrating the importance of the value distribution in approximate reinforcement learning.

Atari Games reinforcement-learning

Observational Learning by Reinforcement Learning

no code implementations 20 Jun 2017 Diana Borsa, Bilal Piot, Rémi Munos, Olivier Pietquin

Observational learning is a type of learning that occurs as a function of observing, retaining and possibly replicating or imitating the behaviour of another agent.

reinforcement-learning

The Cramer Distance as a Solution to Biased Wasserstein Gradients

1 code implementation ICLR 2018 Marc G. Bellemare, Ivo Danihelka, Will Dabney, Shakir Mohamed, Balaji Lakshminarayanan, Stephan Hoyer, Rémi Munos

We show that the Cramér distance possesses all three desired properties, combining the best of the Wasserstein and Kullback-Leibler divergences.

Minimax Regret Bounds for Reinforcement Learning

1 code implementation ICML 2017 Mohammad Gheshlaghi Azar, Ian Osband, Rémi Munos

We consider the problem of provably optimal exploration in reinforcement learning for finite horizon MDPs.

reinforcement-learning

Successor Features for Transfer in Reinforcement Learning

no code implementations NeurIPS 2017 André Barreto, Will Dabney, Rémi Munos, Jonathan J. Hunt, Tom Schaul, Hado van Hasselt, David Silver

Transfer in reinforcement learning refers to the notion that generalization should occur not only within a task but also across tasks.

reinforcement-learning

Safe and Efficient Off-Policy Reinforcement Learning

3 code implementations NeurIPS 2016 Rémi Munos, Tom Stepleton, Anna Harutyunyan, Marc G. Bellemare

In this work, we take a fresh look at some old and new algorithms for off-policy, return-based reinforcement learning.

Atari Games reinforcement-learning

Increasing the Action Gap: New Operators for Reinforcement Learning

2 code implementations 15 Dec 2015 Marc G. Bellemare, Georg Ostrovski, Arthur Guez, Philip S. Thomas, Rémi Munos

Extending the idea of a locally consistent operator, we then derive sufficient conditions for an operator to preserve optimality, leading to a family of operators which includes our consistent Bellman operator.

Atari Games Q-Learning +1

Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits

no code implementations 16 Jul 2015 Alexandra Carpentier, Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos, Peter Auer, András Antos

If the variance of the distributions were known, one could design an optimal sampling strategy by collecting a number of independent samples per distribution that is proportional to their variance.

Active Learning Multi-Armed Bandits
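The oracle strategy stated in the abstract (collect a number of samples per distribution proportional to its variance) can be written down directly; the paper's UCB algorithms approximate this allocation when the variances are unknown. A toy sketch, with the function name and the rounding of fractional samples being illustrative choices:

```python
import numpy as np

def oracle_allocation(variances, budget):
    """Split a sampling budget across distributions in proportion to
    their variances (the oracle baseline the UCB algorithms target)."""
    variances = np.asarray(variances, dtype=float)
    raw = budget * variances / variances.sum()
    alloc = np.floor(raw).astype(int)
    # Hand leftover samples to the largest fractional parts.
    for i in np.argsort(raw - alloc)[::-1][: budget - alloc.sum()]:
        alloc[i] += 1
    return alloc

print(oracle_allocation([1.0, 4.0, 5.0], budget=100))  # -> [10 40 50]
```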

Best-Arm Identification in Linear Bandits

no code implementations NeurIPS 2014 Marta Soare, Alessandro Lazaric, Rémi Munos

We study the best-arm identification problem in linear bandit, where the rewards of the arms depend linearly on an unknown parameter $\theta^*$ and the objective is to return the arm with the largest reward.

Experimental Design
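The setting can be illustrated with a naive baseline: pull each arm a few times, estimate theta by least squares from the noisy linear rewards, and guess the arm maximizing x^T theta-hat. This is only a toy simulation (all arm vectors, noise levels, and pull counts below are made up), not the allocation strategy the paper develops:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_star = np.array([1.0, -0.5])   # unknown parameter (known to the simulator only)
arms = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [0.9, -0.4]])

# Pull each arm 25 times and observe noisy linear rewards x^T theta* + noise.
X = np.repeat(arms, 25, axis=0)
y = X @ theta_star + 0.1 * rng.standard_normal(len(X))

# Least-squares estimate of theta, then pick the arm with the largest
# estimated reward.
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
best_arm = int(np.argmax(arms @ theta_hat))
```

The paper's point is that a smarter, experimental-design-flavored allocation of pulls identifies the best arm with far fewer samples than this uniform scheme.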

Fast gradient descent for drifting least squares regression, with application to bandits

no code implementations 11 Jul 2013 Nathaniel Korda, Prashanth L. A., Rémi Munos

In the case when strong convexity in the regression problem is guaranteed, we provide bounds on the error both in expectation and high probability (the latter is often needed to provide theoretical guarantees for higher level algorithms), despite the drifting least squares solution.

News Recommendation online learning

Concentration bounds for temporal difference learning with linear function approximation: The case of batch data and uniform sampling

no code implementations 11 Jun 2013 L. A. Prashanth, Nathaniel Korda, Rémi Munos

We propose a stochastic approximation (SA) based method with randomization of samples for policy evaluation using the least squares temporal difference (LSTD) algorithm.

Multi-Armed Bandits News Recommendation
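The LSTD solve that this analysis concerns can be sketched in a few lines: with feature matrices for states and next states, LSTD solves the linear system A w = b with A = Φᵀ(Φ − γΦ′) and b = Φᵀr. A minimal batch version (the ridge term and the two-state example are illustrative assumptions, not the paper's sampled-SA scheme):

```python
import numpy as np

def lstd(phi, phi_next, rewards, gamma, reg=1e-6):
    """Batch LSTD: solve A w = b with A = Phi^T (Phi - gamma Phi') and
    b = Phi^T r; a tiny ridge term keeps the solve numerically stable."""
    A = phi.T @ (phi - gamma * phi_next) + reg * np.eye(phi.shape[1])
    return np.linalg.solve(A, phi.T @ rewards)

# Two-state chain with tabular (one-hot) features: state 0 steps to
# state 1 with reward 1, state 1 self-loops with reward 0, so the true
# values at gamma = 0.9 are V = (1, 0).
phi      = np.array([[1.0, 0.0], [0.0, 1.0]])
phi_next = np.array([[0.0, 1.0], [0.0, 1.0]])
w = lstd(phi, phi_next, rewards=np.array([1.0, 0.0]), gamma=0.9)
```

The paper replaces this exact batch solve with a stochastic-approximation iteration over uniformly sampled transitions and bounds its error.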

Risk-Aversion in Multi-armed Bandits

no code implementations NeurIPS 2012 Amir Sani, Alessandro Lazaric, Rémi Munos

In stochastic multi-armed bandits the objective is to solve the exploration-exploitation dilemma and ultimately maximize the expected reward.

Multi-Armed Bandits

Adaptive Stratified Sampling for Monte-Carlo integration of Differentiable functions

no code implementations NeurIPS 2012 Alexandra Carpentier, Rémi Munos

We consider the problem of adaptive stratified sampling for Monte Carlo integration of a differentiable function given a finite number of evaluations to the function.
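The non-adaptive version of stratified Monte Carlo is easy to sketch: partition the domain into strata, sample uniformly inside each, and average the per-stratum means. A minimal sketch on [0, 1] with equal-width strata (the adaptive allocation studied in the paper would instead spend more samples in strata where f varies more):

```python
import numpy as np

def stratified_mc(f, n_strata, n_per_stratum, rng):
    """Stratified Monte Carlo estimate of the integral of f over [0, 1]:
    sample uniformly within each equal-width stratum and average the
    per-stratum means (equal widths make a plain mean correct)."""
    edges = np.linspace(0.0, 1.0, n_strata + 1)
    estimates = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        x = rng.uniform(lo, hi, size=n_per_stratum)
        estimates.append(f(x).mean())
    return float(np.mean(estimates))

rng = np.random.default_rng(0)
est = stratified_mc(lambda x: x ** 2, n_strata=50, n_per_stratum=20, rng=rng)
# est is close to the true integral of x^2 on [0, 1], which is 1/3.
```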

Thompson Sampling: An Asymptotically Optimal Finite Time Analysis

1 code implementation 18 May 2012 Emilie Kaufmann, Nathaniel Korda, Rémi Munos

The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem had been open since 1933.

3D Reconstruction
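The algorithm whose optimality the paper settles is simple to state for Bernoulli rewards: keep a Beta posterior per arm, sample a mean from each posterior, and play the argmax. A minimal sketch (the arm means, horizon, and uniform priors are illustrative choices):

```python
import numpy as np

def thompson_bernoulli(true_means, horizon, rng):
    """Beta-Bernoulli Thompson Sampling on a stochastic bandit."""
    k = len(true_means)
    successes = np.ones(k)   # Beta(1, 1) uniform priors
    failures = np.ones(k)
    pulls = np.zeros(k, dtype=int)
    for _ in range(horizon):
        # Sample a plausible mean per arm from its posterior; play the argmax.
        theta = rng.beta(successes, failures)
        arm = int(np.argmax(theta))
        reward = rng.random() < true_means[arm]
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

rng = np.random.default_rng(0)
pulls = thompson_bernoulli([0.2, 0.5, 0.8], horizon=2000, rng=rng)
# The best arm (mean 0.8) receives the bulk of the pulls.
```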

Selecting the State-Representation in Reinforcement Learning

no code implementations NeurIPS 2011 Odalric-Ambrym Maillard, Daniil Ryabko, Rémi Munos

Without knowing which of the models is the correct one, or the probabilistic characteristics of the resulting MDP, it is required to obtain as much reward as the optimal policy for the correct model (or for the best of the correct models, if there are several).

reinforcement-learning

Speedy Q-Learning

no code implementations NeurIPS 2011 Mohammad Ghavamzadeh, Hilbert J. Kappen, Mohammad G. Azar, Rémi Munos

We introduce a new convergent variant of Q-learning, called speedy Q-learning, to address the problem of slow convergence in the standard form of the Q-learning algorithm.

Q-Learning

Sparse Recovery with Brownian Sensing

no code implementations NeurIPS 2011 Alexandra Carpentier, Odalric-Ambrym Maillard, Rémi Munos

We consider the problem of recovering the parameter alpha in R^K of a sparse function f, i.e. the number of non-zero entries of alpha is small compared to the number K of features, given noisy evaluations of f at a set of well-chosen sampling points.

Scrambled Objects for Least-Squares Regression

no code implementations NeurIPS 2010 Odalric Maillard, Rémi Munos

We consider least-squares regression using a randomly generated subspace G_P ⊂ F of finite dimension P, where F is a function space of infinite dimension, e.g. L_2([0, 1]^d).

Error Propagation for Approximate Policy and Value Iteration

no code implementations NeurIPS 2010 Amir-Massoud Farahmand, Csaba Szepesvári, Rémi Munos

We address the question of how the approximation error/Bellman residual at each iteration of the Approximate Policy/Value Iteration algorithms influences the quality of the resulted policy.

LSTD with Random Projections

no code implementations NeurIPS 2010 Mohammad Ghavamzadeh, Alessandro Lazaric, Odalric Maillard, Rémi Munos

We provide a thorough theoretical analysis of the LSTD with random projections and derive performance bounds for the resulting algorithm.

reinforcement-learning

Sensitivity analysis in HMMs with application to likelihood maximization

no code implementations NeurIPS 2009 Pierre-Arnaud Coquelin, Romain Deguest, Rémi Munos

We derive an IPA estimator for the gradient of the log-likelihood, which may be used in a gradient method for the purpose of likelihood maximization.

Compressed Least-Squares Regression

no code implementations NeurIPS 2009 Odalric Maillard, Rémi Munos

We consider the problem of learning, from K input data, a regression function in a function space of high dimension N using projections onto a random subspace of lower dimension M. From any linear approximation algorithm using empirical risk minimization (possibly penalized), we provide bounds on the excess risk of the estimate computed in the projected subspace (compressed domain) in terms of the excess risk of the estimate built in the high-dimensional space (initial domain).
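The compressed-domain procedure described above is mechanically simple: project the N-dimensional features onto a random M-dimensional subspace, then run ordinary least squares there. A sketch with made-up dimensions and Gaussian projections (one common choice; the paper's analysis covers the general scheme, not this specific instantiation):

```python
import numpy as np

rng = np.random.default_rng(0)
K, N, M = 200, 500, 40   # K data points, ambient dim N, compressed dim M

# Synthetic regression data in the high-dimensional (initial) domain.
X = rng.standard_normal((K, N))
w_true = rng.standard_normal(N)
y = X @ w_true + 0.1 * rng.standard_normal(K)

# Random projection of the features, then least squares in the
# compressed domain: only an M-dimensional system needs solving.
P = rng.standard_normal((N, M)) / np.sqrt(M)
Z = X @ P
w_compressed, *_ = np.linalg.lstsq(Z, y, rcond=None)
y_hat = Z @ w_compressed
```

The excess-risk bounds in the paper quantify how much prediction accuracy this M-dimensional solve loses relative to the full N-dimensional one.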

Online Optimization in X-Armed Bandits

no code implementations NeurIPS 2008 Sébastien Bubeck, Gilles Stoltz, Csaba Szepesvári, Rémi Munos

We consider a generalization of stochastic bandit problems where the set of arms, X, is allowed to be a generic topological space.

Algorithms for Infinitely Many-Armed Bandits

no code implementations NeurIPS 2008 Yizao Wang, Jean-Yves Audibert, Rémi Munos

We consider multi-armed bandit problems where the number of arms is larger than the possible number of experiments.

Particle Filter-based Policy Gradient in POMDPs

no code implementations NeurIPS 2008 Pierre-Arnaud Coquelin, Romain Deguest, Rémi Munos

Our setting is a Partially Observable Markov Decision Process with continuous state, observation and action spaces.

Fitted Q-iteration in continuous action-space MDPs

no code implementations NeurIPS 2007 András Antos, Csaba Szepesvári, Rémi Munos

We consider continuous state, continuous action batch reinforcement learning where the goal is to learn a good policy from a sufficiently rich trajectory generated by another policy.

reinforcement-learning
