Search Results for author: Michal Valko

Found 96 papers, 29 papers with code

A General Theoretical Paradigm to Understand Learning from Human Preferences

1 code implementation18 Oct 2023 Mohammad Gheshlaghi Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos

In particular, we derive a new general objective called $\Psi$PO for learning from human preferences that is expressed directly in terms of pairwise preferences and therefore bypasses both approximations made in standard RLHF pipelines.
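
For orientation, the $\Psi$PO objective can be written compactly as follows (a sketch using notation assumed from the paper: $\rho$ the context distribution, $\mu$ a behaviour policy, $p^{*}$ the true preference probability, and $\tau$ a regularisation temperature):

$\max_{\pi}\ \mathbb{E}_{x\sim\rho,\; y\sim\pi(\cdot\mid x),\; y'\sim\mu(\cdot\mid x)}\big[\Psi\big(p^{*}(y \succ y' \mid x)\big)\big] \;-\; \tau\,\mathrm{KL}\big(\pi \,\|\, \pi_{\mathrm{ref}}\big),$

and taking $\Psi$ to be the identity yields the IPO special case analysed in the paper.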

Zonotope hit-and-run for efficient sampling from projection DPPs

1 code implementation ICML 2017 Guillaume Gautier, Rémi Bardenet, Michal Valko

Previous theoretical results yield a fast mixing time of our chain when targeting a distribution that is close to a projection DPP, but not a DPP in general.

Point Processes Recommendation Systems

DPPy: Sampling DPPs with Python

2 code implementations19 Sep 2018 Guillaume Gautier, Guillermo Polito, Rémi Bardenet, Michal Valko

Determinantal point processes (DPPs) are specific probability distributions over clouds of points that are used as models and computational tools across physics, probability, statistics, and more recently machine learning.
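
To illustrate the kind of routine such a toolbox packages, here is a minimal NumPy sketch of the classical spectral sampler for a finite L-ensemble (Hough et al., 2006). It is a generic reference implementation, not the DPPy API itself:

```python
import numpy as np

def sample_dpp(L, rng=None):
    """Exact sampling from the DPP defined by a PSD L-ensemble matrix L.

    Spectral two-phase scheme: (1) keep eigenvector i with probability
    eigval_i / (1 + eigval_i); (2) sample one item per kept eigenvector
    from the induced projection DPP.
    """
    rng = np.random.default_rng(rng)
    eigvals, eigvecs = np.linalg.eigh(L)
    keep = rng.random(len(eigvals)) < eigvals / (1.0 + eigvals)
    V = eigvecs[:, keep]
    sample = []
    while V.shape[1] > 0:
        # Marginal probability of each item given the remaining subspace.
        probs = np.sum(V ** 2, axis=1) / V.shape[1]
        i = rng.choice(len(probs), p=probs)
        sample.append(i)
        # Project the remaining eigenvectors onto the subspace orthogonal to e_i.
        j = np.argmax(np.abs(V[i, :]))
        V = V - np.outer(V[:, j] / V[i, j], V[i, :])
        V = np.delete(V, j, axis=1)
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)
    return sorted(sample)

# Example: sample from a random 10-item L-ensemble.
X = np.random.randn(10, 3)
print(sample_dpp(X @ X.T))
```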

BIG-bench Machine Learning Point Processes

Exact sampling of determinantal point processes with sublinear time preprocessing

2 code implementations NeurIPS 2019 Michał Dereziński, Daniele Calandriello, Michal Valko

For this purpose, we propose a new algorithm which, given access to $\mathbf{L}$, samples exactly from a determinantal point process while satisfying the following two properties: (1) its preprocessing cost is $n \cdot \text{poly}(k)$, i.e., sublinear in the size of $\mathbf{L}$, and (2) its sampling cost is $\text{poly}(k)$, i.e., independent of the size of $\mathbf{L}$.

Point Processes

On two ways to use determinantal point processes for Monte Carlo integration

1 code implementation NeurIPS 2019 Guillaume Gautier, Rémi Bardenet, Michal Valko

In the absence of DPP machinery to derive an efficient sampler and analyze their estimator, the idea of Monte Carlo integration with DPPs was stored in the cellar of numerical integration.
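
As a brief worked example (using standard DPP notation, not necessarily that of the paper): if $X$ is drawn from a projection DPP with kernel $K$ and reference measure $\mu$, then

$\hat{I} \;=\; \sum_{x \in X} \frac{f(x)}{K(x, x)}$

is an unbiased estimator of $\int f \,\mathrm{d}\mu$, since the first-order intensity of the DPP is $K(x, x)\,\mu(\mathrm{d}x)$; the paper compares two such DPP-based estimators and the samplers they require.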

Numerical Integration Point Processes

Sampling from a k-DPP without looking at all items

1 code implementation NeurIPS 2020 Daniele Calandriello, Michal Derezinski, Michal Valko

Determinantal point processes (DPPs) are a useful probabilistic model for selecting a small diverse subset out of a large collection of items, with applications in summarization, recommendation, stochastic optimization, experimental design and more.

Experimental Design Point Processes +1

Large-Scale Representation Learning on Graphs via Bootstrapping

3 code implementations ICLR 2022 Shantanu Thakoor, Corentin Tallec, Mohammad Gheshlaghi Azar, Mehdi Azabou, Eva L. Dyer, Rémi Munos, Petar Veličković, Michal Valko

To address these challenges, we introduce Bootstrapped Graph Latents (BGRL) - a graph representation learning method that learns by predicting alternative augmentations of the input.
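
The bootstrapping step can be sketched in a few lines of PyTorch. This is a hedged toy illustration of the BYOL-style objective that BGRL builds on, with linear placeholder encoders acting on node features instead of the GNN encoders and graph augmentations used in the paper (only one prediction direction is shown):

```python
import torch
import torch.nn.functional as F

dim = 32
encoder_online = torch.nn.Linear(dim, dim)
encoder_target = torch.nn.Linear(dim, dim)   # slowly-moving EMA copy of the online encoder
predictor = torch.nn.Linear(dim, dim)

# Two augmented "views" of the same nodes (placeholders for real graph augmentations).
x1, x2 = torch.randn(100, dim), torch.randn(100, dim)

# Predict the target embedding of view 2 from the online embedding of view 1.
h_online = predictor(encoder_online(x1))
with torch.no_grad():
    h_target = encoder_target(x2)

# Negative cosine similarity; no negative samples are needed.
loss = -F.cosine_similarity(h_online, h_target, dim=-1).mean()
loss.backward()

# EMA update of the target encoder parameters.
tau = 0.99
with torch.no_grad():
    for p_t, p_o in zip(encoder_target.parameters(), encoder_online.parameters()):
        p_t.mul_(tau).add_((1 - tau) * p_o)
```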

Contrastive Learning Graph Representation Learning +1

Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback

1 code implementation NeurIPS 2017 Zheng Wen, Branislav Kveton, Michal Valko, Sharan Vaswani

Specifically, we aim to learn the set of "best influencers" in a social network online while repeatedly interacting with it.

Drop, Swap, and Generate: A Self-Supervised Approach for Generating Neural Activity

1 code implementation NeurIPS 2021 Ran Liu, Mehdi Azabou, Max Dabagia, Chi-Heng Lin, Mohammad Gheshlaghi Azar, Keith B. Hengen, Michal Valko, Eva L. Dyer

Our approach combines a generative modeling framework with an instance-specific alignment loss that tries to maximize the representational similarity between transformed views of the input (brain state).

Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret

1 code implementation13 Mar 2019 Daniele Calandriello, Luigi Carratino, Alessandro Lazaric, Michal Valko, Lorenzo Rosasco

Moreover, we show that our procedure selects at most $\tilde{O}(d_{eff})$ points, where $d_{eff}$ is the effective dimension of the explored space, which is typically much smaller than both $d$ and $t$.

Gaussian Processes

Compressing the Input for CNNs with the First-Order Scattering Transform

1 code implementation ECCV 2018 Edouard Oyallon, Eugene Belilovsky, Sergey Zagoruyko, Michal Valko

We study the first-order scattering transform as a candidate for reducing the signal processed by a convolutional neural network (CNN).

General Classification Translation

Adapting to game trees in zero-sum imperfect information games

1 code implementation23 Dec 2022 Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko

Imperfect information games (IIG) are games in which each player only partially observes the current game state.

Kernel-Based Reinforcement Learning: A Finite-Time Analysis

1 code implementation12 Apr 2020 Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko

We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning problems whose state-action space is endowed with a metric.

reinforcement-learning Reinforcement Learning (RL)

UCB Momentum Q-learning: Correcting the bias without forgetting

1 code implementation1 Mar 2021 Pierre Menard, Omar Darwiche Domingues, Xuedong Shang, Michal Valko

We propose UCBMQ, Upper Confidence Bound Momentum Q-learning, a new algorithm for reinforcement learning in tabular, possibly stage-dependent, episodic Markov decision processes.

Q-Learning

Multiagent Evaluation under Incomplete Information

1 code implementation NeurIPS 2019 Mark Rowland, Shayegan Omidshafiei, Karl Tuyls, Julien Perolat, Michal Valko, Georgios Piliouras, Remi Munos

This paper investigates the evaluation of learned multiagent strategies in the incomplete information setting, which plays a critical role in ranking and training of agents.

Planning in entropy-regularized Markov decision processes and games

1 code implementation NeurIPS 2019 Jean-bastien Grill, Omar Darwiche Domingues, Pierre Menard, Remi Munos, Michal Valko

We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment.

Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees

1 code implementation28 Sep 2022 Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Mark Rowland, Michal Valko, Pierre Menard

We consider reinforcement learning in an environment modeled by an episodic, finite, stage-dependent Markov decision process of horizon $H$ with $S$ states, and $A$ actions.

reinforcement-learning Reinforcement Learning (RL)

Fast Rates for Maximum Entropy Exploration

1 code implementation14 Mar 2023 Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Pierre Perrault, Yunhao Tang, Michal Valko, Pierre Menard

Finally, we apply developed regularization techniques to reduce sample complexity of visitation entropy maximization to $\widetilde{\mathcal{O}}(H^2SA/\varepsilon^2)$, yielding a statistical separation between maximum entropy exploration and reward-free exploration.

Reinforcement Learning (RL)

Finding the bandit in a graph: Sequential search-and-stop

no code implementations6 Jun 2018 Pierre Perrault, Vianney Perchet, Michal Valko

We consider the problem where an agent wants to find a hidden object that is randomly located in some vertex of a directed acyclic graph (DAG) according to a fixed but possibly unknown distribution.

Multi-Armed Bandits

Distributed Adaptive Sampling for Kernel Matrix Approximation

no code implementations27 Mar 2018 Daniele Calandriello, Alessandro Lazaric, Michal Valko

In this paper, we introduce SQUEAK, a new algorithm for kernel approximation based on RLS sampling that sequentially processes the dataset, storing a dictionary which creates accurate kernel matrix approximations with a number of points that only depends on the effective dimension $d_{eff}(\gamma)$ of the dataset.
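
For reference, the ridge leverage scores (RLS) and the effective dimension that govern this dictionary size can be computed naively as below. This is a NumPy sketch of the standard definitions (up to the exact scaling of the regularization used in the paper), not of SQUEAK itself, which never forms the full kernel matrix:

```python
import numpy as np

def ridge_leverage_scores(K, gamma):
    """Ridge leverage scores tau_i(gamma) = [K (K + gamma I)^{-1}]_{ii}
    and effective dimension d_eff(gamma) = sum_i tau_i(gamma).

    Naive O(n^3) reference computation for an n x n kernel matrix K.
    """
    n = K.shape[0]
    taus = np.diag(np.linalg.solve(K + gamma * np.eye(n), K))
    return taus, taus.sum()

# Example on a small RBF kernel matrix.
X = np.random.randn(50, 2)
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 2.0)
taus, d_eff = ridge_leverage_scores(K, gamma=0.1)
print(d_eff)
```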

Clustering

Second-Order Kernel Online Convex Optimization with Adaptive Sketching

no code implementations ICML 2017 Daniele Calandriello, Alessandro Lazaric, Michal Valko

First-order KOCO methods such as functional gradient descent require only $\mathcal{O}(t)$ time and space per iteration, and, when the only information on the losses is their convexity, achieve a minimax optimal $\mathcal{O}(\sqrt{T})$ regret.

Second-order methods

Analysis of Kelner and Levin graph sparsification algorithm for a streaming setting

no code implementations13 Sep 2016 Daniele Calandriello, Alessandro Lazaric, Michal Valko

We derive a new proof to show that the incremental resparsification algorithm proposed by Kelner and Levin (2013) produces a spectral sparsifier with high probability.

Incremental Spectral Sparsification for Large-Scale Graph-Based Semi-Supervised Learning

no code implementations21 Jan 2016 Daniele Calandriello, Alessandro Lazaric, Michal Valko, Ioannis Koutis

While the harmonic function solution performs well in many semi-supervised learning (SSL) tasks, it is known to scale poorly with the number of samples.

Quantization

Cheap Bandits

no code implementations15 Jun 2015 Manjesh Kumar Hanawal, Venkatesh Saligrama, Michal Valko, Rémi Munos

We consider stochastic sequential learning problems where the learner can observe the average reward of several actions.

Simple regret for infinitely many armed bandits

no code implementations18 May 2015 Alexandra Carpentier, Michal Valko

As in the cumulative regret setting of infinitely many armed bandits, the rate of the simple regret will depend on a parameter $\beta$ characterizing the distribution of the near-optimal arms.

Learning to Act Greedily: Polymatroid Semi-Bandits

no code implementations30 May 2014 Branislav Kveton, Zheng Wen, Azin Ashkan, Michal Valko

Many important optimization problems, such as the minimum spanning tree and minimum-cost flow, can be solved optimally by a greedy method.

Finite-Time Analysis of Kernelised Contextual Bandits

no code implementations26 Sep 2013 Michal Valko, Nathaniel Korda, Remi Munos, Ilias Flaounas, Nelo Cristianini

For contextual bandits, the related algorithm GP-UCB turns out to be a special case of our algorithm, and our finite-time analysis improves the regret bound of GP-UCB for the agnostic case, both in the terms of the kernel-dependent quantity and the RKHS norm of the reward function.
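
To make the connection concrete, a generic kernelised UCB index built from kernel ridge regression looks roughly as follows. This is a sketch only; the exact confidence width and constants of the paper's KernelUCB (and of GP-UCB) differ:

```python
import numpy as np

def kernel_ucb_index(K, k_x, k_xx, y, lam=1.0, beta=1.0):
    """GP-UCB-style index mean + beta * std from kernel ridge regression.

    K    : t x t kernel matrix of past contexts
    k_x  : length-t vector of kernel evaluations between x and past contexts
    k_xx : scalar k(x, x)
    y    : length-t vector of observed rewards
    """
    A = K + lam * np.eye(K.shape[0])
    mean = k_x @ np.linalg.solve(A, y)
    var = k_xx - k_x @ np.linalg.solve(A, k_x)
    return mean + beta * np.sqrt(max(var, 0.0))

# Toy usage with an RBF kernel on 1-d contexts.
rng = np.random.default_rng(0)
ctx = rng.uniform(-1, 1, size=20)
rbf = lambda a, b: np.exp(-(a - b) ** 2 / 0.1)
K = rbf(ctx[:, None], ctx[None, :])
y = np.sin(3 * ctx) + 0.1 * rng.standard_normal(20)
x_new = 0.3
print(kernel_ucb_index(K, rbf(ctx, x_new), rbf(x_new, x_new), y))
```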

Multi-Armed Bandits

A simple parameter-free and adaptive approach to optimization under a minimal local smoothness assumption

no code implementations1 Oct 2018 Peter L. Bartlett, Victor Gabillon, Michal Valko

The difficulty of optimization is measured in terms of 1) the amount of noise $b$ in the function evaluations and 2) the local smoothness, $d$, of the function.

Efficient Second-Order Online Kernel Learning with Adaptive Embedding

no code implementations NeurIPS 2017 Daniele Calandriello, Alessandro Lazaric, Michal Valko

The embedded space is continuously updated to guarantee that the embedding remains accurate, and we show that the per-step cost only grows with the effective dimension of the problem and not with $T$.

Second-order methods

Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning

no code implementations NeurIPS 2016 Jean-bastien Grill, Michal Valko, Remi Munos

We study the sampling-based planning problem in Markov decision processes (MDPs) that we can access only through a generative model, usually referred to as Monte-Carlo planning.

Black-box optimization of noisy functions with unknown smoothness

no code implementations NeurIPS 2015 Jean-bastien Grill, Michal Valko, Remi Munos

We study the problem of black-box optimization of a function $f$ of any dimension, given function evaluations perturbed by noise.

Efficient learning by implicit exploration in bandit problems with side observations

no code implementations NeurIPS 2014 Tomáš Kocák, Gergely Neu, Michal Valko, Remi Munos

As the predictions of our first algorithm cannot be always computed efficiently in this setting, we propose another algorithm with similar properties and with the benefit of always being computationally efficient, at the price of a slightly more complicated tuning mechanism.

Combinatorial Optimization

Extreme bandits

no code implementations NeurIPS 2014 Alexandra Carpentier, Michal Valko

In many areas of medicine, security, and life sciences, we want to allocate limited resources to different sources in order to detect extreme values.

Network Intrusion Detection

Improved large-scale graph learning through ridge spectral sparsification

no code implementations ICML 2018 Daniele Calandriello, Alessandro Lazaric, Ioannis Koutis, Michal Valko

By constructing a spectrally-similar graph, we are able to bound the error induced by the sparsification for a variety of downstream tasks (e.g., SSL).

Graph Learning

Optimistic optimization of a Brownian

no code implementations NeurIPS 2018 Jean-bastien Grill, Michal Valko, Rémi Munos

Given $W$, our goal is to return an $\epsilon$-approximation of its maximum using the smallest possible number of function evaluations, the sample complexity of the algorithm.

Exploiting Structure of Uncertainty for Efficient Matroid Semi-Bandits

no code implementations11 Feb 2019 Pierre Perrault, Vianney Perchet, Michal Valko

We improve the efficiency of algorithms for stochastic combinatorial semi-bandits.

Online A-Optimal Design and Active Linear Regression

no code implementations20 Jun 2019 Xavier Fontaine, Pierre Perrault, Michal Valko, Vianney Perchet

By trying to minimize the $\ell^2$-loss $\mathbb{E} [\lVert\hat{\beta}-\beta^{\star}\rVert^2]$ the decision maker is actually minimizing the trace of the covariance matrix of the problem, which then corresponds to online A-optimal design.
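
The underlying identity is standard (a brief sketch, assuming an unbiased estimator and homoscedastic noise of variance $\sigma^2$):

$\mathbb{E}\big[\lVert\hat{\beta}-\beta^{\star}\rVert^2\big] \;=\; \operatorname{tr}\big(\mathrm{Cov}(\hat{\beta})\big), \qquad \mathrm{Cov}\big(\hat{\beta}_{\mathrm{OLS}}\big) \;=\; \sigma^2\,(X^{\top}X)^{-1},$

so selecting which covariates to sample in order to minimise $\operatorname{tr}\big((X^{\top}X)^{-1}\big)$ is exactly the A-optimality criterion from experimental design.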

regression

Derivative-Free & Order-Robust Optimisation

no code implementations9 Oct 2019 Victor Gabillon, Rasul Tutunov, Michal Valko, Haitham Bou Ammar

In this paper, we formalise order-robust optimisation as an instance of online learning minimising simple regret, and propose Vroom, a zeroth-order optimisation algorithm capable of achieving vanishing regret in non-stationary environments, while recovering favorable rates under stochastic reward-generating processes.

Fixed-Confidence Guarantees for Bayesian Best-Arm Identification

no code implementations24 Oct 2019 Xuedong Shang, Rianne de Heide, Emilie Kaufmann, Pierre Ménard, Michal Valko

We investigate and provide new insights on the sampling rule called Top-Two Thompson Sampling (TTTS).

Thompson Sampling

No-Regret Exploration in Goal-Oriented Reinforcement Learning

no code implementations ICML 2020 Jean Tarbouriech, Evrard Garcelon, Michal Valko, Matteo Pirotta, Alessandro Lazaric

Many popular reinforcement learning problems (e.g., navigation in a maze, some Atari games, mountain car) are instances of the episodic setting under its stochastic shortest path (SSP) formulation, where an agent has to achieve a goal state while minimizing the cumulative cost.

Atari Games reinforcement-learning +1

Improved Sleeping Bandits with Stochastic Action Sets and Adversarial Rewards

no code implementations14 Apr 2020 Aadirupa Saha, Pierre Gaillard, Michal Valko

We then study the most general version of the problem where at each round available sets are generated from some unknown arbitrary distribution (i.e., without the independence assumption) and propose an efficient algorithm with $O(\sqrt{2^K T})$ regret guarantee.

Planning in Markov Decision Processes with Gap-Dependent Sample Complexity

no code implementations NeurIPS 2020 Anders Jonsson, Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Edouard Leurent, Michal Valko

We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process in which transitions have a finite support.

Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits

no code implementations NeurIPS 2020 Pierre Perrault, Etienne Boursier, Vianney Perchet, Michal Valko

In CMAB, the question of the existence of an efficient policy with an optimal asymptotic regret (up to a factor poly-logarithmic in the action size) is still open for many families of distributions, including mutually independent outcomes, and more generally the multivariate sub-Gaussian family.

Thompson Sampling

Adaptive Reward-Free Exploration

no code implementations11 Jun 2020 Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Anders Jonsson, Edouard Leurent, Michal Valko

Reward-free exploration is a reinforcement learning setting studied by Jin et al. (2020), who address it by running several algorithms with regret guarantees in parallel.

Stochastic bandits with arm-dependent delays

no code implementations ICML 2020 Anne Gael Manegueu, Claire Vernade, Alexandra Carpentier, Michal Valko

Significant work has been recently dedicated to the stochastic delayed bandit setting because of its relevance in applications.

Sampling from a $k$-DPP without looking at all items

no code implementations30 Jun 2020 Daniele Calandriello, Michał Dereziński, Michal Valko

Determinantal point processes (DPPs) are a useful probabilistic model for selecting a small diverse subset out of a large collection of items, with applications in summarization, stochastic optimization, active learning and more.

Active Learning Point Processes +1

Gamification of Pure Exploration for Linear Bandits

no code implementations ICML 2020 Rémy Degenne, Pierre Ménard, Xuedong Shang, Michal Valko

We investigate an active pure-exploration setting, that includes best-arm identification, in the context of linear stochastic bandits.

Experimental Design

A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces

no code implementations9 Jul 2020 Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko

In this work, we propose KeRNS: an algorithm for episodic reinforcement learning in non-stationary Markov Decision Processes (MDPs) whose state-action set is endowed with a metric.

reinforcement-learning Reinforcement Learning (RL)

A Provably Efficient Sample Collection Strategy for Reinforcement Learning

no code implementations NeurIPS 2021 Jean Tarbouriech, Matteo Pirotta, Michal Valko, Alessandro Lazaric

One of the challenges in online reinforcement learning (RL) is that the agent needs to trade off the exploration of the environment and the exploitation of the samples to optimize its behavior.

reinforcement-learning Reinforcement Learning (RL)

Budgeted Online Influence Maximization

no code implementations ICML 2020 Pierre Perrault, Zheng Wen, Michal Valko, Jennifer Healey

We introduce a new budgeted framework for online influence maximization, considering the total cost of an advertising campaign instead of the common cardinality constraint on a chosen influencer set.


Improved Sleeping Bandits with Stochastic Action Sets and Adversarial Rewards

no code implementations ICML 2020 Aadirupa Saha, Pierre Gaillard, Michal Valko

The best existing efficient (i.e., polynomial-time) algorithms for this problem only guarantee a $O(T^{2/3})$ upper-bound on the regret.

Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited

no code implementations7 Oct 2020 Omar Darwiche Domingues, Pierre Ménard, Emilie Kaufmann, Michal Valko

In this paper, we propose new problem-independent lower bounds on the sample complexity and regret in episodic MDPs, with a particular focus on the non-stationary case in which the transition kernel is allowed to change in each stage of the episode.

reinforcement-learning Reinforcement Learning (RL)

On the Approximation Relationship between Optimizing Ratio of Submodular (RS) and Difference of Submodular (DS) Functions

no code implementations5 Jan 2021 Pierre Perrault, Jennifer Healey, Zheng Wen, Michal Valko

We demonstrate that from an algorithm guaranteeing an approximation factor for the ratio of submodular (RS) optimization problem, we can build another algorithm having a different kind of approximation guarantee -- weaker than the classical one -- for the difference of submodular (DS) optimization problem, and vice versa.

Data Structures and Algorithms

Revisiting Peng's Q($\lambda$) for Modern Reinforcement Learning

no code implementations27 Feb 2021 Tadashi Kozuno, Yunhao Tang, Mark Rowland, Rémi Munos, Steven Kapturowski, Will Dabney, Michal Valko, David Abel

These results indicate that Peng's Q($\lambda$), which was thought to be unsafe, is a theoretically-sound and practically effective algorithm.

Continuous Control reinforcement-learning +1

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

no code implementations NeurIPS 2021 Jean Tarbouriech, Runlong Zhou, Simon S. Du, Matteo Pirotta, Michal Valko, Alessandro Lazaric

We study the problem of learning in the stochastic shortest path (SSP) setting, where an agent seeks to minimize the expected cost accumulated before reaching a goal state.

Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall

no code implementations11 Jun 2021 Tadashi Kozuno, Pierre Ménard, Rémi Munos, Michal Valko

We study the problem of learning a Nash equilibrium (NE) in an imperfect information game (IIG) through self-play.

Taylor Expansion of Discount Factors

no code implementations11 Jun 2021 Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko

In practical reinforcement learning (RL), the discount factor used for estimating value functions often differs from that used for defining the evaluation objective.

reinforcement-learning Reinforcement Learning (RL)

Learning in two-player zero-sum partially observable Markov games with perfect recall

no code implementations NeurIPS 2021 Tadashi Kozuno, Pierre Ménard, Remi Munos, Michal Valko

We study the problem of learning a Nash equilibrium (NE) in an extensive game with imperfect information (EGII) through self-play.

Adaptive Multi-Goal Exploration

no code implementations23 Nov 2021 Jean Tarbouriech, Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Michal Valko, Alessandro Lazaric

We introduce a generic strategy for provably efficient multi-goal exploration.

Marginalized Operators for Off-policy Reinforcement Learning

no code implementations30 Mar 2022 Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko

We show that the estimates for marginalized operators can be computed in a scalable way, which also generalizes prior results on marginalized importance sampling as special cases.

Off-policy evaluation reinforcement-learning

From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses

no code implementations16 May 2022 Daniil Tiapkin, Denis Belomestny, Eric Moulines, Alexey Naumov, Sergey Samsonov, Yunhao Tang, Michal Valko, Pierre Menard

We propose the Bayes-UCBVI algorithm for reinforcement learning in tabular, stage-dependent, episodic Markov decision processes: a natural extension of the Bayes-UCB algorithm by Kaufmann et al. (2012) for multi-armed bandits.

Multi-Armed Bandits

Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments

no code implementations18 Nov 2022 Daniel Jarrett, Corentin Tallec, Florent Altché, Thomas Mesnard, Rémi Munos, Michal Valko

In this work, we study a natural solution derived from structural causal models of the world: Our key idea is to learn representations of the future that capture precisely the unpredictable aspects of each outcome -- which we use as additional input for predictions, such that intrinsic rewards only reflect the predictable aspects of world dynamics.

Montezuma's Revenge

Unlocking the Power of Representations in Long-term Novelty-based Exploration

no code implementations2 May 2023 Alaa Saade, Steven Kapturowski, Daniele Calandriello, Charles Blundell, Pablo Sprechmann, Leopoldo Sarra, Oliver Groth, Michal Valko, Bilal Piot

We introduce Robust Exploration via Clustering-based Online Density Estimation (RECODE), a non-parametric method for novelty-based exploration that estimates visitation counts for clusters of states based on their similarity in a chosen embedding space.

Atari Games Clustering +1

VA-learning as a more efficient alternative to Q-learning

no code implementations29 May 2023 Yunhao Tang, Rémi Munos, Mark Rowland, Michal Valko

In reinforcement learning, the advantage function is critical for policy improvement, but is often extracted from a learned Q-function.

Q-Learning

DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm

no code implementations29 May 2023 Yunhao Tang, Tadashi Kozuno, Mark Rowland, Anna Harutyunyan, Rémi Munos, Bernardo Ávila Pires, Michal Valko

Multi-step learning applies lookahead over multiple time steps and has proved valuable in policy evaluation settings.

Local and adaptive mirror descents in extensive-form games

no code implementations1 Sep 2023 Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko

We study how to learn $\epsilon$-optimal strategies in zero-sum imperfect information games (IIG) with trajectory feedback.

Demonstration-Regularized RL

no code implementations26 Oct 2023 Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Menard

In particular, we study the demonstration-regularized reinforcement learning that leverages the expert demonstrations by KL-regularization for a policy learned by behavior cloning.

reinforcement-learning Reinforcement Learning (RL)

Generalized Preference Optimization: A Unified Approach to Offline Alignment

no code implementations8 Feb 2024 Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot

Offline preference optimization allows fine-tuning large models directly from offline data, and has proved effective in recent alignment practices.

Human Alignment of Large Language Models through Online Preference Optimisation

no code implementations13 Mar 2024 Daniele Calandriello, Daniel Guo, Remi Munos, Mark Rowland, Yunhao Tang, Bernardo Avila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot

Building on this equivalence, we introduce the IPO-MD algorithm that generates data with a mixture policy (between the online and reference policy) similarly to the general Nash-MD algorithm.
