Search Results for author: Odalric-Ambrym Maillard

Found 43 papers, 5 papers with code

Restarted Bayesian Online Change-point Detector achieves Optimal Detection Delay

1 code implementation • ICML 2020 • Reda Alami, Odalric-Ambrym Maillard, Raphaël Féraud

In this paper, we consider the problem of sequential change-point detection where both the change-points and the distributions before and after the change are assumed to be unknown.

Change Point Detection • Learning Theory
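
A minimal runnable sketch of a Bayesian online change-point detector in the Adams and MacKay run-length style, restarted on detection. Bernoulli observations, the constant hazard rate, and the mass-at-zero restart rule are assumptions of this illustration, not the paper's calibrated procedure:

```python
import numpy as np

def restarted_bocpd_bernoulli(xs, hazard=0.01, a0=1.0, b0=1.0, restart_thresh=0.5):
    """Bayesian online change-point detection for Bernoulli observations with
    a Beta(a0, b0) prior, restarted whenever the posterior mass on "a change
    just happened" exceeds restart_thresh (a simplified restart rule)."""
    r = np.array([1.0])                       # run-length posterior
    a, b = np.array([a0]), np.array([b0])     # Beta params per run length
    alarms = []
    for t, x in enumerate(xs):
        p1 = a / (a + b)                      # predictive P(x = 1)
        pred = p1 if x == 1 else 1.0 - p1
        growth = r * pred * (1 - hazard)      # no change at time t
        cp = (r * pred * hazard).sum()        # change at time t
        r = np.concatenate(([cp], growth))
        r /= r.sum()
        a = np.concatenate(([a0], a + x))     # update sufficient statistics
        b = np.concatenate(([b0], b + (1 - x)))
        if r[0] > restart_thresh:             # detection: restart the filter
            alarms.append(t)
            r, a, b = np.array([1.0]), np.array([a0]), np.array([b0])
    return alarms
```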

CRIMED: Lower and Upper Bounds on Regret for Bandits with Unbounded Stochastic Corruption

no code implementations • 28 Sep 2023 • Shubhada Agrawal, Timothée Mathieu, Debabrota Basu, Odalric-Ambrym Maillard

In this setting, accommodating potentially unbounded corruptions, we establish a problem-dependent lower bound on regret for a given family of arm distributions.

Monte-Carlo tree search with uncertainty propagation via optimal transport

no code implementations • 19 Sep 2023 • Tuan Dam, Pascal Stenger, Lukas Schneider, Joni Pajarinen, Carlo D'Eramo, Odalric-Ambrym Maillard

We introduce a novel backup operator that computes value nodes as the Wasserstein barycenter of their action-value children nodes; thus, propagating the uncertainty of the estimate across the tree to the root node.

Thompson Sampling
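
For one-dimensional Gaussians the Wasserstein-2 barycenter has a closed form (weighted mean, weighted standard deviation), which gives a compact sketch of such a backup step. Treating children's value estimates as Gaussians and weighting them by visit counts are assumptions here, not the paper's exact parameterization:

```python
import numpy as np

def gaussian_w2_barycenter(means, stds, weights):
    """Wasserstein-2 barycenter of 1-D Gaussians: for location-scale families
    in 1-D, the barycenter is the Gaussian with the weighted mean and the
    weighted standard deviation."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return float(np.dot(w, means)), float(np.dot(w, stds))

# A node's value distribution as the barycenter of its children's estimates,
# weighted here by visit counts (the weighting scheme is an assumption).
mean, std = gaussian_w2_barycenter([0.2, 0.5, 0.4], [0.3, 0.1, 0.2],
                                   weights=[10, 25, 5])
```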

AdaStop: adaptive statistical testing for sound comparisons of Deep RL agents

no code implementations • 19 Jun 2023 • Timothée Mathieu, Riccardo Della Vecchia, Alena Shilova, Matheus Medeiros Centa, Hector Kohler, Odalric-Ambrym Maillard, Philippe Preux

When comparing several RL algorithms, a major question is how many executions must be made and how can we ensure that the results of such a comparison are theoretically sound.

Reinforcement Learning (RL)
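
AdaStop builds on permutation tests run sequentially; the basic, non-adaptive two-sample permutation test it refines looks roughly as follows (per-run scores and the add-one p-value correction are illustrative choices):

```python
import numpy as np

def permutation_test(scores_a, scores_b, n_perm=10000, seed=0):
    """Two-sample permutation test on the difference of mean scores of two
    RL agents. AdaStop's contribution is making such comparisons sequential
    and adaptive; this is only the non-adaptive building block."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([scores_a, scores_b])
    n_a = len(scores_a)
    observed = abs(np.mean(scores_a) - np.mean(scores_b))
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                  # relabel runs at random
        diff = abs(pooled[:n_a].mean() - pooled[n_a:].mean())
        count += diff >= observed
    return (count + 1) / (n_perm + 1)        # p-value with add-one correction
```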

Risk-aware linear bandits with convex loss

no code implementations • 15 Sep 2022 • Patrick Saux, Odalric-Ambrym Maillard

In decision-making problems such as the multi-armed bandit, an agent learns sequentially by optimizing a certain feedback.

Decision Making • Multi-Armed Bandits

Collaborative Algorithms for Online Personalized Mean Estimation

1 code implementation • 24 Aug 2022 • Mahsa Asadi, Aurélien Bellet, Odalric-Ambrym Maillard, Marc Tommasi

We study the case where some of the distributions have the same mean, and the agents are allowed to actively query information from other agents.
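
A sketch of the class-identification idea: an agent keeps treating a peer's distribution as plausibly identical to its own while their confidence intervals overlap. Hoeffding intervals over [0, 1]-bounded samples are an assumption of this sketch, not necessarily the paper's concentration tool:

```python
import numpy as np

def hoeffding_ci(samples, delta=0.05):
    """Hoeffding confidence interval for the mean of [0, 1]-bounded samples
    (boundedness is assumed for this sketch)."""
    n = len(samples)
    half = np.sqrt(np.log(2 / delta) / (2 * n))
    m = np.mean(samples)
    return m - half, m + half

def plausibly_same_mean(samples_i, samples_j, delta=0.05):
    """Agents i and j keep collaborating while their confidence intervals
    overlap, i.e. equality of the means is not yet refuted."""
    lo_i, hi_i = hoeffding_ci(samples_i, delta)
    lo_j, hi_j = hoeffding_ci(samples_j, delta)
    return lo_i <= hi_j and lo_j <= hi_i
```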

Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithm

no code implementations • 7 Mar 2022 • Debabrota Basu, Odalric-Ambrym Maillard, Timothée Mathieu

We study the corrupted bandit problem, i.e. a stochastic multi-armed bandit problem with $k$ unknown reward distributions, which are heavy-tailed and corrupted by a history-independent adversary or Nature.
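
Robust optimistic algorithms replace the empirical mean with a robust estimate; the paper's algorithms are built around Huber's estimator, so the trimmed mean below is only a stand-in showing the shape of such an estimator:

```python
import numpy as np

def trimmed_mean(x, trim=0.1):
    """Symmetrically trimmed mean: drop the `trim` fraction of smallest and
    largest observations before averaging. A generic robust mean estimator;
    the paper itself works with Huber's estimator."""
    x = np.sort(np.asarray(x, dtype=float))
    k = int(trim * len(x))
    return x[k:len(x) - k].mean() if len(x) > 2 * k else x.mean()
```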

Bregman Deviations of Generic Exponential Families

no code implementations • 18 Jan 2022 • Sayak Ray Chowdhury, Patrick Saux, Odalric-Ambrym Maillard, Aditya Gopalan

For the practitioner, we instantiate this novel bound to several classical families, e.g., Gaussian, Bernoulli, Exponential, Weibull, Pareto, Poisson and Chi-square, yielding explicit forms of the confidence sets and the Bregman information gain.
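
For the Bernoulli family the Bregman divergence is the binary KL, so the confidence set is an interval computable by bisection. The fixed-sample threshold $\log(1/\delta)$ below is an illustrative choice; the paper's bound supplies a time-uniform threshold instead:

```python
import numpy as np

def kl_bernoulli(p, q, eps=1e-12):
    """Binary KL divergence kl(p, q), with clipping for numerical safety."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def kl_ucb(p_hat, n, threshold):
    """Upper end of the Bernoulli KL confidence set
    { q : n * kl(p_hat, q) <= threshold }, found by bisection.
    threshold = log(1/delta) gives the classical fixed-n set; the paper's
    Bregman deviation bound yields a time-uniform threshold instead."""
    lo, hi = p_hat, 1.0
    for _ in range(60):                 # bisection, kl is increasing in q
        mid = (lo + hi) / 2
        if n * kl_bernoulli(p_hat, mid) > threshold:
            hi = mid
        else:
            lo = mid
    return lo
```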

Indexed Minimum Empirical Divergence for Unimodal Bandits

no code implementations • NeurIPS 2021 • Hassan Saber, Pierre Ménard, Odalric-Ambrym Maillard

We consider a multi-armed bandit problem specified by a set of one-dimensional exponential family distributions endowed with a unimodal structure.
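
A sketch of the IMED index referenced in the title, instantiated for Bernoulli arms; the unimodal variant additionally restricts the candidate arms to the empirical best arm and its neighbors, which is only noted in a comment here:

```python
import numpy as np

def kl_bern(p, q, eps=1e-12):
    """Binary KL divergence kl(p, q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def imed_indexes(mu_hat, counts):
    """IMED index per arm: N_a * kl(mu_a, mu_star) + log(N_a); the next arm
    to pull is the argmin. The unimodal variant restricts candidates to the
    empirical best arm and its neighbors in the unimodal order."""
    mu_star = max(mu_hat)
    return [n * kl_bern(m, mu_star) + np.log(n)
            for m, n in zip(mu_hat, counts)]

# e.g. next_arm = int(np.argmin(imed_indexes([0.4, 0.6, 0.5], [12, 30, 9])))
```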

From Optimality to Robustness: Adaptive Re-Sampling Strategies in Stochastic Bandits

no code implementations • NeurIPS 2021 • Dorian Baudry, Patrick Saux, Odalric-Ambrym Maillard

The stochastic multi-armed bandit problem has been extensively studied under standard assumptions on the arms' distributions (e.g., bounded with known support, exponential family, etc.).

Decision Making

Stochastic bandits with groups of similar arms

1 code implementation • NeurIPS 2021 • Fabien Pesquerel, Hassan Saber, Odalric-Ambrym Maillard

For this structured problem of practical relevance, we first derive the asymptotic regret lower bound and corresponding constrained optimization problem.

Attribute

From Optimality to Robustness: Dirichlet Sampling Strategies in Stochastic Bandits

no code implementations • 18 Nov 2021 • Dorian Baudry, Patrick Saux, Odalric-Ambrym Maillard

The stochastic multi-armed bandit problem has been extensively studied under standard assumptions on the arms' distributions (e.g., bounded with known support, exponential family, etc.).

Decision Making

Sub-sampling for Efficient Non-Parametric Bandit Exploration

1 code implementation • NeurIPS 2020 • Dorian Baudry, Emilie Kaufmann, Odalric-Ambrym Maillard

In this paper we propose the first multi-armed bandit algorithm based on re-sampling that achieves asymptotically optimal regret simultaneously for different families of arms (namely Bernoulli, Gaussian and Poisson distributions).

Thompson Sampling
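
A loose sketch of one sub-sampling round: instead of adding a confidence bonus, each challenger duels the leader on a random sub-sample of the leader's rewards of matching size. The leader/challenger scheme below is a simplification, not a faithful reproduction of the paper's algorithms:

```python
import numpy as np

def subsample_duel_round(history, rng):
    """One round of a sub-sampling duel (simplified). `history[a]` is the
    list of rewards observed so far from arm a; every arm is assumed to
    have been pulled at least once. The leader (most-sampled arm) is
    challenged by every other arm on an equal-size random sub-sample of
    its own rewards, so no confidence width is needed."""
    leader = max(range(len(history)), key=lambda a: len(history[a]))
    to_pull = []
    for a in range(len(history)):
        if a == leader:
            continue
        n_a = len(history[a])
        sub = rng.choice(history[leader], size=n_a, replace=False)
        if np.mean(history[a]) >= np.mean(sub):   # challenger wins the duel
            to_pull.append(a)
    return to_pull or [leader]    # pull the duel winners, else the leader
```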

Learning Value Functions in Deep Policy Gradients using Residual Variance

no code implementations • ICLR 2021 • Yannis Flet-Berliac, Reda Ouhamma, Odalric-Ambrym Maillard, Philippe Preux

We prove the theoretical consistency of the new gradient estimator and observe dramatic empirical improvement across a variety of continuous control tasks and algorithms.

Continuous Control • Decision Making

Improved Exploration in Factored Average-Reward MDPs

no code implementations • 9 Sep 2020 • Mohammad Sadegh Talebi, Anders Jonsson, Odalric-Ambrym Maillard

We consider a regret minimization task under the average-reward criterion in an unknown Factored Markov Decision Process (FMDP).

Robust-Adaptive Interval Predictive Control for Linear Uncertain Systems

no code implementations • 20 Jul 2020 • Edouard Leurent, Denis Efimov, Odalric-Ambrym Maillard

We consider the problem of stabilization of a linear system, under state and control constraints, and subject to bounded disturbances and unknown parameters in the state matrix.

Model Predictive Control

Optimal Strategies for Graph-Structured Bandits

no code implementations • 7 Jul 2020 • Hassan Saber, Pierre Ménard, Odalric-Ambrym Maillard

We study a structured variant of the multi-armed bandit problem specified by a set of Bernoulli distributions with means $(\mu_{a,b})_{a \in \mathcal{A}, b \in \mathcal{B}} \in [0, 1]^{\mathcal{A}\times\mathcal{B}}$ and by a given weight matrix $\omega = (\omega_{b,b'})_{b, b' \in \mathcal{B}}$.

Tightening Exploration in Upper Confidence Reinforcement Learning

no code implementations • ICML 2020 • Hippolyte Bourel, Odalric-Ambrym Maillard, Mohammad Sadegh Talebi

In pursuit of practical efficiency, we present UCRL3, following the lines of UCRL2, but with two key modifications: First, it uses state-of-the-art time-uniform concentration inequalities to compute confidence sets on the reward and (component-wise) transition distributions for each state-action pair.

reinforcement-learning • Reinforcement Learning (RL)
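
One standard example of such a time-uniform inequality is the Laplace (method-of-mixtures) bound below, which holds simultaneously over all sample sizes; it conveys the flavor of the bounds involved, though the exact inequalities used in UCRL3 differ:

```python
import numpy as np

def time_uniform_hoeffding(n, delta):
    """Laplace-method time-uniform deviation bound for the empirical mean of
    i.i.d. [0, 1]-valued variables: with probability at least 1 - delta, the
    bound holds for *all* sample sizes n simultaneously, which is what lets
    UCRL-style algorithms reuse one confidence set across an entire run."""
    return np.sqrt((1 + 1 / n) * np.log(np.sqrt(n + 1) / delta) / (2 * n))
```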

Robust-Adaptive Control of Linear Systems: beyond Quadratic Costs

no code implementations • NeurIPS 2020 • Edouard Leurent, Denis Efimov, Odalric-Ambrym Maillard

We consider the problem of robust and adaptive model predictive control (MPC) of a linear system, with unknown parameters that are learned along the way (adaptive), in a critical setting where failures must be prevented (robust).

Autonomous Driving • Model Predictive Control +1

Regret Bounds for Learning State Representations in Reinforcement Learning

no code implementations • NeurIPS 2019 • Ronald Ortner, Matteo Pirotta, Alessandro Lazaric, Ronan Fruit, Odalric-Ambrym Maillard

We consider the problem of online reinforcement learning when several state representations (mapping histories to a discrete state space) are available to the learning agent.

reinforcement-learning • Reinforcement Learning (RL)

Model-Based Reinforcement Learning Exploiting State-Action Equivalence

no code implementations • 9 Oct 2019 • Mahsa Asadi, Mohammad Sadegh Talebi, Hippolyte Bourel, Odalric-Ambrym Maillard

In the case of an unknown equivalence structure, we show through numerical experiments that C-UCRL combined with ApproxEquivalence outperforms UCRL2 in ergodic MDPs.

Model-based Reinforcement Learning • reinforcement-learning +1

Distribution-dependent and Time-uniform Bounds for Piecewise i.i.d Bandits

no code implementations • 30 May 2019 • Subhojyoti Mukherjee, Odalric-Ambrym Maillard

The second strategy, ImpCPD, makes use of the knowledge of $T$ to achieve the order-optimal regret bound of $\min\big\lbrace O\big(\sum_{i=1}^{K} \sum_{g=1}^{G}\frac{\log(T/H_{1,g})}{\Delta^{opt}_{i,g}}\big), O(\sqrt{GT})\big\rbrace$ (where $H_{1,g}$ is the problem complexity), thereby closing an important gap with respect to the lower bound in a specific challenging setting.

Multi-Armed Bandits

Learning Multiple Markov Chains via Adaptive Allocation

no code implementations • NeurIPS 2019 • Mohammad Sadegh Talebi, Odalric-Ambrym Maillard

We study the problem of learning the transition matrices of a set of Markov chains from a single stream of observations on each chain.

Practical Open-Loop Optimistic Planning

no code implementations • 9 Apr 2019 • Edouard Leurent, Odalric-Ambrym Maillard

We consider the problem of online planning in a Markov Decision Process when given only access to a generative model, restricted to open-loop policies, i.e. sequences of actions, and under a budget constraint.

Approximate Robust Control of Uncertain Dynamical Systems

no code implementations • 1 Mar 2019 • Edouard Leurent, Yann Blanco, Denis Efimov, Odalric-Ambrym Maillard

This work studies the design of safe control policies for large-scale non-linear systems operating in uncertain environments.

Systems and Control • Robotics

Efficient Change-Point Detection for Tackling Piecewise-Stationary Bandits

no code implementations • 5 Feb 2019 • Lilian Besson, Emilie Kaufmann, Odalric-Ambrym Maillard, Julien Seznec

We introduce GLR-klUCB, a novel algorithm for the piecewise i.i.d. non-stationary bandit problem with bounded rewards.

Change Point Detection
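
The GLR part tests, for every split of the observation window, whether two Bernoulli means explain the data significantly better than one. A sketch with an exogenous threshold (the paper's calibrated threshold $\beta(n, \delta)$ is not reproduced here):

```python
import numpy as np

def kl_bern(p, q, eps=1e-12):
    """Binary KL divergence kl(p, q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def glr_change_detected(x, threshold):
    """Bernoulli GLR test: scan every split point s and compare the
    two-segment fit against the single-segment fit; raise an alarm when
    the best split exceeds `threshold`."""
    n = len(x)
    mu = np.mean(x)
    for s in range(1, n):
        mu1, mu2 = np.mean(x[:s]), np.mean(x[s:])
        stat = s * kl_bern(mu1, mu) + (n - s) * kl_bern(mu2, mu)
        if stat > threshold:
            return True
    return False
```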

Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs

no code implementations • 5 Mar 2018 • Mohammad Sadegh Talebi, Odalric-Ambrym Maillard

The problem of reinforcement learning in an unknown and discrete Markov Decision Process (MDP) under the average-reward criterion is considered, when the learner interacts with the system in a single stream of observations, starting from an initial state without any reset.

LEMMA • reinforcement-learning +1

Efficient tracking of a growing number of experts

no code implementations • 31 Aug 2017 • Jaouad Mourtada, Odalric-Ambrym Maillard

By contrast, designing strategies that both achieve a near-optimal regret and maintain a reasonable number of weights is highly non-trivial.

Spectral Learning from a Single Trajectory under Finite-State Policies

no code implementations • ICML 2017 • Borja Balle, Odalric-Ambrym Maillard

We present spectral methods of moments for learning sequential models from a single trajectory, in stark contrast with the classical literature that assumes the availability of multiple i.i.d. trajectories.

Boundary Crossing Probabilities for General Exponential Families

no code implementations • 24 May 2017 • Odalric-Ambrym Maillard

We consider parametric exponential families of dimension $K$ on the real line.

Multi-Armed Bandits

Random Shuffling and Resets for the Non-stationary Stochastic Bandit Problem

no code implementations • 7 Sep 2016 • Robin Allesiardo, Raphaël Féraud, Odalric-Ambrym Maillard

For the best-arm identification task, we introduce a version of Successive Elimination based on random shuffling of the $K$ arms.
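
A sketch of Successive Elimination with the surviving arms randomly shuffled at each phase; the Hoeffding radius and [0, 1]-bounded rewards are assumptions of this illustration, and the non-stationary machinery of the paper is not shown:

```python
import numpy as np

def successive_elimination(pull, K, horizon, delta=0.05, rng=None):
    """Successive Elimination for best-arm identification: sample every
    surviving arm once per phase, in a random order (echoing the paper's
    random shuffling), then eliminate arms whose upper confidence bound
    falls below the best lower bound. `pull(a)` returns a reward in [0, 1]
    (boundedness is assumed for the Hoeffding radius)."""
    rng = rng or np.random.default_rng()
    alive = list(range(K))
    sums, counts = np.zeros(K), np.zeros(K)
    t = 0
    while len(alive) > 1 and t < horizon:
        rng.shuffle(alive)                    # random shuffling of the arms
        for a in list(alive):
            sums[a] += pull(a)
            counts[a] += 1
            t += 1
        means = sums[alive] / counts[alive]
        n = counts[alive][0]                  # equal counts across the phase
        radius = np.sqrt(np.log(4 * K * n ** 2 / delta) / (2 * n))
        best_lcb = (means - radius).max()
        alive = [a for a, m in zip(alive, means) if m + radius >= best_lcb]
    return alive

# Example (hypothetical Bernoulli arms):
# rng = np.random.default_rng(0)
# best = successive_elimination(lambda a: rng.binomial(1, [0.3, 0.5, 0.6][a]),
#                               K=3, horizon=30000)
```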

Low-rank Bandits with Latent Mixtures

no code implementations • 6 Sep 2016 • Aditya Gopalan, Odalric-Ambrym Maillard, Mohammadi Zaki

This induces a low-rank structure on the matrix of expected rewards $r_{a,b}$ from recommending item $a$ to user $b$.

Recommendation Systems
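
A small numerical illustration of that claim: when each user is a mixture over $C$ latent user types, the expected-reward matrix factors through the types and has rank at most $C$ (all sizes and distributions below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
C, n_items, n_users = 3, 40, 50
type_rewards = rng.uniform(size=(C, n_items))            # reward of item a under type c
user_mixtures = rng.dirichlet(np.ones(C), size=n_users)  # each user mixes the C types

# r[a, b] = sum_c user_mixtures[b, c] * type_rewards[c, a], so rank(r) <= C
r = type_rewards.T @ user_mixtures.T                     # shape (n_items, n_users)
print(np.linalg.matrix_rank(r))                          # prints 3
```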

How hard is my MDP?" The distribution-norm to the rescue"

no code implementations • NeurIPS 2014 • Odalric-Ambrym Maillard, Timothy A. Mann, Shie Mannor

In Reinforcement Learning (RL), state-of-the-art algorithms require a large number of samples per state-action pair to estimate the transition kernel $p$.

Reinforcement Learning (RL)

Selecting Near-Optimal Approximate State Representations in Reinforcement Learning

no code implementations • 12 May 2014 • Ronald Ortner, Odalric-Ambrym Maillard, Daniil Ryabko

We consider a reinforcement learning setting introduced in (Maillard et al., NIPS 2011) where the learner does not have explicit access to the states of the underlying Markov decision process (MDP).

reinforcement-learning • Reinforcement Learning (RL)

Online allocation and homogeneous partitioning for piecewise constant mean-approximation

no code implementations • NeurIPS 2012 • Alexandra Carpentier, Odalric-Ambrym Maillard

Here we consider an extension of this problem to the case where the arms are the cells of a finite partition $\mathcal{P}$ of a continuous sampling space $\mathcal{X} \subset \mathbb{R}^d$.

Active Learning

Hierarchical Optimistic Region Selection driven by Curiosity

no code implementations • NeurIPS 2012 • Odalric-Ambrym Maillard

This paper aims to take a step towards making the term "intrinsic motivation" from reinforcement learning theoretically well founded, focusing on curiosity-driven learning.

Active Learning • Multi-Armed Bandits

Selecting the State-Representation in Reinforcement Learning

no code implementations • NeurIPS 2011 • Odalric-Ambrym Maillard, Daniil Ryabko, Rémi Munos

Without knowing which of the models is the correct one, or what the probabilistic characteristics of the resulting MDP are, the learner is required to obtain as much reward as the optimal policy for the correct model (or for the best of the correct models, if there are several).

reinforcement-learning • Reinforcement Learning (RL)

Sparse Recovery with Brownian Sensing

no code implementations • NeurIPS 2011 • Alexandra Carpentier, Odalric-Ambrym Maillard, Rémi Munos

We consider the problem of recovering the parameter $\alpha \in \mathbb{R}^K$ of a sparse function $f$, i.e. the number of non-zero entries of $\alpha$ is small compared to the number $K$ of features, given noisy evaluations of $f$ at a set of well-chosen sampling points.
