1 code implementation • ICML 2020 • REDA ALAMI, Odalric-Ambrym Maillard, Raphaël Féraud
In this paper, we consider the problem of sequential change-point detection where both the change-points and the distributions before and after the change are assumed to be unknown.
no code implementations • 28 Sep 2023 • Shubhada Agrawal, Timothée Mathieu, Debabrota Basu, Odalric-Ambrym Maillard
In this setting, accommodating potentially unbounded corruptions, we establish a problem-dependent lower bound on regret for a given family of arm distributions.
no code implementations • 19 Sep 2023 • Tuan Dam, Pascal Stenger, Lukas Schneider, Joni Pajarinen, Carlo D'Eramo, Odalric-Ambrym Maillard
We introduce a novel backup operator that computes value nodes as the Wasserstein barycenter of their action-value children nodes; thus, propagating the uncertainty of the estimate across the tree to the root node.
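The barycentric backup is easiest to see in the Gaussian case. As a minimal sketch (not the authors' implementation), the Wasserstein-2 barycenter of one-dimensional Gaussian value estimates has a closed form: its mean and standard deviation are the weighted averages of the children's means and standard deviations.

```python
def gaussian_w2_barycenter(means, stds, weights):
    """Wasserstein-2 barycenter of 1-D Gaussians N(m_i, s_i^2).

    For Gaussians on the real line the barycenter is again Gaussian,
    with mean sum(w_i * m_i) and standard deviation sum(w_i * s_i).
    """
    total = sum(weights)
    w = [x / total for x in weights]
    m = sum(wi * mi for wi, mi in zip(w, means))
    s = sum(wi * si for wi, si in zip(w, stds))
    return m, s

# Backup step: a parent node aggregates its action-value children,
# propagating both the value estimate and its uncertainty up the tree.
m, s = gaussian_w2_barycenter(means=[1.0, 3.0], stds=[0.5, 1.5], weights=[1, 1])
```

Here the closed form is specific to the one-dimensional Gaussian case; the paper's operator applies to the value distributions maintained at the tree nodes.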
no code implementations • 19 Jun 2023 • Timothée Mathieu, Riccardo Della Vecchia, Alena Shilova, Matheus Medeiros Centa, Hector Kohler, Odalric-Ambrym Maillard, Philippe Preux
When comparing several RL algorithms, a major question is how many executions must be performed and how we can ensure that the results of such a comparison are theoretically sound.
no code implementations • 5 Oct 2022 • Reda Ouhamma, Debabrota Basu, Odalric-Ambrym Maillard
Our regret bound is order-optimal with respect to $H$ and $K$.
no code implementations • 15 Sep 2022 • Patrick Saux, Odalric-Ambrym Maillard
In decision-making problems such as the multi-armed bandit, an agent learns sequentially by optimizing a certain feedback signal.
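As a concrete instance of such sequential learning from feedback, here is a minimal UCB1 bandit loop; this is a classical baseline for illustration, not the algorithm of the paper, and the `pull` callback and arm means are hypothetical.

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """Minimal UCB1: pull each arm once, then pick the arm maximizing
    empirical mean + exploration bonus sqrt(2 ln t / n_a)."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            a = t - 1  # initialization: one pull per arm
        else:
            a = max(range(n_arms),
                    key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t) / counts[i]))
        r = pull(a)
        counts[a] += 1
        sums[a] += r
    return counts

# Hypothetical Bernoulli arms with means 0.3 and 0.7: the better arm
# ends up pulled far more often.
random.seed(0)
counts = ucb1(lambda a: float(random.random() < (0.3, 0.7)[a]), 2, 500)
```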
1 code implementation • 24 Aug 2022 • Mahsa Asadi, Aurélien Bellet, Odalric-Ambrym Maillard, Marc Tommasi
We study the case where some of the distributions have the same mean, and the agents are allowed to actively query information from other agents.
no code implementations • 7 Jul 2022 • Romain Gautron, Emilio J. Padrón, Philippe Preux, Julien Bigot, Odalric-Ambrym Maillard, David Emukpere
gym-DSSAT is a gym interface to the Decision Support System for Agrotechnology Transfer (DSSAT), a high-fidelity crop simulator.
no code implementations • 7 Mar 2022 • Debabrota Basu, Odalric-Ambrym Maillard, Timothée Mathieu
We study the corrupted bandit problem, i.e., a stochastic multi-armed bandit problem with $k$ unknown reward distributions, which are heavy-tailed and corrupted by a history-independent adversary or Nature.
no code implementations • 18 Jan 2022 • Sayak Ray Chowdhury, Patrick Saux, Odalric-Ambrym Maillard, Aditya Gopalan
For the practitioner, we instantiate this novel bound to several classical families, e.g., Gaussian, Bernoulli, Exponential, Weibull, Pareto, Poisson and Chi-square, yielding explicit forms of the confidence sets and the Bregman information gain.
no code implementations • NeurIPS 2021 • Hassan Saber, Pierre Ménard, Odalric-Ambrym Maillard
We consider a multi-armed bandit problem specified by a one-dimensional exponential family of distributions endowed with a unimodal structure.
no code implementations • NeurIPS 2021 • Dorian Baudry, Patrick Saux, Odalric-Ambrym Maillard
The stochastic multi-armed bandit problem has been extensively studied under standard assumptions on the arms' distributions (e.g., bounded with known support, exponential family, etc.).
1 code implementation • NeurIPS 2021 • Fabien Pesquerel, Hassan Saber, Odalric-Ambrym Maillard
For this structured problem of practical relevance, we first derive the asymptotic regret lower bound and corresponding constrained optimization problem.
no code implementations • 18 Nov 2021 • Dorian Baudry, Patrick Saux, Odalric-Ambrym Maillard
The stochastic multi-armed bandit problem has been extensively studied under standard assumptions on the arms' distributions (e.g., bounded with known support, exponential family, etc.).
1 code implementation • NeurIPS 2020 • Dorian Baudry, Emilie Kaufmann, Odalric-Ambrym Maillard
In this paper we propose the first multi-armed bandit algorithm based on re-sampling that achieves asymptotically optimal regret simultaneously for different families of arms (namely Bernoulli, Gaussian and Poisson distributions).
no code implementations • ICLR 2021 • Yannis Flet-Berliac, Reda Ouhamma, Odalric-Ambrym Maillard, Philippe Preux
We prove the theoretical consistency of the new gradient estimator and observe dramatic empirical improvement across a variety of continuous control tasks and algorithms.
no code implementations • 9 Sep 2020 • Mohammad Sadegh Talebi, Anders Jonsson, Odalric-Ambrym Maillard
We consider a regret minimization task under the average-reward criterion in an unknown Factored Markov Decision Process (FMDP).
no code implementations • 20 Jul 2020 • Edouard Leurent, Denis Efimov, Odalric-Ambrym Maillard
We consider the problem of stabilization of a linear system, under state and control constraints, and subject to bounded disturbances and unknown parameters in the state matrix.
no code implementations • 7 Jul 2020 • Hassan Saber, Pierre Ménard, Odalric-Ambrym Maillard
The problem instance is specified by a matrix of means in $[0, 1]^{\mathcal{A}\times\mathcal{B}}$ and by a given weight matrix $\omega$.
no code implementations • 30 Jun 2020 • Hassan Saber, Pierre Ménard, Odalric-Ambrym Maillard
This strategy is proven optimal.
no code implementations • ICML 2020 • Hippolyte Bourel, Odalric-Ambrym Maillard, Mohammad Sadegh Talebi
In pursuit of practical efficiency, we present UCRL3, following the lines of UCRL2, but with two key modifications: First, it uses state-of-the-art time-uniform concentration inequalities to compute confidence sets on the reward and (component-wise) transition distributions for each state-action pair.
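To illustrate the flavour of such time-uniform concentration, here is a hedged sketch using the classical Laplace (mixture) bound for sub-Gaussian means; UCRL3's exact inequalities are refinements of this idea, not this formula.

```python
import math

def time_uniform_radius(n, delta, sigma=0.5):
    """Laplace-method time-uniform confidence radius for a sigma-sub-Gaussian
    mean: holds simultaneously for all sample sizes n >= 1 with probability
    at least 1 - delta.  (Illustrative; not UCRL3's exact inequality.)"""
    return sigma * math.sqrt(
        2 * (1 + 1 / n) * math.log(math.sqrt(n + 1) / delta) / n)

# Unlike a fixed-n Hoeffding bound, this radius can be evaluated at every
# visit count of a state-action pair without a union bound over time steps.
r = time_uniform_radius(n=100, delta=0.05)
```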
no code implementations • NeurIPS 2020 • Edouard Leurent, Denis Efimov, Odalric-Ambrym Maillard
We consider the problem of robust and adaptive model predictive control (MPC) of a linear system, with unknown parameters that are learned along the way (adaptive), in a critical setting where failures must be prevented (robust).
no code implementations • NeurIPS 2019 • Ronald Ortner, Matteo Pirotta, Alessandro Lazaric, Ronan Fruit, Odalric-Ambrym Maillard
We consider the problem of online reinforcement learning when several state representations (mapping histories to a discrete state space) are available to the learning agent.
no code implementations • 9 Oct 2019 • Mahsa Asadi, Mohammad Sadegh Talebi, Hippolyte Bourel, Odalric-Ambrym Maillard
In the case of an unknown equivalence structure, we show through numerical experiments that C-UCRL combined with ApproxEquivalence outperforms UCRL2 in ergodic MDPs.
no code implementations • 30 May 2019 • Subhojyoti Mukherjee, Odalric-Ambrym Maillard
The second strategy, ImpCPD, makes use of the knowledge of $T$ to achieve the order-optimal regret bound of $\min\big\lbrace O(\sum_{i=1}^{K} \sum_{g=1}^{G}\frac{\log(T/H_{1, g})}{\Delta^{opt}_{i, g}}), O(\sqrt{GT})\big\rbrace$ (where $H_{1, g}$ is the problem complexity), thereby closing an important gap with respect to the lower bound in a specific challenging setting.
no code implementations • NeurIPS 2019 • Mohammad Sadegh Talebi, Odalric-Ambrym Maillard
We study the problem of learning the transition matrices of a set of Markov chains from a single stream of observations on each chain.
no code implementations • 9 Apr 2019 • Edouard Leurent, Odalric-Ambrym Maillard
We consider the problem of online planning in a Markov Decision Process when given only access to a generative model, restricted to open-loop policies, i.e., sequences of actions, and under a budget constraint.
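A generic open-loop planner with a generative model can be sketched as follows: score fixed action sequences (no state feedback inside a sequence) under a call budget, by random shooting. This is an illustrative baseline under assumed `step(state, action) -> (next_state, reward)` semantics, not the paper's algorithm.

```python
import random

def open_loop_plan(step, state, actions, depth, budget, gamma=0.95):
    """Open-loop planning sketch: sample fixed action sequences, evaluate
    each with the generative model under a total call budget, and return
    the first action of the best-scoring sequence."""
    best_seq, best_ret = None, float("-inf")
    calls = 0
    while calls + depth <= budget:
        seq = [random.choice(actions) for _ in range(depth)]
        s, ret = state, 0.0
        for i, a in enumerate(seq):
            s, r = step(s, a)          # one generative-model call
            ret += (gamma ** i) * r
            calls += 1
        if ret > best_ret:
            best_seq, best_ret = seq, ret
    return best_seq[0] if best_seq else actions[0]

# Hypothetical chain MDP: moving right from state 0 reaches the reward
# at state 3; the planner should pick the +1 action first.
random.seed(2)
a0 = open_loop_plan(lambda s, a: (s + a, 1.0 if s + a == 3 else 0.0),
                    state=0, actions=[-1, 1], depth=3, budget=300)
```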
1 code implementation • NeurIPS 2019 • Nicolas Carrara, Edouard Leurent, Romain Laroche, Tanguy Urvoy, Odalric-Ambrym Maillard, Olivier Pietquin
A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints.
no code implementations • 1 Mar 2019 • Edouard Leurent, Yann Blanco, Denis Efimov, Odalric-Ambrym Maillard
This work studies the design of safe control policies for large-scale non-linear systems operating in uncertain environments.
no code implementations • 5 Feb 2019 • Lilian Besson, Emilie Kaufmann, Odalric-Ambrym Maillard, Julien Seznec
We introduce GLR-klUCB, a novel algorithm for the piecewise iid non-stationary bandit problem with bounded rewards.
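The GLR (generalized likelihood ratio) component can be sketched as follows: for every split point of the observed stream, compare the two-segment fit against the single-segment one via Bernoulli KL divergences, and declare a change when the statistic crosses a threshold. The threshold here is an arbitrary illustrative value, whereas the paper calibrates it from theory.

```python
import math

def kl_bern(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clamped for safety."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def glr_change_detected(x, threshold):
    """Bernoulli GLR test: scan every split point s of the stream x and
    compare the two-segment likelihood against the single-segment one."""
    n = len(x)
    total = sum(x)
    m = total / n
    prefix = 0
    for s in range(1, n):
        prefix += x[s - 1]
        m1 = prefix / s                  # mean before the candidate change
        m2 = (total - prefix) / (n - s)  # mean after the candidate change
        if s * kl_bern(m1, m) + (n - s) * kl_bern(m2, m) > threshold:
            return True
    return False

# A stream whose mean jumps from ~0 to ~1 triggers the detector.
detected = glr_change_detected([0] * 20 + [1] * 20, threshold=3.0)
```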
no code implementations • 5 Mar 2018 • Mohammad Sadegh Talebi, Odalric-Ambrym Maillard
The problem of reinforcement learning in an unknown and discrete Markov Decision Process (MDP) under the average-reward criterion is considered when the learner interacts with the system in a single stream of observations, starting from an initial state and without any reset.
no code implementations • 31 Aug 2017 • Jaouad Mourtada, Odalric-Ambrym Maillard
By contrast, designing strategies that both achieve a near-optimal regret and maintain a reasonable number of weights is highly non-trivial.
no code implementations • 2 Aug 2017 • Audrey Durand, Odalric-Ambrym Maillard, Joelle Pineau
The variance of the noise is not assumed to be known.
no code implementations • ICML 2017 • Borja Balle, Odalric-Ambrym Maillard
We present spectral methods of moments for learning sequential models from a single trajectory, in stark contrast with the classical literature that assumes the availability of multiple i.i.d. trajectories.
no code implementations • 24 May 2017 • Odalric-Ambrym Maillard
We consider parametric exponential families of dimension $K$ on the real line.
no code implementations • 7 Sep 2016 • Robin Allesiardo, Raphaël Féraud, Odalric-Ambrym Maillard
For the best-arm identification task, we introduce a version of Successive Elimination based on random shuffling of the $K$ arms.
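A minimal sketch of Successive Elimination with per-round shuffling follows; the confidence radius is a standard anytime choice picked for illustration, not the paper's exact tuning, and the `pull` callback and arm means are hypothetical.

```python
import math
import random

def successive_elimination(pull, n_arms, delta, max_rounds=10_000):
    """Successive Elimination sketch: each round, sample every active arm in
    a freshly shuffled order, then drop arms that are confidently worse."""
    active = list(range(n_arms))
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, max_rounds + 1):
        random.shuffle(active)           # random shuffling of the arms
        for a in active:
            sums[a] += pull(a)
            counts[a] += 1
        # Anytime confidence radius (illustrative choice).
        rad = math.sqrt(math.log(4 * n_arms * t * t / delta) / (2 * t))
        best_lcb = max(sums[a] / counts[a] - rad for a in active)
        active = [a for a in active if sums[a] / counts[a] + rad >= best_lcb]
        if len(active) == 1:
            return active[0]
    return max(active, key=lambda a: sums[a] / counts[a])

# Hypothetical Bernoulli arms with means 0.2, 0.5, 0.8.
random.seed(1)
best = successive_elimination(
    lambda a: float(random.random() < (0.2, 0.5, 0.8)[a]), 3, delta=0.05)
```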
no code implementations • 6 Sep 2016 • Aditya Gopalan, Odalric-Ambrym Maillard, Mohammadi Zaki
This induces a low-rank structure on the matrix of expected rewards $r_{a, b}$ from recommending item $a$ to user $b$.
no code implementations • NeurIPS 2014 • Odalric-Ambrym Maillard, Timothy A. Mann, Shie Mannor
In Reinforcement Learning (RL), state-of-the-art algorithms require a large number of samples per state-action pair to estimate the transition kernel $p$.
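The estimate in question is the per-state-action empirical (maximum-likelihood) transition kernel, whose per-pair sample hunger is what the abstract points to; a minimal sketch:

```python
from collections import defaultdict

def estimate_kernel(transitions):
    """Maximum-likelihood estimate of the transition kernel p(s'|s, a)
    from a list of observed (s, a, s') triples: normalized visit counts."""
    counts = defaultdict(lambda: defaultdict(int))
    for s, a, s_next in transitions:
        counts[(s, a)][s_next] += 1
    kernel = {}
    for sa, nxt in counts.items():
        n = sum(nxt.values())            # visits to this state-action pair
        kernel[sa] = {s2: c / n for s2, c in nxt.items()}
    return kernel

# Hypothetical transitions: pair (0, "a") was visited three times.
data = [(0, "a", 1), (0, "a", 1), (0, "a", 0), (1, "a", 0)]
p = estimate_kernel(data)
```

Each probability is estimated only from visits to its own state-action pair, which is why accurate estimation requires many samples per pair.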
no code implementations • 12 May 2014 • Ronald Ortner, Odalric-Ambrym Maillard, Daniil Ryabko
We consider a reinforcement learning setting introduced in (Maillard et al., NIPS 2011) where the learner does not have explicit access to the states of the underlying Markov decision process (MDP).
no code implementations • NeurIPS 2012 • Alexandra Carpentier, Odalric-Ambrym Maillard
We here consider an extension of this problem to the case when the arms are the cells of a finite partition $\mathcal{P}$ of a continuous sampling space $\mathcal{X} \subset \mathbb{R}^d$.
no code implementations • NeurIPS 2012 • Odalric-Ambrym Maillard
This paper aims to take a step towards making the term "intrinsic motivation" from reinforcement learning theoretically well founded, focusing on curiosity-driven learning.
no code implementations • NeurIPS 2011 • Odalric-Ambrym Maillard, Daniil Ryabko, Rémi Munos
Knowing neither which of the models is the correct one, nor the probabilistic characteristics of the resulting MDP, the learner is required to obtain as much reward as the optimal policy for the correct model (or for the best of the correct models, if there are several).
no code implementations • NeurIPS 2011 • Alexandra Carpentier, Odalric-Ambrym Maillard, Rémi Munos
We consider the problem of recovering the parameter $\alpha \in \mathbb{R}^K$ of a sparse function $f$, i.e., the number of non-zero entries of $\alpha$ is small compared to the number $K$ of features, given noisy evaluations of $f$ at a set of well-chosen sampling points.