Search Results for author: Matteo Pirotta

Found 43 papers, 7 papers with code

Top $K$ Ranking for Multi-Armed Bandit with Noisy Evaluations

no code implementations13 Dec 2021 Evrard Garcelon, Vashist Avadhanula, Alessandro Lazaric, Matteo Pirotta

We consider a multi-armed bandit setting where, at the beginning of each round, the learner receives noisy, independent, and possibly biased \emph{evaluations} of the true reward of each arm, and selects $K$ arms with the objective of accumulating as much reward as possible over $T$ rounds.
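One round of this protocol can be sketched as follows. The greedy policy that simply trusts the noisy evaluations is an illustrative assumption here, not the paper's algorithm:

```python
import random

def select_top_k(evaluations, k):
    """Pick the K arms with the highest (noisy) evaluations."""
    ranked = sorted(range(len(evaluations)), key=lambda a: evaluations[a], reverse=True)
    return ranked[:k]

# One toy round: the learner sees true rewards corrupted by independent noise.
random.seed(0)
true_rewards = [0.9, 0.8, 0.5, 0.3, 0.1]
evaluations = [r + random.gauss(0.0, 0.02) for r in true_rewards]
chosen = select_top_k(evaluations, k=2)
round_reward = sum(true_rewards[a] for a in chosen)
```

With noise this small relative to the reward gaps, the greedy choice recovers the two best arms; the paper's setting is harder because the evaluations may also be biased.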

Privacy Amplification via Shuffling for Linear Contextual Bandits

no code implementations11 Dec 2021 Evrard Garcelon, Kamalika Chaudhuri, Vianney Perchet, Matteo Pirotta

Contextual bandit algorithms are widely used in domains where it is desirable to provide a personalized service by leveraging contextual information, which may contain sensitive data that needs to be protected.

Multi-Armed Bandits

Differentially Private Exploration in Reinforcement Learning with Linear Representation

no code implementations2 Dec 2021 Paul Luyo, Evrard Garcelon, Alessandro Lazaric, Matteo Pirotta

We first consider the setting of linear-mixture MDPs (Ayoub et al., 2020) (a.k.a. the model-based setting) and provide a unified framework for analyzing joint and local differentially private (DP) exploration.

reinforcement-learning

Adaptive Multi-Goal Exploration

no code implementations23 Nov 2021 Jean Tarbouriech, Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Michal Valko, Alessandro Lazaric

We introduce a generic strategy for provably efficient multi-goal exploration.

Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection

no code implementations NeurIPS 2021 Matteo Papini, Andrea Tirinzoni, Aldo Pacchiano, Marcello Restelli, Alessandro Lazaric, Matteo Pirotta

We study the role of the representation of state-action value functions in regret minimization in finite-horizon Markov Decision Processes (MDPs) with linear structure.

reinforcement-learning

A Fully Problem-Dependent Regret Lower Bound for Finite-Horizon MDPs

no code implementations24 Jun 2021 Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric

We derive a novel asymptotic problem-dependent lower bound for regret minimization in finite-horizon tabular Markov Decision Processes (MDPs).

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret

no code implementations NeurIPS 2021 Jean Tarbouriech, Runlong Zhou, Simon S. Du, Matteo Pirotta, Michal Valko, Alessandro Lazaric

We study the problem of learning in the stochastic shortest path (SSP) setting, where an agent seeks to minimize the expected cost accumulated before reaching a goal state.
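The SSP objective can be made concrete with plain value iteration on a tiny goal-oriented MDP. This is a generic illustration of the cost-to-go being minimized, not the learning algorithm analyzed in the paper:

```python
def ssp_value_iteration(P, c, goal, n_states, n_actions, iters=10000, tol=1e-10):
    """Compute V[s] = minimal expected cost accumulated before reaching
    the goal state; V[goal] = 0 by definition of the SSP objective.
    P[s][a][t] is a transition probability, c[s][a] a per-step cost."""
    V = [0.0] * n_states
    for _ in range(iters):
        delta = 0.0
        for s in range(n_states):
            if s == goal:
                continue
            best = min(
                c[s][a] + sum(P[s][a][t] * V[t] for t in range(n_states))
                for a in range(n_actions)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    return V

# Two states: from state 0 the single action costs 1 and reaches the
# goal (state 1) with probability 0.5, so V[0] solves V = 1 + 0.5 V.
P = [[[0.5, 0.5]], [[0.0, 1.0]]]
c = [[1.0], [0.0]]
V = ssp_value_iteration(P, c, goal=1, n_states=2, n_actions=1)
```

The fixed point V[0] = 2 is the expected number of attempts before the coin-flip transition succeeds, which is the "expected cost accumulated before reaching a goal state" in the abstract.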

Leveraging Good Representations in Linear Contextual Bandits

no code implementations8 Apr 2021 Matteo Papini, Andrea Tirinzoni, Marcello Restelli, Alessandro Lazaric, Matteo Pirotta

We show that the regret is indeed never worse than the regret obtained by running LinUCB on the best representation (up to a $\ln M$ factor).
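For reference, a minimal LinUCB sketch on a single fixed representation (toy deterministic rewards; the paper's representation-selection mechanism across $M$ candidates is not shown):

```python
import numpy as np

def linucb(features, reward, T, alpha=1.0, lam=1.0):
    """Minimal LinUCB: ridge-regression estimate of the reward parameter
    plus an exploration bonus given by each arm's Mahalanobis norm."""
    d = features.shape[1]
    A = lam * np.eye(d)          # regularized design matrix
    b = np.zeros(d)
    total = 0.0
    for t in range(T):
        A_inv = np.linalg.inv(A)
        theta = A_inv @ b
        ucb = features @ theta + alpha * np.sqrt(
            np.einsum('ij,jk,ik->i', features, A_inv, features))
        a = int(np.argmax(ucb))  # optimistic arm
        r = reward(a)
        A += np.outer(features[a], features[a])
        b += r * features[a]
        total += r
    return total

# Two arms with orthonormal features; the true parameter makes arm 0 optimal.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
theta_star = np.array([1.0, 0.0])
total = linucb(X, lambda a: float(X[a] @ theta_star), T=200)
```

On this noiseless instance the optimistic index of the optimal arm always dominates, so the sketch collects the full reward of 1 per round.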

Multi-Armed Bandits

Encrypted Linear Contextual Bandit

no code implementations17 Mar 2021 Evrard Garcelon, Vianney Perchet, Matteo Pirotta

A critical aspect of bandit methods is that they require observing the contexts -- i.e., individual or group-level data -- and rewards in order to solve the sequential problem.

Decision Making Multi-Armed Bandits +2

An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits

no code implementations NeurIPS 2020 Andrea Tirinzoni, Matteo Pirotta, Marcello Restelli, Alessandro Lazaric

Finally, we remove forced exploration and build on confidence intervals of the optimization problem to encourage a minimum level of exploration that is better adapted to the problem structure.

Local Differential Privacy for Regret Minimization in Reinforcement Learning

no code implementations NeurIPS 2021 Evrard Garcelon, Vianney Perchet, Ciara Pike-Burke, Matteo Pirotta

Motivated by this, we study privacy in the context of finite-horizon Markov Decision Processes (MDPs) by requiring information to be obfuscated on the user side.

reinforcement-learning

A Provably Efficient Sample Collection Strategy for Reinforcement Learning

no code implementations NeurIPS 2021 Jean Tarbouriech, Matteo Pirotta, Michal Valko, Alessandro Lazaric

One of the challenges in online reinforcement learning (RL) is that the agent needs to trade off the exploration of the environment and the exploitation of the samples to optimize its behavior.

reinforcement-learning

Improved Analysis of UCRL2 with Empirical Bernstein Inequality

no code implementations10 Jul 2020 Ronan Fruit, Matteo Pirotta, Alessandro Lazaric

We consider the problem of exploration-exploitation in communicating Markov Decision Processes.
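One standard form of the empirical Bernstein inequality (Maurer and Pontil, 2009, for i.i.d. samples in $[0, b]$) can be sketched and checked numerically; analyses in the UCRL2 family apply such variance-sensitive bounds per state-action pair:

```python
import math
import random

def empirical_bernstein_bound(samples, delta, b=1.0):
    """Deviation bound on |sample mean - E[X]| that holds with probability
    at least 1 - delta for i.i.d. samples in [0, b], using the unbiased
    sample variance in place of the unknown true variance."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return math.sqrt(2 * var * math.log(3 / delta) / n) + 3 * b * math.log(3 / delta) / n

random.seed(0)
samples = [random.random() for _ in range(100)]   # uniform on [0, 1], true mean 0.5
bound = empirical_bernstein_bound(samples, delta=0.05)
empirical_mean = sum(samples) / len(samples)
```

Because the leading term scales with the sample variance rather than the range, the bound tightens automatically on low-variance transitions, which is the source of the improved analysis.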

A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces

no code implementations9 Jul 2020 Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko

In this work, we propose KeRNS: an algorithm for episodic reinforcement learning in non-stationary Markov Decision Processes (MDPs) whose state-action set is endowed with a metric.

reinforcement-learning

Kernel-Based Reinforcement Learning: A Finite-Time Analysis

1 code implementation12 Apr 2020 Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko

We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning problems whose state-action space is endowed with a metric.

reinforcement-learning

Active Model Estimation in Markov Decision Processes

no code implementations6 Mar 2020 Jean Tarbouriech, Shubhanshu Shekhar, Matteo Pirotta, Mohammad Ghavamzadeh, Alessandro Lazaric

Using a number of simple domains with heterogeneous noise in their transitions, we show that our heuristic-based algorithm outperforms both our original algorithm and the maximum entropy algorithm in the small sample regime, while achieving similar asymptotic performance as that of the original algorithm.

Common Sense Reasoning Efficient Exploration

Exploration-Exploitation in Constrained MDPs

no code implementations4 Mar 2020 Yonathan Efroni, Shie Mannor, Matteo Pirotta

In many sequential decision-making problems, the goal is to optimize a utility function while satisfying a set of constraints on different utilities.

Decision Making

Conservative Exploration in Reinforcement Learning

no code implementations8 Feb 2020 Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric, Matteo Pirotta

While learning in an unknown Markov Decision Process (MDP), an agent should trade off exploration to discover new information about the MDP, and exploitation of the current knowledge to maximize the reward.

reinforcement-learning

Improved Algorithms for Conservative Exploration in Bandits

no code implementations8 Feb 2020 Evrard Garcelon, Mohammad Ghavamzadeh, Alessandro Lazaric, Matteo Pirotta

In this case, it is desirable to deploy online learning algorithms (e.g., a multi-armed bandit algorithm) that interact with the system to learn a better/optimal policy, under the constraint that during the learning process the performance is almost never worse than that of the baseline itself.
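The conservative constraint can be sketched as a simple budget check (a hypothetical helper; in the actual algorithms the pessimistic quantities come from the learner's confidence intervals):

```python
def conservative_check(cum_lower, candidate_lower, baseline_mean, t, alpha=0.1):
    """Allow the learner's arm at round t only if, even under pessimistic
    reward estimates, cumulative reward stays above (1 - alpha) times the
    baseline's cumulative reward; otherwise the baseline must be played."""
    return cum_lower + candidate_lower >= (1 - alpha) * t * baseline_mean

# With baseline mean 0.5 over t = 10 rounds, the budget threshold is 4.5.
ok = conservative_check(cum_lower=4.0, candidate_lower=0.6, baseline_mean=0.5, t=10)
blocked = conservative_check(cum_lower=4.0, candidate_lower=0.3, baseline_mean=0.5, t=10)
```

The algorithm explores only when the accumulated safety margin can absorb a pessimistic outcome of the exploratory pull, which is what keeps performance "almost never worse" than the baseline.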

online learning Recommendation Systems

Concentration Inequalities for Multinoulli Random Variables

no code implementations30 Jan 2020 Jian Qian, Ronan Fruit, Matteo Pirotta, Alessandro Lazaric

We investigate concentration inequalities for Dirichlet and Multinomial random variables.
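A classical reference point here is the Weissman et al. (2003) L1 bound for categorical (multinoulli) frequencies, which is easy to check numerically; the paper's sharper inequalities are not reproduced in this sketch:

```python
import math
import random

def weissman_l1_bound(n, support_size, delta):
    """L1 deviation of the empirical distribution over `support_size`
    outcomes from n samples, valid with probability at least 1 - delta
    (Weissman et al., 2003)."""
    return math.sqrt(2 * math.log((2 ** support_size - 2) / delta) / n)

random.seed(0)
p = [0.5, 0.3, 0.2]
n = 2000
counts = [0, 0, 0]
for x in random.choices(range(3), weights=p, k=n):
    counts[x] += 1
l1 = sum(abs(counts[i] / n - p[i]) for i in range(3))
bound = weissman_l1_bound(n, support_size=3, delta=0.001)
```

The $2^S$ term in the log is what dimension-aware refinements of this bound aim to improve.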

Exploiting Language Instructions for Interpretable and Compositional Reinforcement Learning

no code implementations13 Jan 2020 Michiel van der Meer, Matteo Pirotta, Elia Bruni

In this work, we present an alternative approach to making an agent compositional through the use of a diagnostic classifier.

Classification General Classification +1

No-Regret Exploration in Goal-Oriented Reinforcement Learning

no code implementations ICML 2020 Jean Tarbouriech, Evrard Garcelon, Michal Valko, Matteo Pirotta, Alessandro Lazaric

Many popular reinforcement learning problems (e.g., navigation in a maze, some Atari games, mountain car) are instances of the episodic setting under its stochastic shortest path (SSP) formulation, where an agent has to achieve a goal state while minimizing the cumulative cost.

Atari Games reinforcement-learning

Exploration Bonus for Regret Minimization in Discrete and Continuous Average Reward MDPs

1 code implementation NeurIPS 2019 Jian Qian, Ronan Fruit, Matteo Pirotta, Alessandro Lazaric

The exploration bonus is an effective approach to manage the exploration-exploitation trade-off in Markov Decision Processes (MDPs).
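The generic shape of such a bonus is a sqrt(log / visits) term added to the empirical rewards; the constants below are illustrative, not the ones from the paper's analysis:

```python
import math

def exploration_bonus(n_visits, t, r_max=1.0, delta=0.05):
    """Optimism bonus for a state-action pair visited n_visits times by
    round t; it shrinks as the pair is sampled, so exploration fades and
    the agent gradually exploits its estimates."""
    n = max(1, n_visits)
    return r_max * math.sqrt(2.0 * math.log(t / delta) / n)

early = exploration_bonus(n_visits=1, t=100)
late = exploration_bonus(n_visits=100, t=100)
```

Adding this bonus to the estimated reward of each state-action pair makes rarely visited pairs look optimistically good, which is how bonus-based algorithms drive exploration without explicit confidence sets over models.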

Regret Bounds for Learning State Representations in Reinforcement Learning

no code implementations NeurIPS 2019 Ronald Ortner, Matteo Pirotta, Alessandro Lazaric, Ronan Fruit, Odalric-Ambrym Maillard

We consider the problem of online reinforcement learning when several state representations (mapping histories to a discrete state space) are available to the learning agent.

reinforcement-learning

Smoothing Policies and Safe Policy Gradients

no code implementations8 May 2019 Matteo Papini, Matteo Pirotta, Marcello Restelli

Policy gradient algorithms are among the best candidates for the much anticipated application of reinforcement learning to real-world control tasks, such as the ones arising in robotics.

Stochastic Optimization

Exploration Bonus for Regret Minimization in Undiscounted Discrete and Continuous Markov Decision Processes

no code implementations11 Dec 2018 Jian Qian, Ronan Fruit, Matteo Pirotta, Alessandro Lazaric

We introduce and analyse two algorithms for exploration-exploitation in discrete and continuous Markov Decision Processes (MDPs) based on exploration bonuses.

Efficient Exploration

Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes

1 code implementation NeurIPS 2018 Ronan Fruit, Matteo Pirotta, Alessandro Lazaric

While designing the state space of an MDP, it is common to include states that are transient or not reachable by any policy (e.g., in mountain car, the product space of speed and position contains configurations that are not physically reachable).

Efficient Exploration

Stochastic Variance-Reduced Policy Gradient

1 code implementation ICML 2018 Matteo Papini, Damiano Binaghi, Giuseppe Canonaco, Matteo Pirotta, Marcello Restelli

In this paper, we propose a novel reinforcement-learning algorithm consisting of a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs).
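The variance-reduction idea can be illustrated on a toy finite-sum objective. This is plain SVRG on a quadratic; the paper's policy-gradient estimator, which additionally handles the shifting trajectory distribution via importance weights, is more involved:

```python
import random

def svrg(grad_i, n, w0, lr=0.1, epochs=20, inner=50):
    """SVRG: each inner step uses grad_i(w) - grad_i(snapshot) + full_grad(snapshot),
    an unbiased gradient estimate whose variance vanishes near the snapshot."""
    w = w0
    for _ in range(epochs):
        snapshot = w
        # Full gradient at the snapshot, recomputed once per epoch.
        full = sum(grad_i(snapshot, i) for i in range(n)) / n
        for _ in range(inner):
            i = random.randrange(n)
            g = grad_i(w, i) - grad_i(snapshot, i) + full
            w -= lr * g
    return w

# Minimize f(w) = (1/n) * sum_i (w - a_i)^2 / 2; the optimum is mean(a).
random.seed(0)
a = [1.0, 2.0, 3.0, 4.0]
w_star = svrg(lambda w, i: w - a[i], n=4, w0=0.0)
```

The control variate `grad_i(snapshot) - full` cancels the per-sample noise, which is why SVRG-style updates tolerate larger step sizes than plain stochastic gradient.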

reinforcement-learning

Importance Weighted Transfer of Samples in Reinforcement Learning

no code implementations ICML 2018 Andrea Tirinzoni, Andrea Sessa, Matteo Pirotta, Marcello Restelli

In the proposed approach, all the samples are transferred and used by a batch RL algorithm to solve the target task, but their contribution to the learning process is proportional to their importance weight.
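The reweighting step can be sketched with a self-normalized importance-weighted estimator (a generic illustration; in the paper the weights are estimated from data rather than given):

```python
def importance_weighted_mean(samples, target_pdf, source_pdf):
    """Estimate the target-distribution mean of y from (x, y) samples
    drawn under the source distribution, weighting each sample by
    target_pdf(x) / source_pdf(x) and normalizing the weights."""
    weights = [target_pdf(x) / source_pdf(x) for x, _ in samples]
    return sum(w * y for w, (_, y) in zip(weights, samples)) / sum(weights)

# Source is uniform over {0, 1}; the target puts mass 0.8 on x = 0.
source = lambda x: 0.5
target = lambda x: 0.8 if x == 0 else 0.2
est = importance_weighted_mean([(0, 10.0), (1, 20.0)], target, source)
```

Each transferred sample contributes in proportion to how likely it is under the target task, which is exactly the "contribution proportional to importance weight" described in the abstract.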

reinforcement-learning

Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning

1 code implementation ICML 2018 Ronan Fruit, Matteo Pirotta, Alessandro Lazaric, Ronald Ortner

We introduce SCAL, an algorithm designed to perform efficient exploration-exploitation in any unknown weakly-communicating Markov decision process (MDP) for which an upper bound $c$ on the span of the optimal bias function is known.

Efficient Exploration reinforcement-learning

Cost-Sensitive Approach to Batch Size Adaptation for Gradient Descent

no code implementations9 Dec 2017 Matteo Pirotta, Marcello Restelli

In this paper, we propose a novel approach to automatically determine the batch size in stochastic gradient descent methods.

General Classification

Adaptive Batch Size for Safe Policy Gradients

no code implementations NeurIPS 2017 Matteo Papini, Matteo Pirotta, Marcello Restelli

Policy gradient methods are among the best Reinforcement Learning (RL) techniques to solve complex control problems.

Policy Gradient Methods reinforcement-learning

Regret Minimization in MDPs with Options without Prior Knowledge

no code implementations NeurIPS 2017 Ronan Fruit, Matteo Pirotta, Alessandro Lazaric, Emma Brunskill

The option framework integrates temporal abstraction into the reinforcement learning model through the introduction of macro-actions (i.e., options).

Compatible Reward Inverse Reinforcement Learning

no code implementations NeurIPS 2017 Alberto Maria Metelli, Matteo Pirotta, Marcello Restelli

Within this subspace, using a second-order criterion, we search for the reward function that penalizes the most a deviation from the expert's policy.

reinforcement-learning

Boosted Fitted Q-Iteration

no code implementations ICML 2017 Samuele Tosatto, Matteo Pirotta, Carlo D’Eramo, Marcello Restelli

This paper is about the study of B-FQI, an Approximated Value Iteration (AVI) algorithm that exploits a boosting procedure to estimate the action-value function in reinforcement learning problems.

Multi-objective Reinforcement Learning with Continuous Pareto Frontier Approximation Supplementary Material

no code implementations13 Jun 2014 Matteo Pirotta, Simone Parisi, Marcello Restelli

The paper is about learning a continuous approximation of the Pareto frontier in Multi-Objective Markov Decision Problems (MOMDPs).

reinforcement-learning

Adaptive Step-Size for Policy Gradient Methods

no code implementations NeurIPS 2013 Matteo Pirotta, Marcello Restelli, Luca Bascetta

In the last decade, policy gradient methods have significantly grown in popularity in the reinforcement-learning field.

Policy Gradient Methods
