no code implementations • 23 Feb 2024 • Gergely Neu, Matteo Papini, Ludovic Schwartz
We study the problem of online learning in contextual bandit problems where the loss function is assumed to belong to a known parametric function class.
no code implementations • 6 Feb 2024 • Davide Maran, Alberto Maria Metelli, Matteo Papini, Marcello Restelli
Obtaining no-regret guarantees for reinforcement learning (RL) in the case of problems with continuous state and/or action spaces is still one of the major open challenges in the field.
no code implementations • 27 Sep 2023 • Germano Gabbianelli, Gergely Neu, Matteo Papini
These improvements are made possible by the observation that the upper and lower tails of importance-weighted estimators behave very differently from each other; controlling them separately can massively improve on previous results, which were all based on symmetric two-sided concentration inequalities.
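The asymmetry mentioned in the abstract can be seen even in a toy simulation (a hypothetical setup, not the paper's construction): an importance-weighted estimate is a mean of non-negative terms, so its lower tail is bounded at zero, while the large weights make heavy overshoots on the upper side possible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy off-policy setup (illustrative only): the behaviour policy plays the
# target action with probability 0.1, the target policy plays it always,
# so the importance weight on matching samples is 1 / 0.1 = 10.
p = 0.1
n, trials = 50, 10_000
took_action = rng.random((trials, n)) < p
rewards = rng.uniform(size=(trials, n))          # rewards in [0, 1]

weights = np.where(took_action, 1.0 / p, 0.0)
estimates = (weights * rewards).mean(axis=1)     # one IW estimate per trial

# Lower tail: every term is non-negative, so no estimate falls below zero.
# Upper tail: the weight of 10 produces occasional large overshoots.
print("min:", estimates.min(), "max:", estimates.max())
```

A symmetric two-sided bound must cover the worst of the two tails; treating them separately lets the benign lower tail be controlled much more tightly.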
no code implementations • 22 May 2023 • Germano Gabbianelli, Gergely Neu, Nneka Okolo, Matteo Papini
Offline Reinforcement Learning (RL) aims to learn a near-optimal policy from a fixed dataset of transitions collected by another policy.
no code implementations • 24 Oct 2022 • Andrea Tirinzoni, Matteo Papini, Ahmed Touati, Alessandro Lazaric, Matteo Pirotta
We study the problem of representation learning in stochastic contextual linear bandits.
no code implementations • 18 Jul 2022 • Germano Gabbianelli, Matteo Papini, Gergely Neu
We study the problem of online learning in adversarial bandit problems under a partial observability model called off-policy feedback.
no code implementations • 27 May 2022 • Gergely Neu, Julia Olkhovskaya, Matteo Papini, Ludovic Schwartz
We study the Bayesian regret of the renowned Thompson Sampling algorithm in contextual bandits with binary losses and adversarially-selected contexts.
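For readers unfamiliar with the algorithm being analyzed, here is a minimal Thompson Sampling loop on a two-armed Bernoulli bandit (a standard textbook instance, far simpler than the contextual, adversarial-context setting the paper studies; the arm means are assumed for illustration).

```python
import numpy as np

rng = np.random.default_rng(1)

true_means = [0.3, 0.7]        # unknown to the learner (assumed here)
alpha = np.ones(2)             # Beta(alpha, beta) posterior per arm
beta = np.ones(2)
pulls = np.zeros(2, dtype=int)

for _ in range(2000):
    theta = rng.beta(alpha, beta)        # sample a mean for each arm
    a = int(np.argmax(theta))            # play the arm that looks best
    reward = rng.random() < true_means[a]
    alpha[a] += reward                   # Bayesian posterior update
    beta[a] += 1 - reward
    pulls[a] += 1

print(pulls)  # the better arm (index 1) is pulled far more often
```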
no code implementations • NeurIPS 2021 • Matteo Papini, Andrea Tirinzoni, Aldo Pacchiano, Marcello Restelli, Alessandro Lazaric, Matteo Pirotta
We study the role of the representation of state-action value functions in regret minimization in finite-horizon Markov Decision Processes (MDPs) with linear structure.
no code implementations • 8 Apr 2021 • Matteo Papini, Andrea Tirinzoni, Marcello Restelli, Alessandro Lazaric, Matteo Pirotta
We show that the regret is indeed never worse than the regret obtained by running LinUCB on the best representation (up to a $\ln M$ factor).
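The paper's result concerns a master algorithm run over $M$ candidate representations; as background, the following sketch shows a single LinUCB instance on one $d$-dimensional representation (the parameter vector, arm distribution, and bonus scale are illustrative assumptions, not the paper's setup).

```python
import numpy as np

rng = np.random.default_rng(2)

d, T, lam, bonus = 3, 1000, 1.0, 1.0
theta_star = np.array([0.5, -0.3, 0.8])   # unknown reward parameter (assumed)

A = lam * np.eye(d)                        # regularized Gram matrix
b = np.zeros(d)
for _ in range(T):
    arms = rng.normal(size=(5, d))         # 5 random candidate arms per round
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b                  # ridge estimate of theta_star
    # optimistic index: estimated reward + elliptical exploration bonus
    ucb = arms @ theta_hat + bonus * np.sqrt(
        np.einsum('ij,jk,ik->i', arms, A_inv, arms))
    x = arms[np.argmax(ucb)]
    r = x @ theta_star + 0.1 * rng.normal()
    A += np.outer(x, x)
    b += r * x

print(np.round(theta_hat, 2))  # close to theta_star
```

The model-selection question is then which of the $M$ instances to trust; the quoted result says the master pays at most a $\ln M$ factor over the best one.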
no code implementations • 15 Dec 2020 • Alberto Maria Metelli, Matteo Papini, Pierluca D'Oro, Marcello Restelli
In this paper, we introduce the notion of mediator feedback that frames PO as an online learning problem over the policy space.
no code implementations • 6 Dec 2019 • Lorenzo Bisi, Luca Sabbioni, Edoardo Vittori, Matteo Papini, Marcello Restelli
In real-world decision-making problems, for instance in the fields of finance, robotics or autonomous driving, keeping uncertainty under control is as important as maximizing expected returns.
no code implementations • 9 Sep 2019 • Pierluca D'Oro, Alberto Maria Metelli, Andrea Tirinzoni, Matteo Papini, Marcello Restelli
In this paper, we introduce a novel model-based policy search approach that exploits the knowledge of the current agent policy to learn an approximate transition model, focusing on the portions of the environment that are most relevant for policy improvement.
1 code implementation • 17 Jul 2019 • Mario Beraha, Alberto Maria Metelli, Matteo Papini, Andrea Tirinzoni, Marcello Restelli
Mutual information has been successfully adopted in filter feature-selection methods to assess both the relevancy of a subset of features in predicting the target variable and the redundancy with respect to other variables.
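As a minimal illustration of the relevancy criterion (the data and noise level here are invented, and this plug-in histogram estimator is far simpler than the estimators the paper analyzes), mutual information separates a feature that predicts the target from an independent distractor:

```python
import numpy as np

rng = np.random.default_rng(3)

def mutual_info(x, y):
    """Plug-in mutual information between two discrete arrays, in nats."""
    joint = np.zeros((x.max() + 1, y.max() + 1))
    for xi, yi in zip(x, y):
        joint[xi, yi] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

n = 5000
f0 = rng.integers(0, 2, n)                      # relevant feature
f1 = rng.integers(0, 2, n)                      # independent distractor
y = f0 ^ (rng.random(n) < 0.05).astype(int)     # noisy copy of f0

mi_relevant = mutual_info(f0, y)
mi_noise = mutual_info(f1, y)
print(mi_relevant, mi_noise)  # relevant feature scores far higher
```

Redundancy is scored the same way, by computing the mutual information between candidate features rather than with the target.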
no code implementations • 8 May 2019 • Matteo Papini, Matteo Pirotta, Marcello Restelli
Policy Gradient (PG) algorithms are among the best candidates for the much-anticipated applications of reinforcement learning to real-world control tasks, such as robotics.
2 code implementations • NeurIPS 2018 • Alberto Maria Metelli, Matteo Papini, Francesco Faccio, Marcello Restelli
Policy optimization is an effective reinforcement learning approach to solve continuous control tasks.
1 code implementation • ICML 2018 • Matteo Papini, Damiano Binaghi, Giuseppe Canonaco, Matteo Pirotta, Marcello Restelli
In this paper, we propose a novel reinforcement-learning algorithm consisting of a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs).
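The snapshot-and-correction idea underlying variance reduction can be shown on a toy least-squares problem rather than a full MDP (this is plain SVRG on an invented dataset, not the paper's policy-gradient algorithm; the idea transfers to gradient estimates computed from trajectories):

```python
import numpy as np

rng = np.random.default_rng(4)

n, d = 200, 5
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = X @ w_star                      # noiseless targets (assumed)

def grad_i(w, i):
    """Per-sample gradient of 0.5 * (x_i @ w - y_i)^2."""
    return (X[i] @ w - y[i]) * X[i]

w = np.zeros(d)
lr = 0.01
for epoch in range(30):
    snapshot = w.copy()
    full_grad = X.T @ (X @ snapshot - y) / n    # full gradient at snapshot
    for _ in range(n):
        i = rng.integers(n)
        # variance-reduced gradient: stochastic grad + snapshot correction
        g = grad_i(w, i) - grad_i(snapshot, i) + full_grad
        w -= lr * g

print(np.round(w - w_star, 3))  # near zero
```

The correction term keeps the stochastic gradient unbiased while shrinking its variance as the iterate approaches the snapshot, which is what enables the improved convergence rates.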
no code implementations • NeurIPS 2017 • Matteo Papini, Matteo Pirotta, Marcello Restelli
Policy gradient methods are among the best Reinforcement Learning (RL) techniques to solve complex control problems.