Search Results for author: Matteo Papini

Found 17 papers, 3 papers with code

Optimistic Information Directed Sampling

no code implementations · 23 Feb 2024 · Gergely Neu, Matteo Papini, Ludovic Schwartz

We study the problem of online learning in contextual bandit problems where the loss function is assumed to belong to a known parametric function class.

Multi-Armed Bandits

No-Regret Reinforcement Learning in Smooth MDPs

no code implementations · 6 Feb 2024 · Davide Maran, Alberto Maria Metelli, Matteo Papini, Marcello Restelli

Obtaining no-regret guarantees for reinforcement learning (RL) in the case of problems with continuous state and/or action spaces is still one of the major open challenges in the field.

Reinforcement Learning (RL)

Importance-Weighted Offline Learning Done Right

no code implementations · 27 Sep 2023 · Germano Gabbianelli, Gergely Neu, Matteo Papini

These improvements are made possible by the observation that the upper and lower tails of importance-weighted estimators behave very differently from each other; controlling them separately can massively improve on previous results, which were all based on symmetric two-sided concentration inequalities.
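
The abstract's point about asymmetric tails is easiest to see next to the estimator itself. Below is a minimal, generic sketch of an importance-weighted loss estimate with optional one-sided weight truncation, which tames the heavy upper tail; this is an illustration of the general technique, not the paper's actual estimator or its tail analysis.

```python
import random

def iw_estimate(samples, clip=None):
    """Importance-weighted loss estimate, optionally with truncated weights.

    samples: list of (weight, loss) pairs, where weight = pi(a|x) / mu(a|x)
             (target-policy probability over behavior-policy probability)
    clip:    optional cap on the weights; truncation biases the estimate
             downward but controls the upper tail of its distribution
    """
    total = 0.0
    for w, loss in samples:
        if clip is not None:
            w = min(w, clip)  # one-sided truncation: only large weights change
        total += w * loss
    return total / len(samples)

# toy data: a two-action behavior policy that explores uniformly,
# so the weights pi(a|x) / 0.5 land in {0.2/0.5, 0.9/0.5} = {0.4, 1.8}
random.seed(0)
samples = [(random.choice([0.4, 1.8]), random.random()) for _ in range(1000)]
plain = iw_estimate(samples)
truncated = iw_estimate(samples, clip=1.0)
```

Since losses are nonnegative, truncation can only lower the estimate: the bias is one-sided, which is exactly why the upper and lower tails call for different treatment.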

Offline Primal-Dual Reinforcement Learning for Linear MDPs

no code implementations · 22 May 2023 · Germano Gabbianelli, Gergely Neu, Nneka Okolo, Matteo Papini

Offline Reinforcement Learning (RL) aims to learn a near-optimal policy from a fixed dataset of transitions collected by another policy.

Offline RL reinforcement-learning +2

Online Learning with Off-Policy Feedback

no code implementations · 18 Jul 2022 · Germano Gabbianelli, Matteo Papini, Gergely Neu

We study the problem of online learning in adversarial bandit problems under a partial observability model called off-policy feedback.

Decision Making Multi-Armed Bandits

Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits

no code implementations · 27 May 2022 · Gergely Neu, Julia Olkhovskaya, Matteo Papini, Ludovic Schwartz

We study the Bayesian regret of the renowned Thompson Sampling algorithm in contextual bandits with binary losses and adversarially selected contexts.

Multi-Armed Bandits Thompson Sampling
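
For readers unfamiliar with the algorithm being analyzed, here is a minimal Thompson Sampling loop for a plain Bernoulli multi-armed bandit with Beta posteriors. Note this sketch omits the paper's actual setting (contexts, adversarial selection, binary losses); the `true_means` and horizon below are arbitrary toy choices.

```python
import random

def thompson_bernoulli(true_means, horizon, seed=0):
    """Thompson Sampling on a Bernoulli bandit with Beta(1, 1) priors."""
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1] * k  # posterior successes + 1, per arm
    beta = [1] * k   # posterior failures + 1, per arm
    pulls = [0] * k
    for _ in range(horizon):
        # sample a mean from each arm's posterior and play the argmax
        theta = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: theta[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

# the posterior concentrates and play shifts to the better arm
pulls = thompson_bernoulli([0.2, 0.8], horizon=2000)
```

The randomness of the posterior samples is what drives exploration; the information-theoretic analyses referenced above bound regret through the information gained by these plays.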

Leveraging Good Representations in Linear Contextual Bandits

no code implementations · 8 Apr 2021 · Matteo Papini, Andrea Tirinzoni, Marcello Restelli, Alessandro Lazaric, Matteo Pirotta

We show that the regret is indeed never worse than the regret obtained by running LinUCB on the best representation (up to a $\ln M$ factor).

Multi-Armed Bandits
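
The regret bound above is stated relative to LinUCB run on the best representation. As background, here is a minimal sketch of a single LinUCB round (ridge estimate plus an exploration bonus); the exploration coefficient `alpha` and the one-hot toy contexts are illustrative choices, not the paper's configuration.

```python
import numpy as np

def linucb_choose(A, b, contexts, alpha=1.0):
    """One LinUCB round: pick the arm maximizing the upper confidence bound."""
    A_inv = np.linalg.inv(A)
    theta = A_inv @ b  # ridge-regression estimate of the reward parameter
    ucbs = [x @ theta + alpha * np.sqrt(x @ A_inv @ x) for x in contexts]
    return int(np.argmax(ucbs))

def linucb_update(A, b, x, reward):
    """Rank-one update of the design matrix and reward vector in place."""
    A += np.outer(x, x)
    b += reward * x

d = 3
A = np.eye(d)   # regularized design matrix (identity = ridge regularizer)
b = np.zeros(d)
contexts = [np.eye(d)[i] for i in range(d)]  # toy one-hot feature vectors
arm = linucb_choose(A, b, contexts)
linucb_update(A, b, contexts[arm], reward=1.0)
```

With the identity prior and identical bonuses, the first round ties and `argmax` breaks toward index 0; after the update, that arm's diagonal entry of `A` and entry of `b` reflect the observed play.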

Policy Optimization as Online Learning with Mediator Feedback

no code implementations · 15 Dec 2020 · Alberto Maria Metelli, Matteo Papini, Pierluca D'Oro, Marcello Restelli

In this paper, we introduce the notion of mediator feedback that frames PO as an online learning problem over the policy space.

Continuous Control

Risk-Averse Trust Region Optimization for Reward-Volatility Reduction

no code implementations · 6 Dec 2019 · Lorenzo Bisi, Luca Sabbioni, Edoardo Vittori, Matteo Papini, Marcello Restelli

In real-world decision-making problems, for instance in the fields of finance, robotics or autonomous driving, keeping uncertainty under control is as important as maximizing expected returns.

Autonomous Driving Decision Making

Gradient-Aware Model-based Policy Search

no code implementations · 9 Sep 2019 · Pierluca D'Oro, Alberto Maria Metelli, Andrea Tirinzoni, Matteo Papini, Marcello Restelli

In this paper, we introduce a novel model-based policy search approach that exploits the knowledge of the current agent policy to learn an approximate transition model, focusing on the portions of the environment that are most relevant for policy improvement.

Model-based Reinforcement Learning

Feature Selection via Mutual Information: New Theoretical Insights

1 code implementation · 17 Jul 2019 · Mario Beraha, Alberto Maria Metelli, Matteo Papini, Andrea Tirinzoni, Marcello Restelli

Mutual information has been successfully adopted in filter feature-selection methods to assess both the relevancy of a subset of features in predicting the target variable and the redundancy with respect to other variables.

feature selection regression
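
A filter method of the kind the abstract describes scores each candidate feature by its mutual information with the target and keeps the top-ranked ones. Below is a minimal sketch using the empirical plug-in MI estimate for discrete data; the toy features (`copy_of_target`, `constant`) are invented for illustration and this is not the paper's estimator or selection criterion.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical (plug-in) mutual information of two discrete sequences, in nats."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of x
    py = Counter(ys)            # marginal counts of y
    pxy = Counter(zip(xs, ys))  # joint counts
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log( p(x,y) / (p(x) p(y)) ), with counts c, px[x], py[y]
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi

# toy filter step: rank candidate features by MI with the target
target = [0, 0, 1, 1, 0, 1, 0, 1]
features = {
    "copy_of_target": [0, 0, 1, 1, 0, 1, 0, 1],  # fully informative
    "constant":       [1, 1, 1, 1, 1, 1, 1, 1],  # carries no information
}
scores = {name: mutual_information(f, target) for name, f in features.items()}
best = max(scores, key=scores.get)
```

A perfect copy of a balanced binary target scores MI = ln 2 (its full entropy), while a constant feature scores exactly zero, so the ranking behaves as the relevancy criterion intends; redundancy between selected features requires an additional penalty term not shown here.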

Smoothing Policies and Safe Policy Gradients

no code implementations · 8 May 2019 · Matteo Papini, Matteo Pirotta, Marcello Restelli

Policy Gradient (PG) algorithms are among the best candidates for the much-anticipated applications of reinforcement learning to real-world control tasks, such as robotics.

Stochastic Optimization

Stochastic Variance-Reduced Policy Gradient

1 code implementation ICML 2018 Matteo Papini, Damiano Binaghi, Giuseppe Canonaco, Matteo Pirotta, Marcello Restelli

In this paper, we propose a novel reinforcement-learning algorithm consisting of a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs).
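
The variance-reduction idea adapted here is the SVRG-style correction: a cheap per-sample gradient is recentered with the same sample's gradient at a periodic snapshot plus the snapshot's full gradient. The sketch below shows that correction on a toy least-squares objective, not on policy gradients; the paper's actual algorithm additionally handles the non-stationarity of the sampling distribution (e.g. via importance weighting), which this sketch omits.

```python
import random

def svrg_step(theta, snapshot, full_grad_at_snapshot, grad, i, lr):
    """One SVRG-style update from sample i.

    grad(theta, i) is the gradient of the i-th sample's loss. The correction
    term grad(snapshot, i) - full_grad_at_snapshot keeps the estimate
    unbiased while shrinking its variance as theta approaches the snapshot.
    """
    g = grad(theta, i) - grad(snapshot, i) + full_grad_at_snapshot
    return theta - lr * g

# toy objective: minimize the average of (theta - c_i)^2 over fixed targets
targets = [1.0, 2.0, 3.0, 4.0]           # minimizer is their mean, 2.5
grad = lambda th, i: 2.0 * (th - targets[i])

rng = random.Random(0)
theta = 0.0
for epoch in range(30):
    snapshot = theta  # refresh the snapshot and its full gradient each epoch
    full_grad = sum(grad(snapshot, i) for i in range(len(targets))) / len(targets)
    for _ in range(10):
        i = rng.randrange(len(targets))
        theta = svrg_step(theta, snapshot, full_grad, grad, i, lr=0.1)
```

On this quadratic the per-sample gradients differ from the snapshot's only by a sample-independent shift, so the corrected estimate is exactly the full gradient and the iterates contract geometrically to the minimizer.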
