Search Results for author: Alberto Maria Metelli

Found 39 papers, 12 papers with code

Offline Inverse RL: New Solution Concepts and Provably Efficient Algorithms

no code implementations • 23 Feb 2024 • Filippo Lazzati, Mirco Mutti, Alberto Maria Metelli

In this paper, we introduce a novel notion of feasible reward set capturing the opportunities and limitations of the offline setting and we analyze the complexity of its estimation.

Performance Improvement Bounds for Lipschitz Configurable Markov Decision Processes

no code implementations • 21 Feb 2024 • Alberto Maria Metelli

Configurable Markov Decision Processes (Conf-MDPs) have recently been introduced as an extension of the traditional Markov Decision Processes (MDPs) to model the real-world scenarios in which there is the possibility to intervene in the environment in order to configure some of its parameters.

Information Capacity Regret Bounds for Bandits with Mediator Feedback

no code implementations • 15 Feb 2024 • Khaled Eldowa, Nicolò Cesa-Bianchi, Alberto Maria Metelli, Marcello Restelli

For a selection of policy set families, we prove nearly-matching lower bounds, scaling similarly with the capacity.

No-Regret Reinforcement Learning in Smooth MDPs

no code implementations • 6 Feb 2024 • Davide Maran, Alberto Maria Metelli, Matteo Papini, Marcello Restelli

Obtaining no-regret guarantees for reinforcement learning (RL) in the case of problems with continuous state and/or action spaces is still one of the major open challenges in the field.

reinforcement-learning, Reinforcement Learning (RL)

Inverse Reinforcement Learning with Sub-optimal Experts

no code implementations • 8 Jan 2024 • Riccardo Poiani, Gabriele Curti, Alberto Maria Metelli, Marcello Restelli

For this reason, in this work, we extend the IRL formulation to problems where, in addition to demonstrations from the optimal agent, we can observe the behavior of multiple sub-optimal experts.

reinforcement-learning

Parameterized Projected Bellman Operator

1 code implementation • 20 Dec 2023 • Théo Vincent, Alberto Maria Metelli, Boris Belousov, Jan Peters, Marcello Restelli, Carlo D'Eramo

We formulate an optimization problem to learn PBO for generic sequential decision-making problems, and we theoretically analyze its properties in two representative classes of RL problems.

Decision Making, Reinforcement Learning (RL)

Causal Feature Selection via Transfer Entropy

no code implementations • 17 Oct 2023 • Paolo Bonetti, Alberto Maria Metelli, Marcello Restelli

We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures and leverages transfer entropy to estimate the causal flow of information from the features to the target in time series.

Causal Discovery, feature selection, +2

$(\epsilon, u)$-Adaptive Regret Minimization in Heavy-Tailed Bandits

no code implementations • 4 Oct 2023 • Gianmarco Genalti, Lupo Marsigli, Nicola Gatti, Alberto Maria Metelli

In this setting, we study the regret minimization problem when $\epsilon$ and $u$ are unknown to the learner and it has to adapt.

Pure Exploration under Mediators' Feedback

no code implementations • 29 Aug 2023 • Riccardo Poiani, Alberto Maria Metelli, Marcello Restelli

In this setting, the agent's goal lies in sequentially choosing which mediator to query to identify with high probability the optimal arm while minimizing the identification time, i.e., the sample complexity.

Decision Making, Multi-Armed Bandits

Nonlinear Feature Aggregation: Two Algorithms driven by Theory

no code implementations • 19 Jun 2023 • Paolo Bonetti, Alberto Maria Metelli, Marcello Restelli

A limitation of methods based on correlation is the assumption of linearity in the relationship between features and target.

Dimensionality Reduction, feature selection, +1

An Option-Dependent Analysis of Regret Minimization Algorithms in Finite-Horizon Semi-Markov Decision Processes

no code implementations • 10 May 2023 • Gianluca Drappo, Alberto Maria Metelli, Marcello Restelli

Then, focusing on a sub-setting of HRL approaches, the options framework, we highlight how the average duration of the available options affects the planning horizon and, consequently, the regret itself.

Hierarchical Reinforcement Learning, reinforcement-learning, +1

Truncating Trajectories in Monte Carlo Reinforcement Learning

no code implementations • 7 May 2023 • Riccardo Poiani, Alberto Maria Metelli, Marcello Restelli

In Reinforcement Learning (RL), an agent acts in an unknown environment to maximize the expected cumulative discounted sum of an external reward signal, i.e., the expected return.

reinforcement-learning, Reinforcement Learning (RL)

Towards Theoretical Understanding of Inverse Reinforcement Learning

no code implementations • 25 Apr 2023 • Alberto Maria Metelli, Filippo Lazzati, Marcello Restelli

We start by formally introducing the problem of estimating the feasible reward set, the corresponding PAC requirement, and discussing the properties of particular classes of rewards.

reinforcement-learning

A Tale of Sampling and Estimation in Discounted Reinforcement Learning

no code implementations • 11 Apr 2023 • Alberto Maria Metelli, Mirco Mutti, Marcello Restelli

In this paper, we present a minimax lower bound on the discounted mean estimation problem that explicitly connects the estimation error with the mixing properties of the Markov process and the discount factor.

reinforcement-learning

Interpretable Linear Dimensionality Reduction based on Bias-Variance Analysis

no code implementations • 26 Mar 2023 • Paolo Bonetti, Alberto Maria Metelli, Marcello Restelli

Instead, dimensionality reduction techniques are designed to limit the number of features in a dataset by projecting them into a lower-dimensional space, possibly considering all the original features.

Dimensionality Reduction

Information-Theoretic Regret Bounds for Bandits with Fixed Expert Advice

no code implementations • 14 Mar 2023 • Khaled Eldowa, Nicolò Cesa-Bianchi, Alberto Maria Metelli, Marcello Restelli

We investigate the problem of bandits with expert advice when the experts are fixed and known distributions over the actions.

Wasserstein Actor-Critic: Directed Exploration via Optimism for Continuous-Actions Control

no code implementations • 4 Mar 2023 • Amarildo Likmeta, Matteo Sacco, Alberto Maria Metelli, Marcello Restelli

Uncertainty quantification has been extensively used as a means to achieve efficient directed exploration in Reinforcement Learning (RL).

Q-Learning, Reinforcement Learning (RL), +1

Best Arm Identification for Stochastic Rising Bandits

1 code implementation • 15 Feb 2023 • Marco Mussi, Alessandro Montenegro, Francesco Trovò, Marcello Restelli, Alberto Maria Metelli

Then, we prove that, with a sufficiently large budget, they provide guarantees on the probability of properly identifying the optimal option at the end of the learning process.

Decision Making

Autoregressive Bandits

1 code implementation • 12 Dec 2022 • Francesco Bacchiocchi, Gianmarco Genalti, Davide Maran, Marco Mussi, Marcello Restelli, Nicola Gatti, Alberto Maria Metelli

Autoregressive processes naturally arise in a large variety of real-world scenarios, including stock markets, sales forecasting, weather prediction, advertising, and pricing.

Decision Making

Tight Performance Guarantees of Imitator Policies with Continuous Actions

no code implementations • 7 Dec 2022 • Davide Maran, Alberto Maria Metelli, Marcello Restelli

In this paper, we study BC with the goal of providing theoretical guarantees on the performance of the imitator policy in the case of continuous actions.

Stochastic Rising Bandits

1 code implementation • 7 Dec 2022 • Alberto Maria Metelli, Francesco Trovò, Matteo Pirola, Marcello Restelli

This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., those sequential selection techniques able to learn online using only the feedback given by the chosen option (a.k.a.

Model Selection, Multi-Armed Bandits

Simultaneously Updating All Persistence Values in Reinforcement Learning

no code implementations • 21 Nov 2022 • Luca Sabbioni, Luca Al Daire, Lorenzo Bisi, Alberto Maria Metelli, Marcello Restelli

In reinforcement learning, the performance of learning agents is highly sensitive to the choice of time discretization.

Atari Games, Q-Learning, +2

Dynamical Linear Bandits

1 code implementation • 16 Nov 2022 • Marco Mussi, Alberto Maria Metelli, Marcello Restelli

Then, the hidden state evolves according to linear dynamics, affected by the performed action too.

Decision Making

Optimizing Empty Container Repositioning and Fleet Deployment via Configurable Semi-POMDPs

no code implementations • 25 Jul 2022 • Riccardo Poiani, Ciprian Stirbu, Alberto Maria Metelli, Marcello Restelli

With the continuous growth of the global economy and markets, resource imbalance has risen to be one of the central issues in real logistic scenarios.

ARLO: A Framework for Automated Reinforcement Learning

1 code implementation • 20 May 2022 • Marco Mussi, Davide Lombarda, Alberto Maria Metelli, Francesco Trovò, Marcello Restelli

In this work, we propose a general and flexible framework, namely ARLO: Automated Reinforcement Learning Optimizer, to construct automated pipelines for AutoRL.

feature selection, reinforcement-learning, +1

Lifelong Hyper-Policy Optimization with Multiple Importance Sampling Regularization

no code implementations • 13 Dec 2021 • Pierre Liotet, Francesco Vidaich, Alberto Maria Metelli, Marcello Restelli

This hyper-policy is trained to maximize the estimated future performance, efficiently reusing past data by means of importance sampling, at the cost of introducing a controlled bias.

Management

Exploiting Minimum-Variance Policy Evaluation for Policy Optimization

no code implementations • 29 Sep 2021 • Alberto Maria Metelli, Samuele Meta, Marcello Restelli

In this setting, Importance Sampling (IS) is typically employed as a what-if analysis tool, with the goal of estimating the performance of a target policy, given samples collected with a different behavioral policy.

Policy Optimization as Online Learning with Mediator Feedback

no code implementations • 15 Dec 2020 • Alberto Maria Metelli, Matteo Papini, Pierluca D'Oro, Marcello Restelli

In this paper, we introduce the notion of mediator feedback that frames PO as an online learning problem over the policy space.

Continuous Control

Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning

1 code implementation • ICML 2020 • Alberto Maria Metelli, Flavio Mazzolini, Lorenzo Bisi, Luca Sabbioni, Marcello Restelli

The choice of the control frequency of a system has a relevant impact on the ability of reinforcement learning algorithms to learn a highly performing policy.

reinforcement-learning, Reinforcement Learning (RL)

Policy Space Identification in Configurable Environments

no code implementations • 9 Sep 2019 • Alberto Maria Metelli, Guglielmo Manneschi, Marcello Restelli

We study the problem of identifying the policy space of a learning agent, having access to a set of demonstrations generated by its optimal policy.

Gradient-Aware Model-based Policy Search

no code implementations • 9 Sep 2019 • Pierluca D'Oro, Alberto Maria Metelli, Andrea Tirinzoni, Matteo Papini, Marcello Restelli

In this paper, we introduce a novel model-based policy search approach that exploits the knowledge of the current agent policy to learn an approximate transition model, focusing on the portions of the environment that are most relevant for policy improvement.

Model-based Reinforcement Learning

Feature Selection via Mutual Information: New Theoretical Insights

1 code implementation • 17 Jul 2019 • Mario Beraha, Alberto Maria Metelli, Matteo Papini, Andrea Tirinzoni, Marcello Restelli

Mutual information has been successfully adopted in filter feature-selection methods to assess both the relevancy of a subset of features in predicting the target variable and the redundancy with respect to other variables.

feature selection, regression

Configurable Markov Decision Processes

no code implementations • ICML 2018 • Alberto Maria Metelli, Mirco Mutti, Marcello Restelli

After having introduced our approach and derived some theoretical results, we present the experimental evaluation in two explicative problems to show the benefits of the environment configurability on the performance of the learned policy.

Compatible Reward Inverse Reinforcement Learning

no code implementations • NeurIPS 2017 • Alberto Maria Metelli, Matteo Pirotta, Marcello Restelli

Within this subspace, using a second-order criterion, we search for the reward function that penalizes the most a deviation from the expert's policy.

reinforcement-learning, Reinforcement Learning (RL)
