Search Results for author: Marcello Restelli

Found 80 papers, 20 papers with code

Transfer from Multiple MDPs

no code implementations NeurIPS 2011 Alessandro Lazaric, Marcello Restelli

Transfer reinforcement learning (RL) methods leverage the experience collected on a set of source tasks to speed up RL algorithms.

reinforcement-learning Reinforcement Learning (RL) +1

Adaptive Step-Size for Policy Gradient Methods

no code implementations NeurIPS 2013 Matteo Pirotta, Marcello Restelli, Luca Bascetta

In the last decade, policy gradient methods have significantly grown in popularity in the reinforcement learning field.

Policy Gradient Methods

Sparse Multi-Task Reinforcement Learning

no code implementations NeurIPS 2014 Daniele Calandriello, Alessandro Lazaric, Marcello Restelli

This is equivalent to assuming that the weight vectors of the task value functions are jointly sparse, i.e., the set of their non-zero components is small and it is shared across tasks.

reinforcement-learning Reinforcement Learning (RL)

Unimodal Thompson Sampling for Graph-Structured Arms

no code implementations 17 Nov 2016 Stefano Paladino, Francesco Trovò, Marcello Restelli, Nicola Gatti

We study, to the best of our knowledge, the first Bayesian algorithm for unimodal Multi-Armed Bandit (MAB) problems with graph structure.

Thompson Sampling

Boosted Fitted Q-Iteration

no code implementations ICML 2017 Samuele Tosatto, Matteo Pirotta, Carlo D’Eramo, Marcello Restelli

This paper is about the study of B-FQI, an Approximated Value Iteration (AVI) algorithm that exploits a boosting procedure to estimate the action-value function in reinforcement learning problems.

regression

Compatible Reward Inverse Reinforcement Learning

no code implementations NeurIPS 2017 Alberto Maria Metelli, Matteo Pirotta, Marcello Restelli

Within this subspace, using a second-order criterion, we search for the reward function that most penalizes a deviation from the expert's policy.

reinforcement-learning Reinforcement Learning (RL)

Cost-Sensitive Approach to Batch Size Adaptation for Gradient Descent

no code implementations 9 Dec 2017 Matteo Pirotta, Marcello Restelli

In this paper, we propose a novel approach to automatically determine the batch size in stochastic gradient descent methods.

General Classification

Importance Weighted Transfer of Samples in Reinforcement Learning

no code implementations ICML 2018 Andrea Tirinzoni, Andrea Sessa, Matteo Pirotta, Marcello Restelli

In the proposed approach, all the samples are transferred and used by a batch RL algorithm to solve the target task, but their contribution to the learning process is proportional to their importance weight.

reinforcement-learning Reinforcement Learning (RL)

Stochastic Variance-Reduced Policy Gradient

1 code implementation ICML 2018 Matteo Papini, Damiano Binaghi, Giuseppe Canonaco, Matteo Pirotta, Marcello Restelli

In this paper, we propose a novel reinforcement learning algorithm consisting of a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs).

Configurable Markov Decision Processes

no code implementations ICML 2018 Alberto Maria Metelli, Mirco Mutti, Marcello Restelli

After having introduced our approach and derived some theoretical results, we present the experimental evaluation in two explicative problems to show the benefits of the environment configurability on the performance of the learned policy.

Smoothing Policies and Safe Policy Gradients

no code implementations 8 May 2019 Matteo Papini, Matteo Pirotta, Marcello Restelli

Policy Gradient (PG) algorithms are among the best candidates for the much-anticipated applications of reinforcement learning to real-world control tasks, such as robotics.

Stochastic Optimization

An Intrinsically-Motivated Approach for Learning Highly Exploring and Fast Mixing Policies

no code implementations 10 Jul 2019 Mirco Mutti, Marcello Restelli

What is a good exploration strategy for an agent that interacts with an environment in the absence of external rewards?

Model-based Reinforcement Learning

Feature Selection via Mutual Information: New Theoretical Insights

1 code implementation 17 Jul 2019 Mario Beraha, Alberto Maria Metelli, Matteo Papini, Andrea Tirinzoni, Marcello Restelli

Mutual information has been successfully adopted in filter feature-selection methods to assess both the relevancy of a subset of features in predicting the target variable and the redundancy with respect to other variables.

feature selection regression

Policy Space Identification in Configurable Environments

no code implementations 9 Sep 2019 Alberto Maria Metelli, Guglielmo Manneschi, Marcello Restelli

We study the problem of identifying the policy space of a learning agent, having access to a set of demonstrations generated by its optimal policy.

Gradient-Aware Model-based Policy Search

no code implementations 9 Sep 2019 Pierluca D'Oro, Alberto Maria Metelli, Andrea Tirinzoni, Matteo Papini, Marcello Restelli

In this paper, we introduce a novel model-based policy search approach that exploits the knowledge of the current agent policy to learn an approximate transition model, focusing on the portions of the environment that are most relevant for policy improvement.

Model-based Reinforcement Learning

Risk-Averse Trust Region Optimization for Reward-Volatility Reduction

no code implementations 6 Dec 2019 Lorenzo Bisi, Luca Sabbioni, Edoardo Vittori, Matteo Papini, Marcello Restelli

In real-world decision-making problems, for instance in the fields of finance, robotics or autonomous driving, keeping uncertainty under control is as important as maximizing expected returns.

Autonomous Driving Decision Making

MushroomRL: Simplifying Reinforcement Learning Research

2 code implementations 4 Jan 2020 Carlo D'Eramo, Davide Tateo, Andrea Bonarini, Marcello Restelli, Jan Peters

MushroomRL is an open-source Python library developed to simplify the process of implementing and running Reinforcement Learning (RL) experiments.

reinforcement-learning Reinforcement Learning (RL)

Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning

1 code implementation ICML 2020 Alberto Maria Metelli, Flavio Mazzolini, Lorenzo Bisi, Luca Sabbioni, Marcello Restelli

The choice of the control frequency of a system has a relevant impact on the ability of reinforcement learning algorithms to learn a highly performing policy.

reinforcement-learning Reinforcement Learning (RL)

Online Joint Bid/Daily Budget Optimization of Internet Advertising Campaigns

no code implementations 3 Mar 2020 Alessandro Nuara, Francesco Trovò, Nicola Gatti, Marcello Restelli

We experimentally evaluate our algorithms with synthetic settings generated from real data from Yahoo!, and we present the results of the adoption of our algorithms in a real-world application with a daily average spend of 1,000 Euros for more than one year.

Gaussian Processes Multiple-choice

A Novel Confidence-Based Algorithm for Structured Bandits

no code implementations 23 May 2020 Andrea Tirinzoni, Alessandro Lazaric, Marcello Restelli

We study finite-armed stochastic bandits where the rewards of each arm might be correlated to those of other arms.

Time-Variant Variational Transfer for Value Functions

no code implementations 26 May 2020 Giuseppe Canonaco, Andrea Soprani, Manuel Roveri, Marcello Restelli

In most of the transfer learning approaches to reinforcement learning (RL) the distribution over the tasks is assumed to be stationary.

reinforcement-learning Reinforcement Learning (RL) +1

A Policy Gradient Method for Task-Agnostic Exploration

1 code implementation ICML Workshop LifelongML 2020 Mirco Mutti, Lorenzo Pratissoli, Marcello Restelli

In a reward-free environment, what is a suitable intrinsic objective for an agent to pursue so that it can learn an optimal task-agnostic exploration policy?

Continuous Control

Sequential Transfer in Reinforcement Learning with a Generative Model

no code implementations ICML 2020 Andrea Tirinzoni, Riccardo Poiani, Marcello Restelli

We are interested in how to design reinforcement learning agents that provably reduce the sample complexity for learning new tasks by transferring knowledge from previously-solved ones.

reinforcement-learning Reinforcement Learning (RL)

Task-Agnostic Exploration via Policy Gradient of a Non-Parametric State Entropy Estimate

1 code implementation 9 Jul 2020 Mirco Mutti, Lorenzo Pratissoli, Marcello Restelli

In a reward-free environment, what is a suitable intrinsic objective for an agent to pursue so that it can learn an optimal task-agnostic exploration policy?

Continuous Control

Newton Optimization on Helmholtz Decomposition for Continuous Games

no code implementations 15 Jul 2020 Giorgia Ramponi, Marcello Restelli

In this paper, we propose NOHD (Newton Optimization on Helmholtz Decomposition), a Newton-like algorithm for multi-agent learning problems based on the decomposition of the dynamics of the system into its irrotational (Potential) and solenoidal (Hamiltonian) components.

An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits

no code implementations NeurIPS 2020 Andrea Tirinzoni, Matteo Pirotta, Marcello Restelli, Alessandro Lazaric

Finally, we remove forced exploration and build on confidence intervals of the optimization problem to encourage a minimum level of exploration that is better adapted to the problem structure.

Policy Optimization as Online Learning with Mediator Feedback

no code implementations 15 Dec 2020 Alberto Maria Metelli, Matteo Papini, Pierluca D'Oro, Marcello Restelli

In this paper, we introduce the notion of mediator feedback that frames PO as an online learning problem over the policy space.

Continuous Control

Learning to Explore a Class of Multiple Reward-Free Environments

no code implementations ICLR Workshop SSL-RL 2021 Mirco Mutti, Mattia Mancassola, Marcello Restelli

Along this line, we address the problem of learning to explore a class of multiple reward-free environments with a unique general strategy, which aims to provide a universal initialization to subsequent reinforcement learning problems specified over the same class.

reinforcement-learning Reinforcement Learning (RL)

Leveraging Good Representations in Linear Contextual Bandits

no code implementations 8 Apr 2021 Matteo Papini, Andrea Tirinzoni, Marcello Restelli, Alessandro Lazaric, Matteo Pirotta

We show that the regret is indeed never worse than the regret obtained by running LinUCB on the best representation (up to a $\ln M$ factor).

Multi-Armed Bandits

Meta-Reinforcement Learning by Tracking Task Non-stationarity

1 code implementation 18 May 2021 Riccardo Poiani, Andrea Tirinzoni, Marcello Restelli

At test time, TRIO tracks the evolution of the latent parameters online, hence reducing the uncertainty over future tasks and obtaining fast adaptation through the meta-learned policy.

Meta Reinforcement Learning reinforcement-learning +1

Meta Learning the Step Size in Policy Gradient Methods

no code implementations ICML Workshop AutoML 2021 Luca Sabbioni, Francesco Corda, Marcello Restelli

Policy-based algorithms are among the most widely adopted techniques in model-free RL, thanks to their strong theoretical groundings and good properties in continuous action spaces.

Meta-Learning Meta Reinforcement Learning +1

Learning to Explore Multiple Environments without Rewards

no code implementations ICML Workshop URL 2021 Mirco Mutti, Mattia Mancassola, Marcello Restelli

Along this line, we address the problem of learning to explore a class of multiple reward-free environments with a unique general strategy, which aims to provide a universal initialization to subsequent reinforcement learning problems specified over the same class.

reinforcement-learning Reinforcement Learning (RL)

Exploiting Minimum-Variance Policy Evaluation for Policy Optimization

no code implementations 29 Sep 2021 Alberto Maria Metelli, Samuele Meta, Marcello Restelli

In this setting, Importance Sampling (IS) is typically employed as a what-if analysis tool, with the goal of estimating the performance of a target policy, given samples collected with a different behavioral policy.

Goal-Directed Planning via Hindsight Experience Replay

no code implementations ICLR 2022 Lorenzo Moro, Amarildo Likmeta, Enrico Prati, Marcello Restelli

It has been extended from complex continuous domains through function approximators to bias the search of the planning tree in AlphaZero.

Lifelong Hyper-Policy Optimization with Multiple Importance Sampling Regularization

no code implementations 13 Dec 2021 Pierre Liotet, Francesco Vidaich, Alberto Maria Metelli, Marcello Restelli

This hyper-policy is trained to maximize the estimated future performance, efficiently reusing past data by means of importance sampling, at the cost of introducing a controlled bias.

Management

Unsupervised Reinforcement Learning in Multiple Environments

2 code implementations 16 Dec 2021 Mirco Mutti, Mattia Mancassola, Marcello Restelli

Along this line, we address the problem of unsupervised reinforcement learning in a class of multiple environments, in which the policy is pre-trained with interactions from the whole class, and then fine-tuned for several tasks in any environment of the class.

reinforcement-learning Reinforcement Learning (RL) +1

Challenging Common Assumptions in Convex Reinforcement Learning

no code implementations 3 Feb 2022 Mirco Mutti, Riccardo De Santi, Piersilvio De Bartolomeis, Marcello Restelli

In particular, we show that erroneously optimizing the infinite trials objective in place of the actual finite trials one, as it is usually done, can lead to a significant approximation error.

Imitation Learning reinforcement-learning +1

The Importance of Non-Markovianity in Maximum State Entropy Exploration

no code implementations ICML Workshop URL 2021 Mirco Mutti, Riccardo De Santi, Marcello Restelli

In the maximum state entropy exploration framework, an agent interacts with a reward-free environment to learn a policy that maximizes the entropy of the expected state visitations it is inducing.

Reward-Free Policy Space Compression for Reinforcement Learning

no code implementations ICML Workshop URL 2021 Mirco Mutti, Stefano Del Col, Marcello Restelli

In this paper, we seek a reward-free compression of the policy space into a finite set of representative policies, such that, given any policy $\pi$, the minimum Rényi divergence between the state-action distributions of the representative policies and the state-action distribution of $\pi$ is bounded.

reinforcement-learning Reinforcement Learning (RL)

Delayed Reinforcement Learning by Imitation

no code implementations 11 May 2022 Pierre Liotet, Davide Maran, Lorenzo Bisi, Marcello Restelli

When the agent's observations or interactions are delayed, classic reinforcement learning tools usually fail.

Imitation Learning reinforcement-learning +1

ARLO: A Framework for Automated Reinforcement Learning

1 code implementation 20 May 2022 Marco Mussi, Davide Lombarda, Alberto Maria Metelli, Francesco Trovò, Marcello Restelli

In this work, we propose a general and flexible framework, namely ARLO: Automated Reinforcement Learning Optimizer, to construct automated pipelines for AutoRL.

feature selection reinforcement-learning +1

Multi-Armed Bandit Problem with Temporally-Partitioned Rewards: When Partial Feedback Counts

no code implementations 1 Jun 2022 Giulia Romano, Andrea Agostini, Francesco Trovò, Nicola Gatti, Marcello Restelli

We provide two algorithms to address TP-MAB problems, namely, TP-UCB-FR and TP-UCB-EW, which exploit the partial information disclosed by the reward collected over time.

Analysis, Characterization, Prediction and Attribution of Extreme Atmospheric Events with Machine Learning: a Review

no code implementations 3 Jun 2022 Sancho Salcedo-Sanz, Jorge Pérez-Aracil, Guido Ascenso, Javier Del Ser, David Casillas-Pérez, Christopher Kadow, Dusan Fister, David Barriopedro, Ricardo García-Herrera, Marcello Restelli, Mateo Giuliani, Andrea Castelletti

The accurate prediction, characterization, and attribution of atmospheric EEs is therefore a key research field, in which many groups are currently working by applying different methodologies and computational tools.

Optimizing Empty Container Repositioning and Fleet Deployment via Configurable Semi-POMDPs

no code implementations 25 Jul 2022 Riccardo Poiani, Ciprian Stirbu, Alberto Maria Metelli, Marcello Restelli

With the continuous growth of the global economy and markets, resource imbalance has risen to be one of the central issues in real logistic scenarios.

Dynamical Linear Bandits

1 code implementation 16 Nov 2022 Marco Mussi, Alberto Maria Metelli, Marcello Restelli

Then, the hidden state evolves according to linear dynamics, affected by the performed action too.

Decision Making

Dynamic Pricing with Volume Discounts in Online Settings

no code implementations 17 Nov 2022 Marco Mussi, Gianmarco Genalti, Alessandro Nuara, Francesco Trovò, Marcello Restelli, Nicola Gatti

We ran a real-world 4-month-long A/B testing experiment in collaboration with an Italian e-commerce company, in which our algorithm PVD-B (corresponding to the A configuration) has been compared with human pricing specialists (corresponding to the B configuration).

Simultaneously Updating All Persistence Values in Reinforcement Learning

no code implementations 21 Nov 2022 Luca Sabbioni, Luca Al Daire, Lorenzo Bisi, Alberto Maria Metelli, Marcello Restelli

In reinforcement learning, the performance of learning agents is highly sensitive to the choice of time discretization.

Atari Games Q-Learning +2

Stochastic Rising Bandits

1 code implementation 7 Dec 2022 Alberto Maria Metelli, Francesco Trovò, Matteo Pirola, Marcello Restelli

This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., those sequential selection techniques able to learn online using only the feedback given by the chosen option (a.k.a.

Model Selection Multi-Armed Bandits

Tight Performance Guarantees of Imitator Policies with Continuous Actions

no code implementations 7 Dec 2022 Davide Maran, Alberto Maria Metelli, Marcello Restelli

In this paper, we study BC with the goal of providing theoretical guarantees on the performance of the imitator policy in the case of continuous actions.

Autoregressive Bandits

1 code implementation 12 Dec 2022 Francesco Bacchiocchi, Gianmarco Genalti, Davide Maran, Marco Mussi, Marcello Restelli, Nicola Gatti, Alberto Maria Metelli

Autoregressive processes naturally arise in a large variety of real-world scenarios, including stock markets, sales forecasting, weather prediction, advertising, and pricing.

Decision Making

Best Arm Identification for Stochastic Rising Bandits

1 code implementation 15 Feb 2023 Marco Mussi, Alessandro Montenegro, Francesco Trovò, Marcello Restelli, Alberto Maria Metelli

Then, we prove that, with a sufficiently large budget, they provide guarantees on the probability of properly identifying the optimal option at the end of the learning process.

Decision Making

Wasserstein Actor-Critic: Directed Exploration via Optimism for Continuous-Actions Control

no code implementations 4 Mar 2023 Amarildo Likmeta, Matteo Sacco, Alberto Maria Metelli, Marcello Restelli

Uncertainty quantification has been extensively used as a means to achieve efficient directed exploration in Reinforcement Learning (RL).

Q-Learning Reinforcement Learning (RL) +1

Information-Theoretic Regret Bounds for Bandits with Fixed Expert Advice

no code implementations 14 Mar 2023 Khaled Eldowa, Nicolò Cesa-Bianchi, Alberto Maria Metelli, Marcello Restelli

We investigate the problem of bandits with expert advice when the experts are fixed and known distributions over the actions.

Interpretable Linear Dimensionality Reduction based on Bias-Variance Analysis

no code implementations 26 Mar 2023 Paolo Bonetti, Alberto Maria Metelli, Marcello Restelli

Instead, dimensionality reduction techniques are designed to limit the number of features in a dataset by projecting them into a lower-dimensional space, possibly considering all the original features.

Dimensionality Reduction

A Tale of Sampling and Estimation in Discounted Reinforcement Learning

no code implementations 11 Apr 2023 Alberto Maria Metelli, Mirco Mutti, Marcello Restelli

In this paper, we present a minimax lower bound on the discounted mean estimation problem that explicitly connects the estimation error with the mixing properties of the Markov process and the discount factor.

reinforcement-learning

Towards Theoretical Understanding of Inverse Reinforcement Learning

no code implementations 25 Apr 2023 Alberto Maria Metelli, Filippo Lazzati, Marcello Restelli

We start by formally introducing the problem of estimating the feasible reward set, the corresponding PAC requirement, and discussing the properties of particular classes of rewards.

reinforcement-learning

Truncating Trajectories in Monte Carlo Reinforcement Learning

no code implementations 7 May 2023 Riccardo Poiani, Alberto Maria Metelli, Marcello Restelli

In Reinforcement Learning (RL), an agent acts in an unknown environment to maximize the expected cumulative discounted sum of an external reward signal, i.e., the expected return.

reinforcement-learning Reinforcement Learning (RL)

An Option-Dependent Analysis of Regret Minimization Algorithms in Finite-Horizon Semi-Markov Decision Processes

no code implementations 10 May 2023 Gianluca Drappo, Alberto Maria Metelli, Marcello Restelli

Then, focusing on a sub-setting of HRL approaches, the options framework, we highlight how the average duration of the available options affects the planning horizon and, consequently, the regret itself.

Hierarchical Reinforcement Learning reinforcement-learning +1

Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes

no code implementations 13 Jun 2023 Luca Sabbioni, Francesco Corda, Marcello Restelli

Policy-based algorithms are among the most widely adopted techniques in model-free RL, thanks to their strong theoretical groundings and good properties in continuous action spaces.

Meta Reinforcement Learning Policy Gradient Methods

Nonlinear Feature Aggregation: Two Algorithms driven by Theory

no code implementations 19 Jun 2023 Paolo Bonetti, Alberto Maria Metelli, Marcello Restelli

A limitation of methods based on correlation is the assumption of linearity in the relationship between features and target.

Dimensionality Reduction feature selection +1

Pure Exploration under Mediators' Feedback

no code implementations 29 Aug 2023 Riccardo Poiani, Alberto Maria Metelli, Marcello Restelli

In this setting, the agent's goal lies in sequentially choosing which mediator to query to identify with high probability the optimal arm while minimizing the identification time, i.e., the sample complexity.

Decision Making Multi-Armed Bandits

Exploiting Causal Graph Priors with Posterior Sampling for Reinforcement Learning

no code implementations 11 Oct 2023 Mirco Mutti, Riccardo De Santi, Marcello Restelli, Alexander Marx, Giorgia Ramponi

The prior is typically specified as a class of parametric distributions, the design of which can be cumbersome in practice, often resulting in the choice of uninformative priors.

reinforcement-learning

Causal Feature Selection via Transfer Entropy

no code implementations 17 Oct 2023 Paolo Bonetti, Alberto Maria Metelli, Marcello Restelli

We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures and leverages transfer entropy to estimate the causal flow of information from the features to the target in time series.

Causal Discovery feature selection +2

Parameterized Projected Bellman Operator

1 code implementation 20 Dec 2023 Théo Vincent, Alberto Maria Metelli, Boris Belousov, Jan Peters, Marcello Restelli, Carlo D'Eramo

We formulate an optimization problem to learn PBO for generic sequential decision-making problems, and we theoretically analyze its properties in two representative classes of RL problems.

Decision Making Reinforcement Learning (RL)

Inverse Reinforcement Learning with Sub-optimal Experts

no code implementations 8 Jan 2024 Riccardo Poiani, Gabriele Curti, Alberto Maria Metelli, Marcello Restelli

For this reason, in this work, we extend the IRL formulation to problems where, in addition to demonstrations from the optimal agent, we can observe the behavior of multiple sub-optimal experts.

reinforcement-learning

Sharing Knowledge in Multi-Task Deep Reinforcement Learning

1 code implementation ICLR 2020 Carlo D'Eramo, Davide Tateo, Andrea Bonarini, Marcello Restelli, Jan Peters

We study the benefit of sharing representations among tasks to enable the effective use of deep neural networks in Multi-Task Reinforcement Learning.

reinforcement-learning

Information Capacity Regret Bounds for Bandits with Mediator Feedback

no code implementations 15 Feb 2024 Khaled Eldowa, Nicolò Cesa-Bianchi, Alberto Maria Metelli, Marcello Restelli

For a selection of policy set families, we prove nearly-matching lower bounds, scaling similarly with the capacity.
