Search Results for author: Mirco Mutti

Found 23 papers, 3 papers with code

A Classification View on Meta Learning Bandits

no code implementations · 6 Apr 2025 · Mirco Mutti, Jeongyeol Kwon, Shie Mannor, Aviv Tamar

The test regret of the plan in the stochastic and contextual setting scales with $O(\lambda^{-2} C_{\lambda}(\mathbb{M}) \log^2 (MH))$, where $M$ is the size of $\mathbb{M}$, $\lambda$ is a separation parameter over the bandits, and $C_\lambda(\mathbb{M})$ is a novel classification coefficient that fundamentally links meta learning bandits with classification.

Tasks: Classification, Meta-Learning, +2

Towards Principled Multi-Agent Task Agnostic Exploration

no code implementations · 12 Feb 2025 · Riccardo Zamboni, Mirco Mutti, Marcello Restelli

In this paper, we address this question by generalizing the problem of maximizing the state distribution entropy to multiple agents.

Reward Compatibility: A Framework for Inverse RL

no code implementations · 14 Jan 2025 · Filippo Lazzati, Mirco Mutti, Alberto Metelli

We provide an original theoretical study of Inverse Reinforcement Learning (IRL) through the lens of reward compatibility, a novel framework to quantify the compatibility of a reward with the given expert's demonstrations.

The Limits of Pure Exploration in POMDPs: When the Observation Entropy is Enough

no code implementations · 18 Jun 2024 · Riccardo Zamboni, Duilio Cirino, Marcello Restelli, Mirco Mutti

The problem of pure exploration in Markov decision processes has been cast as maximizing the entropy over the state distribution induced by the agent's policy, an objective that has been extensively studied.
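
The entropy objective described above can be made concrete with a minimal sketch (illustrative helper name, not the authors' implementation): compute the Shannon entropy of the empirical state-visitation distribution induced by a batch of visited states.

```python
from collections import Counter
from math import log

def state_visitation_entropy(states):
    """Shannon entropy of the empirical state-visitation distribution
    induced by a batch of visited (discrete) states."""
    counts = Counter(states)
    n = len(states)
    return -sum((c / n) * log(c / n) for c in counts.values())

# A uniform visitation over 4 states maximizes entropy (log 4 ~ 1.386),
# while repeatedly visiting a single state gives zero entropy.
uniform = state_visitation_entropy([0, 1, 2, 3])
degenerate = state_visitation_entropy([0, 0, 0, 0])
```

A maximum state entropy exploration policy is one that pushes this quantity (or its expectation over trajectories) as high as possible in a reward-free environment.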

How does Inverse RL Scale to Large State Spaces? A Provably Efficient Approach

no code implementations · 6 Jun 2024 · Filippo Lazzati, Mirco Mutti, Alberto Maria Metelli

We show that the structure offered by Linear MDPs is not sufficient for efficiently estimating the feasible set when the state space is large.

Test-Time Regret Minimization in Meta Reinforcement Learning

no code implementations · 4 Jun 2024 · Mirco Mutti, Aviv Tamar

Meta reinforcement learning sets a distribution over a set of tasks on which the agent can train at will; the agent is then asked to learn an optimal policy for any test task efficiently.

Tasks: Meta Reinforcement Learning, Reinforcement Learning, +1

How to Explore with Belief: State Entropy Maximization in POMDPs

no code implementations · 4 Jun 2024 · Riccardo Zamboni, Duilio Cirino, Marcello Restelli, Mirco Mutti

In this paper, we address the problem of entropy maximization over the *true states* with a decision policy conditioned on partial observations *only*.

Tasks: Hallucination

Offline Inverse RL: New Solution Concepts and Provably Efficient Algorithms

no code implementations · 23 Feb 2024 · Filippo Lazzati, Mirco Mutti, Alberto Maria Metelli

In this paper, we introduce a novel notion of feasible reward set capturing the opportunities and limitations of the offline setting and we analyze the complexity of its estimation.

Exploiting Causal Graph Priors with Posterior Sampling for Reinforcement Learning

no code implementations · 11 Oct 2023 · Mirco Mutti, Riccardo De Santi, Marcello Restelli, Alexander Marx, Giorgia Ramponi

The prior is typically specified as a class of parametric distributions, the design of which can be cumbersome in practice, often resulting in the choice of uninformative priors.

Tasks: Reinforcement Learning

A Tale of Sampling and Estimation in Discounted Reinforcement Learning

no code implementations · 11 Apr 2023 · Alberto Maria Metelli, Mirco Mutti, Marcello Restelli

In this paper, we present a minimax lower bound on the discounted mean estimation problem that explicitly connects the estimation error with the mixing properties of the Markov process and the discount factor.

Tasks: Reinforcement Learning
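
As a concrete anchor for the estimation problem above, here is a minimal sketch of the normalized discounted mean of a reward sequence (the function name is illustrative, not from the paper):

```python
def discounted_mean(rewards, gamma):
    """Normalized discounted average of a reward sequence:
    (1 - gamma) * sum_t gamma^t * r_t."""
    return (1.0 - gamma) * sum(gamma ** t * r for t, r in enumerate(rewards))

# For a constant reward r, the infinite-horizon discounted mean equals r,
# and a long truncated sequence approaches it; how fast a *sampled*
# trajectory concentrates around it depends on the chain's mixing
# properties and the discount factor, which is what the lower bound ties
# together.
est = discounted_mean([1.0] * 1000, gamma=0.9)
```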

Reward-Free Policy Space Compression for Reinforcement Learning

no code implementations · ICML Workshop URL 2021 · Mirco Mutti, Stefano Del Col, Marcello Restelli

In this paper, we seek a reward-free compression of the policy space into a finite set of representative policies such that, given any policy $\pi$, the minimum Rényi divergence between the state-action distributions of the representative policies and the state-action distribution of $\pi$ is bounded.

Tasks: Reinforcement Learning, +1
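
The Rényi divergence used in the compression criterion can be sketched for discrete distributions as follows (a minimal illustration, not the paper's code):

```python
from math import log

def renyi_divergence(p, q, alpha=2.0):
    """Renyi divergence D_alpha(p || q) between two discrete
    distributions given as probability sequences over the same support."""
    assert alpha > 0 and alpha != 1.0
    return log(sum(pi ** alpha / qi ** (alpha - 1.0)
                   for pi, qi in zip(p, q))) / (alpha - 1.0)

# Zero iff the distributions coincide and positive otherwise, so bounding
# the minimum divergence to some representative policy means the finite
# set covers the whole policy space well.
d_same = renyi_divergence([0.5, 0.5], [0.5, 0.5])
d_diff = renyi_divergence([0.9, 0.1], [0.5, 0.5])
```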

The Importance of Non-Markovianity in Maximum State Entropy Exploration

no code implementations · ICML Workshop URL 2021 · Mirco Mutti, Riccardo De Santi, Marcello Restelli

In the maximum state entropy exploration framework, an agent interacts with a reward-free environment to learn a policy that maximizes the entropy of the state visitations it induces.

Challenging Common Assumptions in Convex Reinforcement Learning

no code implementations · 3 Feb 2022 · Mirco Mutti, Riccardo De Santi, Piersilvio De Bartolomeis, Marcello Restelli

In particular, we show that erroneously optimizing the infinite trials objective in place of the actual finite trials one, as is usually done, can lead to a significant approximation error.

Tasks: Imitation Learning, Reinforcement Learning, +2

Unsupervised Reinforcement Learning in Multiple Environments

2 code implementations · 16 Dec 2021 · Mirco Mutti, Mattia Mancassola, Marcello Restelli

Along this line, we address the problem of unsupervised reinforcement learning in a class of multiple environments, in which the policy is pre-trained with interactions from the whole class, and then fine-tuned for several tasks in any environment of the class.

Tasks: Reinforcement Learning, +2

Learning to Explore Multiple Environments without Rewards

no code implementations · ICML Workshop URL 2021 · Mirco Mutti, Mattia Mancassola, Marcello Restelli

Along this line, we address the problem of learning to explore a class of multiple reward-free environments with a unique general strategy, which aims to provide a universal initialization to subsequent reinforcement learning problems specified over the same class.

Tasks: Reinforcement Learning, +1

Learning to Explore a Class of Multiple Reward-Free Environments

no code implementations · ICLR Workshop SSL-RL 2021 · Mirco Mutti, Mattia Mancassola, Marcello Restelli

Along this line, we address the problem of learning to explore a class of multiple reward-free environments with a unique general strategy, which aims to provide a universal initialization to subsequent reinforcement learning problems specified over the same class.

Tasks: Reinforcement Learning, +1

Task-Agnostic Exploration via Policy Gradient of a Non-Parametric State Entropy Estimate

1 code implementation · 9 Jul 2020 · Mirco Mutti, Lorenzo Pratissoli, Marcello Restelli

In a reward-free environment, what is a suitable intrinsic objective for an agent to pursue so that it can learn an optimal task-agnostic exploration policy?

Tasks: Continuous Control
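
The non-parametric state entropy estimate in the title is, in this line of work, typically a k-nearest-neighbor (Kozachenko-Leonenko-style) estimator for continuous state spaces. Below is a brute-force sketch under that assumption, with illustrative names; it is not the authors' implementation:

```python
from math import log, pi, gamma

EULER = 0.5772156649015329  # Euler-Mascheroni constant

def knn_entropy(points, k=1):
    """Kozachenko-Leonenko-style k-NN estimate of the differential entropy
    of a sample of d-dimensional points (brute-force nearest neighbors)."""
    n, d = len(points), len(points[0])
    unit_ball = pi ** (d / 2) / gamma(d / 2 + 1)  # volume of the unit d-ball

    def psi(m):  # digamma at a positive integer m
        return -EULER + sum(1.0 / j for j in range(1, m))

    acc = 0.0
    for i, x in enumerate(points):
        dists = sorted(
            sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5
            for j, y in enumerate(points) if j != i
        )
        acc += log(dists[k - 1])  # distance to the k-th nearest neighbor
    return psi(n) - psi(k) + log(unit_ball) + (d / n) * acc

# Scaling every point by c shifts the estimate by exactly d * log(c),
# matching the scaling law of differential entropy.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
big = [(10 * a, 10 * b) for a, b in pts]
```

Because such an estimate is differentiable in the sampling distribution, it can serve as an intrinsic objective for a policy gradient method, which is the setting the paper studies.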

A Policy Gradient Method for Task-Agnostic Exploration

1 code implementation · ICML Workshop LifelongML 2020 · Mirco Mutti, Lorenzo Pratissoli, Marcello Restelli

In a reward-free environment, what is a suitable intrinsic objective for an agent to pursue so that it can learn an optimal task-agnostic exploration policy?

Tasks: Continuous Control

Configurable Markov Decision Processes

no code implementations · ICML 2018 · Alberto Maria Metelli, Mirco Mutti, Marcello Restelli

After introducing our approach and deriving some theoretical results, we present an experimental evaluation on two illustrative problems to show the benefits of environment configurability on the performance of the learned policy.
