no code implementations • 6 Apr 2025 • Mirco Mutti, Jeongyeol Kwon, Shie Mannor, Aviv Tamar
The test regret of the plan in the stochastic and contextual setting scales with $O(\lambda^{-2} C_{\lambda}(\mathbb{M}) \log^2 (MH))$, where $M$ is the size of $\mathbb{M}$, $\lambda$ is a separation parameter over the bandits, and $C_\lambda (\mathbb{M})$ is a novel classification coefficient that fundamentally links meta-learning bandits with classification.
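As a rough illustration of how such a bound behaves, the sketch below (not from the paper; the coefficient $C_\lambda(\mathbb{M})$ is treated as an opaque constant and the function name is my own) evaluates the scaling $\lambda^{-2} C_\lambda(\mathbb{M}) \log^2(MH)$ for a few illustrative parameter values.

```python
import math

def regret_scaling(M: int, H: int, lam: float, C_lambda: float) -> float:
    """Illustrative evaluation of the O(lambda^-2 * C_lambda * log^2(MH)) rate.

    M        -- number of bandit models in the class
    H        -- horizon
    lam      -- separation parameter (smaller separation -> harder problem)
    C_lambda -- classification coefficient, treated here as a given constant
    """
    return (C_lambda / lam ** 2) * math.log(M * H) ** 2

# The rate grows only polylogarithmically in M and H ...
print(regret_scaling(M=10, H=100, lam=0.5, C_lambda=1.0))
print(regret_scaling(M=1000, H=100, lam=0.5, C_lambda=1.0))
# ... but degrades quadratically as the separation lambda shrinks.
print(regret_scaling(M=10, H=100, lam=0.05, C_lambda=1.0))
```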
no code implementations • 12 Feb 2025 • Riccardo Zamboni, Mirco Mutti, Marcello Restelli
In this paper, we address this question by generalizing the problem of maximizing the state distribution entropy to multiple agents.
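A minimal sketch of the underlying objective, assuming a tabular state space and using my own function names: the single-agent objective is the Shannon entropy of the empirical state visitation distribution, and one natural multi-agent generalization (illustrative, not necessarily the paper's exact formulation) scores the mixture of all agents' visitations.

```python
import numpy as np

def state_entropy(visits: np.ndarray) -> float:
    """Shannon entropy of the empirical state distribution from visit counts."""
    d = visits / visits.sum()
    d = d[d > 0]
    return float(-(d * np.log(d)).sum())

# Single-agent objective: entropy of one agent's state visitation distribution.
agent_visits = np.array([40, 30, 20, 10])
print(state_entropy(agent_visits))

# Illustrative multi-agent generalization: entropy of the mixture of the
# visitation counts of all agents.
all_visits = np.array([[40, 30, 20, 10],
                       [10, 20, 30, 40]])
print(state_entropy(all_visits.sum(axis=0)))
```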
no code implementations • 14 Jan 2025 • Filippo Lazzati, Mirco Mutti, Alberto Metelli
We provide an original theoretical study of Inverse Reinforcement Learning (IRL) through the lens of reward compatibility, a novel framework for quantifying the compatibility of a reward with the given expert demonstrations.
no code implementations • 18 Jul 2024 • Riccardo De Santi, Federico Arangath Joseph, Noah Liniger, Mirco Mutti, Andreas Krause
To achieve this, we bridge AE and MDP homomorphisms, which offer a way to exploit known geometric structures via abstraction.
no code implementations • 18 Jun 2024 • Riccardo Zamboni, Duilio Cirino, Marcello Restelli, Mirco Mutti
The problem of pure exploration in Markov decision processes has been cast as maximizing the entropy of the state distribution induced by the agent's policy, an objective that has been extensively studied.
no code implementations • 6 Jun 2024 • Filippo Lazzati, Mirco Mutti, Alberto Maria Metelli
We show that the structure offered by Linear MDPs is not sufficient for efficiently estimating the feasible set when the state space is large.
no code implementations • 4 Jun 2024 • Mirco Mutti, Aviv Tamar
In meta reinforcement learning, the agent is given a distribution over a set of tasks on which it can train at will, and is then asked to learn an optimal policy for any test task efficiently.
no code implementations • 4 Jun 2024 • Riccardo Zamboni, Duilio Cirino, Marcello Restelli, Mirco Mutti
In this paper, we address the problem of entropy maximization over the *true states* with a decision policy conditioned on partial observations *only*.
no code implementations • 23 Feb 2024 • Filippo Lazzati, Mirco Mutti, Alberto Maria Metelli
In this paper, we introduce a novel notion of feasible reward set capturing the opportunities and limitations of the offline setting and we analyze the complexity of its estimation.
no code implementations • 5 Feb 2024 • Chinmaya Kausik, Mirco Mutti, Aldo Pacchiano, Ambuj Tewari
Both of these can be instrumental in speeding up learning and improving alignment.
no code implementations • 11 Oct 2023 • Mirco Mutti, Riccardo De Santi, Marcello Restelli, Alexander Marx, Giorgia Ramponi
The prior is typically specified as a class of parametric distributions, the design of which can be cumbersome in practice, often resulting in the choice of uninformative priors.
no code implementations • 11 Apr 2023 • Alberto Maria Metelli, Mirco Mutti, Marcello Restelli
In this paper, we present a minimax lower bound for the discounted mean estimation problem that explicitly connects the estimation error with the mixing properties of the Markov process and the discount factor.
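A minimal sketch of the estimation problem at stake, assuming the quantity of interest is the normalized discounted sum of observations along a single trajectory (a plain plug-in estimator, with illustrative names of my own). The simulated chain has a high self-transition probability, i.e. slow mixing, which is exactly the regime the lower bound ties to the hardness of estimation.

```python
import numpy as np

def discounted_mean_estimate(rewards: np.ndarray, gamma: float) -> float:
    """Plug-in estimate of the normalized discounted mean
    (1 - gamma) * sum_t gamma^t * r_t from a single trajectory."""
    t = np.arange(len(rewards))
    return float((1.0 - gamma) * np.sum(gamma ** t * rewards))

# Simulate a slowly mixing two-state Markov chain: consecutive observations are
# strongly correlated, which is what makes the estimation problem hard.
rng = np.random.default_rng(0)
stay = 0.95                      # high self-transition probability -> slow mixing
state, rewards = 0, []
for _ in range(500):
    rewards.append(float(state))
    if rng.random() > stay:
        state = 1 - state

print(discounted_mean_estimate(np.array(rewards), gamma=0.99))
```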
no code implementations • ICML Workshop URL 2021 • Mirco Mutti, Stefano Del Col, Marcello Restelli
In this paper, we seek a reward-free compression of the policy space into a finite set of representative policies such that, given any policy $\pi$, the minimum Rényi divergence between the state-action distributions of the representative policies and the state-action distribution of $\pi$ is bounded.
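A minimal sketch of the quantity the compression criterion controls, assuming discrete state-action distributions and Rényi order $\alpha = 2$ (the distributions and names below are illustrative, not from the paper): the divergence from a policy's distribution to its closest representative.

```python
import numpy as np

def renyi_divergence(p: np.ndarray, q: np.ndarray, alpha: float = 2.0) -> float:
    """Renyi divergence of order alpha between discrete distributions p and q
    (assumes q > 0 wherever p > 0)."""
    mask = p > 0
    return float(np.log(np.sum(p[mask] ** alpha / q[mask] ** (alpha - 1))) / (alpha - 1))

# State-action distribution of a policy pi and of two representative policies.
d_pi   = np.array([0.5, 0.3, 0.2])
d_rep1 = np.array([0.4, 0.4, 0.2])
d_rep2 = np.array([0.1, 0.1, 0.8])

# The compression is good for pi if the *minimum* divergence to some
# representative policy stays below a chosen bound.
print(min(renyi_divergence(d_pi, d) for d in (d_rep1, d_rep2)))
```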
no code implementations • 14 Feb 2022 • Mirco Mutti, Riccardo De Santi, Emanuele Rossi, Juan Felipe Calderon, Michael Bronstein, Marcello Restelli
In this setting, the agent can take a finite number of reward-free interactions from a subset of these environments.
no code implementations • ICML Workshop URL 2021 • Mirco Mutti, Riccardo De Santi, Marcello Restelli
In the maximum state entropy exploration framework, an agent interacts with a reward-free environment to learn a policy that maximizes the entropy of the expected state visitations it induces.
no code implementations • 3 Feb 2022 • Mirco Mutti, Riccardo De Santi, Piersilvio De Bartolomeis, Marcello Restelli
In particular, we show that erroneously optimizing the infinite-trials objective in place of the actual finite-trials one, as is usually done, can lead to a significant approximation error.
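A minimal numeric sketch of the gap at stake, under the common reading that the infinite-trials objective is the entropy of the *average* state distribution across trials while the finite-trials objective averages the *per-trial* entropies (illustrative numbers and names of my own, not the paper's exact formalization):

```python
import numpy as np

def entropy(d: np.ndarray) -> float:
    d = d[d > 0]
    return float(-(d * np.log(d)).sum())

# Empirical state distributions observed in two short trials over two states.
trial_1 = np.array([1.0, 0.0])   # this trial only visited state 0
trial_2 = np.array([0.0, 1.0])   # this trial only visited state 1

# Infinite-trials view: entropy of the average distribution over trials.
infinite_trials = entropy((trial_1 + trial_2) / 2)              # log(2), looks maximal
# Finite-trials view: average of the per-trial entropies.
finite_trials = np.mean([entropy(trial_1), entropy(trial_2)])   # 0, each trial is degenerate

print(infinite_trials, finite_trials)  # the gap is the approximation error at stake
```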
2 code implementations • 16 Dec 2021 • Mirco Mutti, Mattia Mancassola, Marcello Restelli
Along this line, we address the problem of unsupervised reinforcement learning in a class of multiple environments, in which the policy is pre-trained with interactions from the whole class, and then fine-tuned for several tasks in any environment of the class.
no code implementations • ICML Workshop URL 2021 • Mirco Mutti, Mattia Mancassola, Marcello Restelli
Along this line, we address the problem of learning to explore a class of multiple reward-free environments with a unique general strategy, which aims to provide a universal initialization to subsequent reinforcement learning problems specified over the same class.
no code implementations • ICLR Workshop SSL-RL 2021 • Mirco Mutti, Mattia Mancassola, Marcello Restelli
Along this line, we address the problem of learning to explore a class of multiple reward-free environments with a unique general strategy, which aims to provide a universal initialization to subsequent reinforcement learning problems specified over the same class.
1 code implementation • 9 Jul 2020 • Mirco Mutti, Lorenzo Pratissoli, Marcello Restelli
In a reward-free environment, what is a suitable intrinsic objective for an agent to pursue so that it can learn an optimal task-agnostic exploration policy?
1 code implementation • ICML Workshop LifelongML 2020 • Mirco Mutti, Lorenzo Pratissoli, Marcello Restelli
In a reward-free environment, what is a suitable intrinsic objective for an agent to pursue so that it can learn an optimal task-agnostic exploration policy?
no code implementations • 10 Jul 2019 • Mirco Mutti, Marcello Restelli
What is a good exploration strategy for an agent that interacts with an environment in the absence of external rewards?
no code implementations • ICML 2018 • Alberto Maria Metelli, Mirco Mutti, Marcello Restelli
After introducing our approach and deriving some theoretical results, we present an experimental evaluation on two illustrative problems to show the benefits of environment configurability for the performance of the learned policy.