no code implementations • 23 Feb 2024 • Filippo Lazzati, Mirco Mutti, Alberto Maria Metelli
In this paper, we introduce a novel notion of feasible reward set, capturing the opportunities and limitations of the offline setting, and we analyze the complexity of its estimation.
no code implementations • 5 Feb 2024 • Chinmaya Kausik, Mirco Mutti, Aldo Pacchiano, Ambuj Tewari
We show reductions from the two dominant forms of human feedback in RLHF, cardinal and dueling feedback, to PORRL.
no code implementations • 11 Oct 2023 • Mirco Mutti, Riccardo De Santi, Marcello Restelli, Alexander Marx, Giorgia Ramponi
The prior is typically specified as a class of parametric distributions, the design of which can be cumbersome in practice, often resulting in the choice of uninformative priors.
no code implementations • 11 Apr 2023 • Alberto Maria Metelli, Mirco Mutti, Marcello Restelli
In this paper, we present a minimax lower bound on the discounted mean estimation problem that explicitly connects the estimation error with the mixing properties of the Markov process and the discount factor.
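As a minimal illustration of the discounted mean estimation problem studied above (a sketch, not the paper's estimator or bound), the discounted mean of a Markov reward process can be estimated from a single reward trajectory; the trajectory and discount factor below are illustrative assumptions:

```python
def discounted_mean_estimate(rewards, gamma):
    """Empirical estimate of the discounted mean
    (1 - gamma) * sum_t gamma^t * r_t from one reward trajectory."""
    return (1.0 - gamma) * sum((gamma ** t) * r for t, r in enumerate(rewards))

# Example: a constant reward of 1.0 has discounted mean 1.0;
# a finite trajectory recovers it up to a gamma^T truncation term.
estimate = discounted_mean_estimate([1.0] * 1000, gamma=0.9)
```

The quality of such an estimate depends on how quickly the chain mixes relative to the effective horizon 1/(1 - gamma), which is precisely the interplay the lower bound above makes explicit.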
no code implementations • ICML Workshop URL 2021 • Mirco Mutti, Stefano Del Col, Marcello Restelli
In this paper, we seek a reward-free compression of the policy space into a finite set of representative policies, such that, given any policy $\pi$, the minimum Rényi divergence between the state-action distributions of the representative policies and the state-action distribution of $\pi$ is bounded.
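For intuition, the quantity being bounded can be sketched for discrete state-action distributions (a hedged illustration only; the order alpha, the direction of the divergence, and the distributions below are assumptions, not the paper's construction):

```python
import math

def renyi_divergence(p, q, alpha=2.0):
    """Renyi divergence of order alpha (alpha > 0, alpha != 1)
    between two discrete distributions given as probability lists."""
    s = sum((pi ** alpha) * (qi ** (1.0 - alpha))
            for pi, qi in zip(p, q) if pi > 0)
    return math.log(s) / (alpha - 1.0)

def min_divergence_to_set(representatives, p, alpha=2.0):
    """Minimum Renyi divergence from any representative
    state-action distribution to the distribution p of a policy."""
    return min(renyi_divergence(p, q, alpha) for q in representatives)
```

The compression requirement then reads: for every policy's distribution p, `min_divergence_to_set(representatives, p)` stays below a fixed threshold.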
no code implementations • 14 Feb 2022 • Mirco Mutti, Riccardo De Santi, Emanuele Rossi, Juan Felipe Calderon, Michael Bronstein, Marcello Restelli
In this setting, the agent can take a finite amount of reward-free interactions from a subset of these environments.
no code implementations • ICML Workshop URL 2021 • Mirco Mutti, Riccardo De Santi, Marcello Restelli
In the maximum state entropy exploration framework, an agent interacts with a reward-free environment to learn a policy that maximizes the entropy of the state visitations it induces.
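The objective of this framework can be sketched concretely: the entropy of the empirical state visitation distribution induced by a trajectory (a minimal illustration, assuming a finite state space and a single trajectory, not the paper's estimator):

```python
import math
from collections import Counter

def state_visitation_entropy(states):
    """Shannon entropy of the empirical state visitation
    distribution induced by a trajectory of visited states."""
    counts = Counter(states)
    n = len(states)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# A trajectory covering states uniformly maximizes the objective;
# revisiting a single state drives it to zero.
uniform = state_visitation_entropy([0, 1, 2, 3])   # log(4)
degenerate = state_visitation_entropy([0, 0, 0, 0])  # 0.0
```

A maximum state entropy agent seeks the policy whose induced visitation distribution makes this quantity as large as possible.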
no code implementations • 3 Feb 2022 • Mirco Mutti, Riccardo De Santi, Piersilvio De Bartolomeis, Marcello Restelli
In particular, we show that erroneously optimizing the infinite-trials objective in place of the actual finite-trials one, as is usually done, can lead to a significant approximation error.
2 code implementations • 16 Dec 2021 • Mirco Mutti, Mattia Mancassola, Marcello Restelli
Along this line, we address the problem of unsupervised reinforcement learning in a class of multiple environments, in which the policy is pre-trained with interactions from the whole class, and then fine-tuned for several tasks in any environment of the class.
no code implementations • ICML Workshop URL 2021 • Mirco Mutti, Mattia Mancassola, Marcello Restelli
Along this line, we address the problem of learning to explore a class of multiple reward-free environments with a unique general strategy, which aims to provide a universal initialization to subsequent reinforcement learning problems specified over the same class.
no code implementations • ICLR Workshop SSL-RL 2021 • Mirco Mutti, Mattia Mancassola, Marcello Restelli
Along this line, we address the problem of learning to explore a class of multiple reward-free environments with a unique general strategy, which aims to provide a universal initialization to subsequent reinforcement learning problems specified over the same class.
1 code implementation • 9 Jul 2020 • Mirco Mutti, Lorenzo Pratissoli, Marcello Restelli
In a reward-free environment, what is a suitable intrinsic objective for an agent to pursue so that it can learn an optimal task-agnostic exploration policy?
1 code implementation • ICML Workshop LifelongML 2020 • Mirco Mutti, Lorenzo Pratissoli, Marcello Restelli
In a reward-free environment, what is a suitable intrinsic objective for an agent to pursue so that it can learn an optimal task-agnostic exploration policy?
no code implementations • 10 Jul 2019 • Mirco Mutti, Marcello Restelli
What is a good exploration strategy for an agent that interacts with an environment in the absence of external rewards?
no code implementations • ICML 2018 • Alberto Maria Metelli, Mirco Mutti, Marcello Restelli
After introducing our approach and deriving some theoretical results, we present an experimental evaluation on two illustrative problems to show the benefits of environment configurability on the performance of the learned policy.