Search Results for author: Mirco Mutti

Found 15 papers, 3 papers with code

Task-Agnostic Exploration via Policy Gradient of a Non-Parametric State Entropy Estimate

1 code implementation · 9 Jul 2020 · Mirco Mutti, Lorenzo Pratissoli, Marcello Restelli

In a reward-free environment, what is a suitable intrinsic objective for an agent to pursue so that it can learn an optimal task-agnostic exploration policy?

Continuous Control
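The snippet above refers to a non-parametric estimate of state entropy without naming the estimator. As a hedged illustration, the sketch below implements one standard non-parametric choice, the Kozachenko-Leonenko k-nearest-neighbor estimator of differential entropy. It assumes states are real-valued vectors with no duplicate samples. The function names `knn_entropy` and `digamma_int` are hypothetical, and the paper's own estimator may differ from this plain form (for example, by reweighting samples).

```python
import math
import random  # used only by the usage example below


def digamma_int(n: int) -> float:
    """Digamma at a positive integer: psi(n) = -gamma + sum_{i=1}^{n-1} 1/i."""
    euler_gamma = 0.5772156649015329
    return -euler_gamma + sum(1.0 / i for i in range(1, n))


def knn_entropy(samples: list[list[float]], k: int = 4) -> float:
    """Kozachenko-Leonenko k-NN estimate of differential entropy (nats).

    H_hat = psi(N) - psi(k) + log(V_d) + (d / N) * sum_i log(r_i),
    where r_i is the Euclidean distance from sample i to its k-th nearest
    neighbor and V_d is the volume of the d-dimensional unit ball.
    Assumes no duplicate samples (otherwise some r_i would be zero).
    """
    n = len(samples)
    d = len(samples[0])
    # log V_d = (d/2) log(pi) - log Gamma(d/2 + 1)
    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    total_log_r = 0.0
    for i, x in enumerate(samples):
        # O(N^2) brute-force neighbor search; fine for a small sketch
        dists = sorted(math.dist(x, y) for j, y in enumerate(samples) if j != i)
        total_log_r += math.log(dists[k - 1])  # distance to the k-th neighbor
    return digamma_int(n) - digamma_int(k) + log_unit_ball + d * total_log_r / n
```

Usage: for samples drawn with `random.Random(0).gauss(0.0, 1.0)`, the estimate should land near the true standard-normal entropy 0.5·log(2πe) ≈ 1.419 nats, with an error that shrinks as the number of samples grows.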

A Policy Gradient Method for Task-Agnostic Exploration

1 code implementation · ICML Workshop LifelongML 2020 · Mirco Mutti, Lorenzo Pratissoli, Marcello Restelli

In a reward-free environment, what is a suitable intrinsic objective for an agent to pursue so that it can learn an optimal task-agnostic exploration policy?

Continuous Control

Unsupervised Reinforcement Learning in Multiple Environments

2 code implementations · 16 Dec 2021 · Mirco Mutti, Mattia Mancassola, Marcello Restelli

Along this line, we address the problem of unsupervised reinforcement learning in a class of multiple environments, in which the policy is pre-trained with interactions from the whole class, and then fine-tuned for several tasks in any environment of the class.

reinforcement-learning, Reinforcement Learning (RL)

Configurable Markov Decision Processes

no code implementations · ICML 2018 · Alberto Maria Metelli, Mirco Mutti, Marcello Restelli

After introducing our approach and deriving some theoretical results, we present an experimental evaluation on two illustrative problems to show the benefits of environment configurability on the performance of the learned policy.

An Intrinsically-Motivated Approach for Learning Highly Exploring and Fast Mixing Policies

no code implementations · 10 Jul 2019 · Mirco Mutti, Marcello Restelli

What is a good exploration strategy for an agent that interacts with an environment in the absence of external rewards?

Model-based Reinforcement Learning

Learning to Explore a Class of Multiple Reward-Free Environments

no code implementations · ICLR Workshop SSL-RL 2021 · Mirco Mutti, Mattia Mancassola, Marcello Restelli

Along this line, we address the problem of learning to explore a class of multiple reward-free environments with a single general strategy, which aims to provide a universal initialization to subsequent reinforcement learning problems specified over the same class.

reinforcement-learning, Reinforcement Learning (RL)

Learning to Explore Multiple Environments without Rewards

no code implementations · ICML Workshop URL 2021 · Mirco Mutti, Mattia Mancassola, Marcello Restelli

Along this line, we address the problem of learning to explore a class of multiple reward-free environments with a single general strategy, which aims to provide a universal initialization to subsequent reinforcement learning problems specified over the same class.

reinforcement-learning, Reinforcement Learning (RL)

Challenging Common Assumptions in Convex Reinforcement Learning

no code implementations · 3 Feb 2022 · Mirco Mutti, Riccardo De Santi, Piersilvio De Bartolomeis, Marcello Restelli

In particular, we show that erroneously optimizing the infinite-trials objective in place of the actual finite-trials one, as is usually done, can lead to a significant approximation error.

Imitation Learning, reinforcement-learning
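The gap between the two objectives named in the snippet shows up already in a toy example. The sketch below is an illustration under stated assumptions, not the paper's formulation: it takes state entropy as the concave objective, enumerates a policy's trajectory distribution explicitly as trajectories with probabilities, and compares the entropy of the expected state distribution ("infinite trials") with the expected entropy of a single trial's empirical state distribution ("finite trials" with one trial). All function names are hypothetical.

```python
import math
from collections import Counter


def entropy(dist: dict) -> float:
    """Shannon entropy (nats) of a discrete distribution given as {state: prob}."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def empirical_distribution(trajectory: list) -> dict:
    """Fraction of time steps spent in each state along one trajectory."""
    counts = Counter(trajectory)
    n = len(trajectory)
    return {s: c / n for s, c in counts.items()}


def infinite_trials_objective(trajectories: list, probs: list) -> float:
    """Entropy of the expected (mixture) state distribution."""
    mixture: dict = {}
    for traj, p in zip(trajectories, probs):
        for s, q in empirical_distribution(traj).items():
            mixture[s] = mixture.get(s, 0.0) + p * q
    return entropy(mixture)


def finite_trials_objective(trajectories: list, probs: list) -> float:
    """Expected entropy of the empirical state distribution of a single trial."""
    return sum(p * entropy(empirical_distribution(traj))
               for traj, p in zip(trajectories, probs))
```

For a policy that, with probability 0.5 each, either stays in state 0 or stays in state 1 (`[[0, 0, 0, 0], [1, 1, 1, 1]]`), the infinite-trials value is log 2 while the finite-trials value is 0; a policy that deterministically alternates (`[[0, 1, 0, 1]]`) scores log 2 under both. Because entropy is concave, the finite-trials value can never exceed the infinite-trials one.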

The Importance of Non-Markovianity in Maximum State Entropy Exploration

no code implementations · ICML Workshop URL 2021 · Mirco Mutti, Riccardo De Santi, Marcello Restelli

In the maximum state entropy exploration framework, an agent interacts with a reward-free environment to learn a policy that maximizes the entropy of the expected state visitations it is inducing.
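To make "the entropy of the expected state visitations" concrete, the sketch below computes it for a Markovian policy in a small tabular MDP. It assumes the visitation distribution is the finite-horizon average of the per-step state distributions; the paper may use a different definition (for example, a discounted one), and the non-Markovian policies it studies are not covered here. The MDP `P`, the policy arrays, and all function names are hypothetical.

```python
import math


def induced_transition(P, pi):
    """State-to-state kernel P_pi[s][s2] = sum_a pi[s][a] * P[s][a][s2]."""
    n_states = len(P)
    return [
        [sum(pi[s][a] * P[s][a][s2] for a in range(len(pi[s])))
         for s2 in range(n_states)]
        for s in range(n_states)
    ]


def expected_state_visitation(P, pi, mu0, horizon):
    """Average of mu_t over t = 0..horizon-1, where mu_{t+1} = mu_t P_pi."""
    P_pi = induced_transition(P, pi)
    n = len(mu0)
    mu = list(mu0)
    avg = [0.0] * n
    for _ in range(horizon):
        for s in range(n):
            avg[s] += mu[s] / horizon
        mu = [sum(mu[i] * P_pi[i][j] for i in range(n)) for j in range(n)]
    return avg


def state_entropy(d):
    """Shannon entropy (nats) of a discrete state distribution."""
    return -sum(p * math.log(p) for p in d if p > 0)


# Hypothetical two-state MDP: action 0 stays put, action 1 switches state.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # from state 0: stay -> 0, switch -> 1
    [[0.0, 1.0], [1.0, 0.0]],  # from state 1: stay -> 1, switch -> 0
]
```

Under this toy MDP, a policy that always switches spreads its visitations evenly over both states and attains the maximum entropy log 2, whereas a policy that always stays never leaves the initial state and has entropy 0.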

Reward-Free Policy Space Compression for Reinforcement Learning

no code implementations · ICML Workshop URL 2021 · Mirco Mutti, Stefano Del Col, Marcello Restelli

In this paper, we seek a reward-free compression of the policy space into a finite set of representative policies, such that, given any policy $\pi$, the minimum Rényi divergence between the state-action distributions of the representative policies and the state-action distribution of $\pi$ is bounded.

reinforcement-learning, Reinforcement Learning (RL)
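The quantity bounded in the snippet is a minimum Rényi divergence over a finite set of representatives. The sketch below computes it for discrete distributions using D_α(p‖q) = 1/(α − 1) · log Σ_x p(x)^α q(x)^(1−α). The order α = 2 and the direction D_α(target ‖ representative) are assumptions, since the snippet fixes neither, and the function names are hypothetical.

```python
import math


def renyi_divergence(p, q, alpha):
    """Rényi divergence D_alpha(p || q) of discrete distributions (alpha > 0, alpha != 1)."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue  # outcomes outside p's support contribute 0 when alpha > 0
        if qx == 0:
            if alpha > 1:
                return math.inf  # p puts mass where q has none
            continue  # for alpha < 1 the term px**alpha * 0**(1 - alpha) is 0
        total += px ** alpha * qx ** (1 - alpha)
    if total == 0:
        return math.inf  # disjoint supports (only reachable when alpha < 1)
    return math.log(total) / (alpha - 1)


def min_divergence_to_representatives(target, representatives, alpha=2.0):
    """Smallest D_alpha(target || rep) over a finite set of representative distributions."""
    return min(renyi_divergence(target, rep, alpha) for rep in representatives)
```

In this reading, a compression is good when `min_divergence_to_representatives` stays below a chosen threshold for every state-action distribution `target` one cares about; the divergence is infinite whenever, for α > 1, the target puts mass on a state-action pair that a representative never visits.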

A Tale of Sampling and Estimation in Discounted Reinforcement Learning

no code implementations · 11 Apr 2023 · Alberto Maria Metelli, Mirco Mutti, Marcello Restelli

In this paper, we present a minimax lower bound on the discounted mean estimation problem that explicitly connects the estimation error with the mixing properties of the Markov process and the discount factor.

reinforcement-learning
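The discounted mean estimation problem in the snippet above can be illustrated with the plain Monte Carlo estimator that averages normalized discounted sums over sampled trajectories. This is a minimal sketch, assuming finite trajectories, a known function `f` on states, and the (1 − γ) normalization; it is not the estimator analyzed in the paper. Truncating each sum at the trajectory length T introduces a bias of at most γ^T · max|f| in magnitude. The function name is hypothetical.

```python
def discounted_mean_estimate(trajectories, f, gamma):
    """Monte Carlo estimate of (1 - gamma) * E[sum_t gamma^t f(s_t)].

    Each trajectory is a list of states s_0, s_1, ...; the infinite sum
    is truncated at the trajectory's length.
    """
    total = 0.0
    for traj in trajectories:
        total += (1 - gamma) * sum(gamma ** t * f(s) for t, s in enumerate(traj))
    return total / len(trajectories)
```

With a constant `f` equal to 1, `gamma = 0.5`, and one trajectory of length 3, the estimate is 1 − 0.5³ = 0.875 rather than 1, which shows the truncation bias directly. The paper's lower bound concerns how the error of such estimates depends on the mixing of the Markov process and on γ.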

Exploiting Causal Graph Priors with Posterior Sampling for Reinforcement Learning

no code implementations · 11 Oct 2023 · Mirco Mutti, Riccardo De Santi, Marcello Restelli, Alexander Marx, Giorgia Ramponi

The prior is typically specified as a class of parametric distributions, the design of which can be cumbersome in practice, often resulting in the choice of uninformative priors.

reinforcement-learning

A Framework for Partially Observed Reward-States in RLHF

no code implementations · 5 Feb 2024 · Chinmaya Kausik, Mirco Mutti, Aldo Pacchiano, Ambuj Tewari

We show reductions from the two dominant forms of human feedback in RLHF, cardinal and dueling feedback, to PORRL (reinforcement learning with partially observed reward-states).

reinforcement-learning

Offline Inverse RL: New Solution Concepts and Provably Efficient Algorithms

no code implementations · 23 Feb 2024 · Filippo Lazzati, Mirco Mutti, Alberto Maria Metelli

In this paper, we introduce a novel notion of feasible reward set capturing the opportunities and limitations of the offline setting and we analyze the complexity of its estimation.
