Search Results for author: Pierluca D'Oro

Found 8 papers, 3 papers with code

Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons

no code implementations · 12 Mar 2024 · Simon Dufort-Labbé, Pierluca D'Oro, Evgenii Nikishin, Razvan Pascanu, Pierre-Luc Bacon, Aristide Baratin

When training deep neural networks, the phenomenon of *dying neurons* (units that become inactive or saturated, outputting zero during training) has traditionally been viewed as undesirable, linked with optimization challenges, and contributing to plasticity loss in continual learning scenarios.

Continual Learning · Model Compression
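
As an illustration of the phenomenon the abstract describes, the sketch below flags "dead" hidden ReLU units by measuring, over a batch, the fraction of inputs on which each unit outputs zero. The two-layer network, batch, and threshold are placeholder assumptions, not the paper's pruning method.

```python
import torch
import torch.nn as nn

# Illustrative two-layer ReLU network; the architecture is an assumption,
# not one used in the paper.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

def dead_unit_mask(model, inputs, threshold=1.0):
    """Flag hidden ReLU units that output zero on (nearly) every input.

    A unit whose activation is zero on a fraction >= `threshold` of the
    batch is considered dead and is a candidate for pruning.
    """
    activations = {}

    def hook(module, inp, out):
        activations["relu"] = out.detach()

    handle = model[1].register_forward_hook(hook)  # model[1] is the ReLU
    with torch.no_grad():
        model(inputs)
    handle.remove()

    zero_fraction = (activations["relu"] == 0).float().mean(dim=0)
    return zero_fraction >= threshold  # boolean mask over hidden units

batch = torch.randn(256, 32)
mask = dead_unit_mask(model, batch)
print(f"{int(mask.sum())} of {mask.numel()} hidden units are dead")
```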

Do Transformer World Models Give Better Policy Gradients?

no code implementations · 7 Feb 2024 · Michel Ma, Tianwei Ni, Clement Gehring, Pierluca D'Oro, Pierre-Luc Bacon

We integrate such AWMs into a policy gradient framework that underscores the relationship between network architectures and the policy gradient updates they inherently represent.

Navigate
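
The excerpt refers to computing policy gradients through learned world models. Below is a generic pathwise-gradient sketch that backpropagates an imagined rollout through a differentiable dynamics model; the model, policy, reward, and horizon are illustrative placeholders, not the paper's AWM construction.

```python
import torch
import torch.nn as nn

# Placeholder differentiable dynamics model and deterministic policy.
state_dim, action_dim = 8, 2
dynamics = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.Tanh(),
                         nn.Linear(64, state_dim))
policy = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(),
                       nn.Linear(64, action_dim))

def reward(state, action):
    # Toy differentiable reward, purely illustrative.
    return -(state ** 2).sum(dim=-1) - 0.1 * (action ** 2).sum(dim=-1)

def imagined_return(s0, horizon=5):
    """Unroll the policy through the learned model and sum rewards.

    Because every step is differentiable, autograd yields a pathwise
    policy gradient by backpropagating through the whole rollout.
    """
    s, total = s0, 0.0
    for _ in range(horizon):
        a = policy(s)
        total = total + reward(s, a).mean()
        s = dynamics(torch.cat([s, a], dim=-1))
    return total

opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
s0 = torch.randn(64, state_dim)
loss = -imagined_return(s0)  # ascend the imagined return
opt.zero_grad(); loss.backward(); opt.step()
```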

The Primacy Bias in Deep Reinforcement Learning

1 code implementation · 16 May 2022 · Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville

This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore useful evidence encountered later.

Atari Games 100k · reinforcement-learning +1
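
The remedy the paper proposes is to periodically reset parts of the agent's network while preserving the replay buffer. A minimal sketch of that schedule, with an illustrative Q-network and reset interval:

```python
import torch.nn as nn

# Toy Q-network; the architecture and reset schedule are illustrative,
# not the exact ones from the paper.
q_net = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 2))

def reset_last_layer(net):
    """Reinitialize the final linear layer, discarding weights that may
    have overfit to early experience (the replay buffer is kept)."""
    net[-1].reset_parameters()

reset_interval = 20_000  # gradient steps between resets (illustrative)
for step in range(1, 100_001):
    # ... one gradient update on a batch from the replay buffer ...
    if step % reset_interval == 0:
        reset_last_layer(q_net)
```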

Policy Optimization as Online Learning with Mediator Feedback

no code implementations · 15 Dec 2020 · Alberto Maria Metelli, Matteo Papini, Pierluca D'Oro, Marcello Restelli

In this paper, we introduce the notion of mediator feedback that frames PO as an online learning problem over the policy space.

Continuous Control
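
One way to read mediator feedback is that trajectories gathered with one policy carry information about every other policy, for instance through importance weighting. The sketch below is a plain per-trajectory importance-sampling return estimator under assumed 1-D Gaussian policies; it illustrates the reuse idea, not the paper's estimators.

```python
import numpy as np

def gaussian_logpdf(a, mean, std):
    return -0.5 * ((a - mean) / std) ** 2 - np.log(std * np.sqrt(2 * np.pi))

def is_return_estimate(actions, rewards, behavior_mean, target_mean, std=1.0):
    """Estimate a target policy's expected return from trajectories of a
    behavioral policy via per-trajectory importance weights.

    actions, rewards: arrays of shape (n_trajectories, horizon).
    Both policies are 1-D Gaussians with fixed std (an illustrative choice).
    """
    log_w = (gaussian_logpdf(actions, target_mean, std)
             - gaussian_logpdf(actions, behavior_mean, std)).sum(axis=1)
    weights = np.exp(log_w)
    returns = rewards.sum(axis=1)
    return np.mean(weights * returns)

rng = np.random.default_rng(0)
actions = rng.normal(0.0, 1.0, size=(1000, 10))  # behavioral mean 0
rewards = -(actions - 0.5) ** 2                  # toy reward
print(is_return_estimate(actions, rewards, behavior_mean=0.0, target_mean=0.3))
```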

How to Learn a Useful Critic? Model-based Action-Gradient-Estimator Policy Optimization

3 code implementations · NeurIPS 2020 · Pierluca D'Oro, Wojciech Jaśkowski

Deterministic-policy actor-critic algorithms for continuous control improve the actor by plugging its actions into the critic and ascending the action-value gradient, which is obtained by chaining the actor's Jacobian matrix with the gradient of the critic with respect to input actions.

Continuous Control
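
The actor update described above can be written directly in autograd: maximizing Q(s, π(s)) lets backpropagation chain the actor's Jacobian with the critic's action gradient. A minimal DPG-style sketch follows (the networks are placeholders, and this is the generic update rather than MAGE itself):

```python
import torch
import torch.nn as nn

state_dim, action_dim = 8, 2
actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))

actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

states = torch.randn(128, state_dim)
actions = actor(states)                                  # a = pi(s)
q_values = critic(torch.cat([states, actions], dim=-1))  # Q(s, pi(s))

# Ascending the action-value gradient: backprop chains dQ/da with the
# actor's Jacobian da/dtheta automatically.
actor_loss = -q_values.mean()
actor_opt.zero_grad()
actor_loss.backward()
actor_opt.step()
```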

Gradient-Aware Model-based Policy Search

no code implementations · 9 Sep 2019 · Pierluca D'Oro, Alberto Maria Metelli, Andrea Tirinzoni, Matteo Papini, Marcello Restelli

In this paper, we introduce a novel model-based policy search approach that exploits the knowledge of the current agent policy to learn an approximate transition model, focusing on the portions of the environment that are most relevant for policy improvement.

Model-based Reinforcement Learning
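
The idea of focusing model learning on policy-relevant transitions can be sketched as a weighted fit of the transition model, where higher-weight samples dominate the training loss so the model is most accurate where it matters for policy improvement. The weights below are random placeholders standing in for the paper's gradient-aware weighting.

```python
import torch
import torch.nn as nn

state_dim, action_dim = 4, 1
model = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                      nn.Linear(64, state_dim))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# A batch of observed transitions (s, a, s'); random stand-ins here.
s = torch.randn(256, state_dim)
a = torch.randn(256, action_dim)
s_next = torch.randn(256, state_dim)

# Per-sample relevance weights; the paper derives these from the policy
# gradient, while here they are placeholders for illustration.
weights = torch.rand(256)

# Weighted MSE: high-weight transitions dominate the model fit.
pred = model(torch.cat([s, a], dim=-1))
per_sample = ((pred - s_next) ** 2).mean(dim=-1)
loss = (weights * per_sample).sum() / weights.sum()
opt.zero_grad(); loss.backward(); opt.step()
```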
