no code implementations • 12 Mar 2024 • Simon Dufort-Labbé, Pierluca D'Oro, Evgenii Nikishin, Razvan Pascanu, Pierre-Luc Bacon, Aristide Baratin
When training deep neural networks, the phenomenon of $\textit{dying neurons}$ – units that become inactive or saturated and output zero during training – has traditionally been viewed as undesirable, linked with optimization challenges, and contributing to plasticity loss in continual learning scenarios.
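The phenomenon can be illustrated with a minimal sketch (names and values are illustrative, not from the paper): a ReLU unit whose bias has drifted strongly negative produces zero output and zero gradient on every input, so gradient descent can never revive it.

```python
import numpy as np

# Hypothetical single ReLU unit whose bias has drifted strongly
# negative (e.g. after an aggressive update). For any plausible input
# batch its pre-activation stays below zero, so the unit outputs zero
# everywhere and receives zero gradient: it has "died".
w = np.array([0.1, -0.2])  # small input weights
b = -5.0                   # large negative bias

x = np.random.default_rng(1).normal(size=(128, 2))  # input batch
pre = x @ w + b                       # pre-activations, all < 0 here
out = np.maximum(pre, 0.0)            # ReLU output: all zeros
grad_mask = (pre > 0).astype(float)   # ReLU derivative: all zeros

dead = bool(np.all(out == 0.0) and np.all(grad_mask == 0.0))
```

Because both the output and the gradient mask are identically zero on the batch, no weight update flows through the unit, which is why dying neurons are usually associated with plasticity loss.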
no code implementations • 7 Feb 2024 • Michel Ma, Tianwei Ni, Clement Gehring, Pierluca D'Oro, Pierre-Luc Bacon
We integrate such AWMs into a policy gradient framework that underscores the relationship between network architectures and the policy gradient updates they inherently represent.
1 code implementation • 29 Sep 2023 • Martin Klissarov, Pierluca D'Oro, Shagun Sodhani, Roberta Raileanu, Pierre-Luc Bacon, Pascal Vincent, Amy Zhang, Mikael Henaff
Exploring rich environments and evaluating one's actions without prior knowledge is immensely challenging.
1 code implementation • 16 May 2022 • Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville
This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore useful evidence encountered later.
Ranked #8 on Atari Games 100k on Atari 100k
no code implementations • 15 Dec 2020 • Alberto Maria Metelli, Matteo Papini, Pierluca D'Oro, Marcello Restelli
In this paper, we introduce the notion of mediator feedback, which frames policy optimization (PO) as an online learning problem over the policy space.
3 code implementations • NeurIPS 2020 • Pierluca D'Oro, Wojciech Jaśkowski
Deterministic-policy actor-critic algorithms for continuous control improve the actor by plugging its actions into the critic and ascending the action-value gradient, which is obtained by chaining the actor's Jacobian matrix with the gradient of the critic with respect to input actions.
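The chain rule described above can be sketched on a toy problem (a linear actor and a quadratic critic chosen purely for illustration; all names and values are assumptions, not the paper's setup): the policy gradient is the actor's Jacobian with respect to its parameters chained with the critic's gradient with respect to the action.

```python
import numpy as np

# Toy deterministic actor a = W @ s with a quadratic critic
# Q(s, a) = -||a - a_star||^2, so dQ/da is available in closed form.
# Fixed values are illustrative only.
W = np.array([[0.2, -0.1, 0.3],
              [0.0,  0.4, -0.2]])   # actor parameters
a_star = np.array([0.5, -0.5])      # action maximizing the toy critic
s = np.array([1.0, -1.0, 0.5])      # a fixed state

def actor(s, W):
    return W @ s

def dQ_da(a):
    # gradient of Q(s, a) = -||a - a_star||^2 w.r.t. the action
    return -2.0 * (a - a_star)

def policy_gradient(s, W):
    # chain the actor's Jacobian da/dW with dQ/da:
    # since a_i = sum_j W[i, j] * s[j], we get dQ/dW[i, j] = dQ/da[i] * s[j]
    a = actor(s, W)
    return np.outer(dQ_da(a), s)

# one ascent step on the action-value gradient
W_new = W + 0.1 * policy_gradient(s, W)
```

After the update, the actor's action at `s` moves closer to the critic's optimal action, which is the improvement direction these algorithms ascend.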
no code implementations • 7 Apr 2020 • Giorgio Giannone, Asha Anoosheh, Alessio Quaglino, Pierluca D'Oro, Marco Gallieri, Jonathan Masci
INODE is trained like a standard RNN; it learns to discriminate short event sequences and to perform event-by-event online inference.
no code implementations • 9 Sep 2019 • Pierluca D'Oro, Alberto Maria Metelli, Andrea Tirinzoni, Matteo Papini, Marcello Restelli
In this paper, we introduce a novel model-based policy search approach that exploits the knowledge of the current agent policy to learn an approximate transition model, focusing on the portions of the environment that are most relevant for policy improvement.