1 code implementation • 23 Oct 2024 • Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux, Arian Hosseini, Rishabh Agarwal, Aaron Courville
However, asynchronous training relies on an underexplored regime, online but off-policy RLHF: learning on samples from previous iterations of our model.
1 code implementation • 24 Mar 2024 • Shengyi Huang, Michael Noukhovitch, Arian Hosseini, Kashif Rasul, Weixun Wang, Lewis Tunstall
This work is the first to openly reproduce the Reinforcement Learning from Human Feedback (RLHF) scaling behaviors reported in OpenAI's seminal TL;DR summarization work.
1 code implementation • NeurIPS 2023 • Michael Noukhovitch, Samuel Lavoie, Florian Strub, Aaron Courville
We periodically reset the online model to an exponential moving average (EMA) of itself, then reset the EMA model to the initial model.
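The reset schedule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: parameters are plain lists of floats, and `step_fn`, `reset_every`, and `decay` are hypothetical names introduced here for the example.

```python
from typing import Callable, List, Tuple

Params = List[float]


def ema_update(ema: Params, online: Params, decay: float) -> Params:
    """Move the EMA parameters a fraction (1 - decay) toward the online ones."""
    return [decay * e + (1.0 - decay) * o for e, o in zip(ema, online)]


def train_with_resets(
    init: Params,
    step_fn: Callable[[Params], Params],  # hypothetical: one training update
    num_steps: int,
    reset_every: int,
    decay: float = 0.9,
) -> Tuple[Params, Params]:
    """Train while periodically resetting online -> EMA and EMA -> init."""
    online = list(init)
    ema = list(init)
    for step in range(1, num_steps + 1):
        online = step_fn(online)               # ordinary training update
        ema = ema_update(ema, online, decay)   # track a smoothed copy
        if step % reset_every == 0:
            online = list(ema)                 # reset online model to its EMA
            ema = list(init)                   # reset EMA model to the initial model
    return online, ema
```

Here the order of the two resets matters: the online model must copy the EMA before the EMA is overwritten with the initial parameters.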
no code implementations • 3 Jul 2023 • Yat Long Lo, Biswa Sengupta, Jakob Foerster, Michael Noukhovitch
By examining the relationship between messages sent and received, we propose to learn to communicate using contrastive learning to maximize the mutual information between messages of a given trajectory.
1 code implementation • 1 Apr 2022 • Samuel Lavoie, Christos Tsirigotis, Max Schwarzer, Ankit Vani, Michael Noukhovitch, Kenji Kawaguchi, Aaron Courville
Simplicial Embeddings (SEM) are representations learned through self-supervised learning (SSL), wherein a representation is projected into $L$ simplices of $V$ dimensions each using a softmax operation.
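As a rough illustration of the projection described above, the sketch below splits a flat representation of length L × V into L groups of V values and applies a softmax to each group, so every group lies on a simplex. The function names and the `temperature` parameter are assumptions made for this example and are not taken from the listing.

```python
import math
from typing import List


def softmax(xs: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in xs]
    shift = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def simplicial_embedding(
    z: List[float],
    num_simplices: int,   # L in the text
    simplex_dim: int,     # V in the text
    temperature: float = 1.0,  # assumed knob, not stated in the listing
) -> List[float]:
    """Project z onto L simplices of V dimensions each via group-wise softmax."""
    if len(z) != num_simplices * simplex_dim:
        raise ValueError("len(z) must equal num_simplices * simplex_dim")
    out: List[float] = []
    for i in range(num_simplices):
        group = z[i * simplex_dim:(i + 1) * simplex_dim]
        out.extend(softmax(group, temperature))
    return out
```

Each consecutive block of V outputs is non-negative and sums to one, which is what placing it on a simplex means here.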
1 code implementation • NeurIPS 2021 • Max Schwarzer, Nitarshan Rajkumar, Michael Noukhovitch, Ankesh Anand, Laurent Charlin, Devon Hjelm, Philip Bachman, Aaron Courville
Data efficiency is a key challenge for deep reinforcement learning.
Ranked #3 on Atari Games 100k on Atari 100k (using extra training data)
no code implementations • ICLR Workshop SSL-RL 2021 • Max Schwarzer, Nitarshan Rajkumar, Michael Noukhovitch, Ankesh Anand, Laurent Charlin, R Devon Hjelm, Philip Bachman, Aaron Courville
Data efficiency poses a major challenge for deep reinforcement learning.
1 code implementation • 25 Jan 2021 • Michael Noukhovitch, Travis LaCroix, Angeliki Lazaridou, Aaron Courville
First, we show that communication is proportional to cooperation, and it can occur for partially competitive scenarios using standard learning algorithms.
no code implementations • 25 Sep 2019 • Michael Noukhovitch, Travis LaCroix, Aaron Courville
Current literature in machine learning holds that unaligned, self-interested agents do not learn to use an emergent communication channel.
2 code implementations • ICLR 2019 • Dzmitry Bahdanau, Shikhar Murty, Michael Noukhovitch, Thien Huu Nguyen, Harm de Vries, Aaron Courville
Numerous models for grounded language understanding have been recently proposed, including (i) generic models that can be easily adapted to any given task and (ii) intuitively appealing modular models that require background knowledge to be instantiated.
no code implementations • WS 2018 • Stanisław Jastrzębski, Dzmitry Bahdanau, Seyedarian Hosseini, Michael Noukhovitch, Yoshua Bengio, Jackie Chi Kit Cheung
Commonsense knowledge bases such as ConceptNet represent knowledge in the form of relational triples.