Search Results for author: Michael Noukhovitch

Found 11 papers, 7 papers with code

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models

1 code implementation23 Oct 2024 Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux, Arian Hosseini, Rishabh Agarwal, Aaron Courville

However, asynchronous training relies on an underexplored regime, online but off-policy RLHF: learning on samples from previous iterations of our model.

Instruction Following Language Modelling +1

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

1 code implementation24 Mar 2024 Shengyi Huang, Michael Noukhovitch, Arian Hosseini, Kashif Rasul, Weixun Wang, Lewis Tunstall

This work is the first to openly reproduce the Reinforcement Learning from Human Feedback (RLHF) scaling behaviors reported in OpenAI's seminal TL;DR summarization work.

reinforcement-learning

Language Model Alignment with Elastic Reset

1 code implementation NeurIPS 2023 Michael Noukhovitch, Samuel Lavoie, Florian Strub, Aaron Courville

We periodically reset the online model to an exponentially moving average (EMA) of itself, then reset the EMA model to the initial model.

Chatbot Language Modeling +3

Learning Multi-Agent Communication with Contrastive Learning

no code implementations3 Jul 2023 Yat Long Lo, Biswa Sengupta, Jakob Foerster, Michael Noukhovitch

By examining the relationship between messages sent and received, we propose to learn to communicate using contrastive learning to maximize the mutual information between messages of a given trajectory.

Contrastive Learning

Simplicial Embeddings in Self-Supervised Learning and Downstream Classification

1 code implementation1 Apr 2022 Samuel Lavoie, Christos Tsirigotis, Max Schwarzer, Ankit Vani, Michael Noukhovitch, Kenji Kawaguchi, Aaron Courville

Simplicial Embeddings (SEM) are representations learned through self-supervised learning (SSL), wherein a representation is projected into $L$ simplices of $V$ dimensions each using a softmax operation.

Classification Inductive Bias +1

Emergent Communication under Competition

1 code implementation25 Jan 2021 Michael Noukhovitch, Travis LaCroix, Angeliki Lazaridou, Aaron Courville

First, we show that communication is proportional to cooperation, and it can occur for partially competitive scenarios using standard learning algorithms.

Misconceptions

Selfish Emergent Communication

no code implementations25 Sep 2019 Michael Noukhovitch, Travis LaCroix, Aaron Courville

Current literature in machine learning holds that unaligned, self-interested agents do not learn to use an emergent communication channel.

Systematic Generalization: What Is Required and Can It Be Learned?

2 code implementations ICLR 2019 Dzmitry Bahdanau, Shikhar Murty, Michael Noukhovitch, Thien Huu Nguyen, Harm de Vries, Aaron Courville

Numerous models for grounded language understanding have been recently proposed, including (i) generic models that can be easily adapted to any given task and (ii) intuitively appealing modular models that require background knowledge to be instantiated.

Systematic Generalization Visual Question Answering (VQA)

Cannot find the paper you are looking for? You can Submit a new open access paper.