Search Results for author: Michael Noukhovitch

Found 10 papers, 6 papers with code

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

1 code implementation 24 Mar 2024 Shengyi Huang, Michael Noukhovitch, Arian Hosseini, Kashif Rasul, Weixun Wang, Lewis Tunstall

This work is the first to openly reproduce the Reinforcement Learning from Human Feedback (RLHF) scaling behaviors reported in OpenAI's seminal TL;DR summarization work.

reinforcement-learning

Language Model Alignment with Elastic Reset

1 code implementation NeurIPS 2023 Michael Noukhovitch, Samuel Lavoie, Florian Strub, Aaron Courville

We periodically reset the online model to an exponentially moving average (EMA) of itself, then reset the EMA model to the initial model.

Chatbot, Language Modelling, +1
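
The reset scheme described in the Elastic Reset summary above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: parameters are plain lists of floats, and the function names, the `decay` value, the `reset_every` interval, and the choice to update the EMA on every step are all assumptions made for the example.

```python
def ema_update(ema, online, decay):
    """Move each EMA parameter a small step toward the online parameter."""
    return [decay * e + (1.0 - decay) * o for e, o in zip(ema, online)]


def elastic_reset_step(online, ema, initial, step, reset_every, decay=0.9):
    """Update the EMA; every `reset_every` steps, apply the two resets.

    `reset_every` and `decay` are hypothetical hyperparameters.
    """
    ema = ema_update(ema, online, decay)
    if step % reset_every == 0:
        online = list(ema)    # reset the online model to its EMA
        ema = list(initial)   # then reset the EMA to the initial model
    return online, ema


# Toy loop: adding 1.0 stands in for a real training update.
initial = [0.0, 0.0]
online, ema = list(initial), list(initial)
for step in range(1, 7):
    online = [p + 1.0 for p in online]
    online, ema = elastic_reset_step(online, ema, initial, step, reset_every=3)
```

After each reset the online parameters sit at the pre-reset EMA and the EMA is back at the initial values, so the model is periodically pulled toward where it started.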

Learning Multi-Agent Communication with Contrastive Learning

no code implementations 3 Jul 2023 Yat Long Lo, Biswa Sengupta, Jakob Foerster, Michael Noukhovitch

By examining the relationship between messages sent and received, we propose to learn to communicate using contrastive learning to maximize the mutual information between messages of a given trajectory.

Contrastive Learning

Simplicial Embeddings in Self-Supervised Learning and Downstream Classification

1 code implementation 1 Apr 2022 Samuel Lavoie, Christos Tsirigotis, Max Schwarzer, Ankit Vani, Michael Noukhovitch, Kenji Kawaguchi, Aaron Courville

Simplicial Embeddings (SEM) are representations learned through self-supervised learning (SSL), wherein a representation is projected into $L$ simplices of $V$ dimensions each using a softmax operation.

Classification, Inductive Bias, +1
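
The projection described in the Simplicial Embeddings summary above can be sketched as follows: a vector of length $L \cdot V$ is split into $L$ groups of $V$ entries, and a softmax is applied to each group so that every group lies on a simplex. This is a minimal sketch under assumptions: the function names are hypothetical, the `temperature` parameter is not mentioned in the summary, and a real model would apply this to a learned encoder output rather than to a hand-written list.

```python
import math


def softmax(xs, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def simplicial_embedding(z, num_simplices, simplex_dim, temperature=1.0):
    """Split z into `num_simplices` groups of `simplex_dim` and softmax each."""
    if len(z) != num_simplices * simplex_dim:
        raise ValueError("len(z) must equal num_simplices * simplex_dim")
    out = []
    for i in range(num_simplices):
        group = z[i * simplex_dim:(i + 1) * simplex_dim]
        out.extend(softmax(group, temperature))
    return out


# Example with L = 3 simplices of V = 2 dimensions each.
embedding = simplicial_embedding([0.0, 0.0, 1.0, 1.0, 2.0, 2.0], 3, 2)
```

Each group of `simplex_dim` consecutive outputs is non-negative and sums to 1, which is what places it on a simplex.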

Emergent Communication under Competition

1 code implementation 25 Jan 2021 Michael Noukhovitch, Travis LaCroix, Angeliki Lazaridou, Aaron Courville

First, we show that communication is proportional to cooperation, and it can occur for partially competitive scenarios using standard learning algorithms.

Misconceptions

Selfish Emergent Communication

no code implementations 25 Sep 2019 Michael Noukhovitch, Travis LaCroix, Aaron Courville

Current literature in machine learning holds that unaligned, self-interested agents do not learn to use an emergent communication channel.

Systematic Generalization: What Is Required and Can It Be Learned?

2 code implementations ICLR 2019 Dzmitry Bahdanau, Shikhar Murty, Michael Noukhovitch, Thien Huu Nguyen, Harm de Vries, Aaron Courville

Numerous models for grounded language understanding have been recently proposed, including (i) generic models that can be easily adapted to any given task and (ii) intuitively appealing modular models that require background knowledge to be instantiated.

Systematic Generalization, Visual Question Answering (VQA)
