1 code implementation • NAACL 2022 • Alice Martin, Guillaume Quispe, Charles Ollion, Sylvain Le Corff, Florian Strub, Olivier Pietquin
To our knowledge, it is the first approach that successfully learns a language generation policy without pre-training, using only reinforcement learning.
no code implementations • 5 Dec 2024 • John Dang, Shivalika Singh, Daniel D'souza, Arash Ahmadian, Alejandro Salamanca, Madeline Smith, Aidan Peppin, Sungjin Hong, Manoj Govindassamy, Terrence Zhao, Sandra Kublik, Meor Amer, Viraat Aryabumi, Jon Ander Campos, Yi-Chern Tan, Tom Kocmi, Florian Strub, Nathan Grinsztajn, Yannis Flet-Berliac, Acyr Locatelli, Hangyu Lin, Dwarak Talupuru, Bharat Venkitesh, David Cairuz, Bowen Yang, Tim Chung, Wei-Yin Ko, Sylvie Shang Shi, Amir Shukayev, Sammie Bae, Aleksandra Piktus, Roman Castagné, Felipe Cruz-Salinas, Eddie Kim, Lucas Crawhall-Stein, Adrien Morisot, Sudip Roy, Phil Blunsom, Ivan Zhang, Aidan Gomez, Nick Frosst, Marzieh Fadaee, Beyza Ermis, Ahmet Üstün, Sara Hooker
We introduce the Aya Expanse model family, a new generation of 8B and 32B parameter multilingual language models, aiming to address the critical challenge of developing highly performant multilingual models that match or surpass the capabilities of monolingual models.
no code implementations • 27 Jun 2024 • Nathan Grinsztajn, Yannis Flet-Berliac, Mohammad Gheshlaghi Azar, Florian Strub, Bill Wu, Eugene Choi, Chris Cremer, Arash Ahmadian, Yash Chandak, Olivier Pietquin, Matthieu Geist
To reconcile these approaches, we introduce a principled approach for making direct alignment length-invariant.
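To make the length-invariance issue concrete: one simple route is to average, rather than sum, per-token log-probabilities inside a DPO-style preference loss, so the implicit reward no longer scales with completion length. The sketch below is a hedged illustration with toy numbers and made-up variable names, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def dpo_style_loss(logp_chosen, logp_rejected, len_chosen, len_rejected,
                   beta=0.1, length_invariant=True):
    """DPO-style preference loss on summed token log-probability gaps.

    logp_*: summed log-probability of the chosen/rejected completion under the
            policy minus under the reference model (one value per example).
    len_*:  number of tokens in each completion.
    With length_invariant=True, the gaps are averaged per token, so longer
    completions are not mechanically favoured or penalised.
    """
    if length_invariant:
        logp_chosen = logp_chosen / len_chosen
        logp_rejected = logp_rejected / len_rejected
    margin = beta * (logp_chosen - logp_rejected)
    return -F.logsigmoid(margin).mean()

# Toy example: two preference pairs with very different completion lengths.
logp_c = torch.tensor([-12.0, -40.0])
logp_r = torch.tensor([-15.0, -35.0])
len_c = torch.tensor([10.0, 50.0])
len_r = torch.tensor([12.0, 30.0])
print(dpo_style_loss(logp_c, logp_r, len_c, len_r, length_invariant=True))
```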
no code implementations • 27 Jun 2024 • Yannis Flet-Berliac, Nathan Grinsztajn, Florian Strub, Bill Wu, Eugene Choi, Chris Cremer, Arash Ahmadian, Yash Chandak, Mohammad Gheshlaghi Azar, Olivier Pietquin, Matthieu Geist
Reinforcement Learning (RL) has been used to finetune Large Language Models (LLMs) using a reward model trained from preference data, to better align with human judgment.
1 code implementation • 30 Apr 2024 • Mathieu Rita, Florian Strub, Rahma Chaabouni, Paul Michel, Emmanuel Dupoux, Olivier Pietquin
While Reinforcement Learning (RL) has been proven essential for tuning large language models (LLMs), it can lead to reward over-optimization (ROO).
no code implementations • 18 Mar 2024 • Mathieu Rita, Paul Michel, Rahma Chaabouni, Olivier Pietquin, Emmanuel Dupoux, Florian Strub
Computational modeling plays an essential role in the study of language emergence.
1 code implementation • NeurIPS 2023 • Michael Noukhovitch, Samuel Lavoie, Florian Strub, Aaron Courville
We periodically reset the online model to an exponential moving average (EMA) of itself, then reset the EMA model to the initial model.
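A minimal sketch of that reset schedule, assuming PyTorch modules; the toy regression objective, the EMA decay, and the reset period are placeholders for the actual fine-tuning run.

```python
import copy
import torch
import torch.nn as nn

def ema_update(ema_model, online_model, decay=0.99):
    # Move the EMA parameters a small step towards the online parameters.
    with torch.no_grad():
        for p_ema, p_online in zip(ema_model.parameters(), online_model.parameters()):
            p_ema.mul_(decay).add_(p_online, alpha=1.0 - decay)

def reset(online_model, ema_model, init_model):
    # Reset online -> EMA weights, then EMA -> initial weights.
    online_model.load_state_dict(ema_model.state_dict())
    ema_model.load_state_dict(init_model.state_dict())

# Toy setup: a linear model "fine-tuned" on random data stands in for the policy.
init_model = nn.Linear(4, 2)
online_model = copy.deepcopy(init_model)
ema_model = copy.deepcopy(init_model)
opt = torch.optim.SGD(online_model.parameters(), lr=0.1)

for step in range(100):
    x, y = torch.randn(8, 4), torch.randn(8, 2)
    loss = nn.functional.mse_loss(online_model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
    ema_update(ema_model, online_model)
    if (step + 1) % 25 == 0:          # the reset period here is arbitrary
        reset(online_model, ema_model, init_model)
```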
no code implementations • 9 Feb 2023 • Pierre H. Richemond, Allison Tam, Yunhao Tang, Florian Strub, Bilal Piot, Felix Hill
Using simple linear algebra, we show that when using a linear predictor, the optimal predictor is close to an orthogonal projection, and we propose a general framework based on orthonormalization that helps interpret and build intuition for why BYOL works.
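To make the linear-algebra point concrete, the toy sketch below computes the closed-form optimal linear predictor for a least-squares objective and compares it with its nearest orthogonal matrix (its polar factor). The synthetic data, dimensions, and the use of the polar decomposition are illustrative assumptions, not the paper's derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "online" and "target" embeddings of the same inputs (strongly correlated).
z_online = rng.normal(size=(2048, 16))
z_target = z_online + 0.1 * z_online @ rng.normal(size=(16, 16))

# Closed-form optimal linear predictor for min_W E||W z_online - z_target||^2:
#   W* = Sigma_{target,online} @ Sigma_{online,online}^{-1}
S_to = z_target.T @ z_online / len(z_online)
S_oo = z_online.T @ z_online / len(z_online)
W_star = S_to @ np.linalg.inv(S_oo)

# Orthonormalised version of W* (nearest orthogonal matrix in Frobenius norm).
U, _, Vt = np.linalg.svd(W_star)
W_orth = U @ Vt

# Relative distance between the optimal predictor and its orthonormalisation.
print(np.linalg.norm(W_star - W_orth) / np.linalg.norm(W_star))
```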
2 code implementations • 12 Jan 2023 • Matko Bošnjak, Pierre H. Richemond, Nenad Tomasev, Florian Strub, Jacob C. Walker, Felix Hill, Lars Holger Buesing, Razvan Pascanu, Charles Blundell, Jovana Mitrovic
We propose a new semi-supervised learning method, Semantic Positives via Pseudo-Labels (SemPPL), that combines labelled and unlabelled data to learn informative representations.
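The method's name suggests the following rough recipe: pseudo-label the unlabelled points with a k-nearest-neighbour vote against the labelled embeddings, then treat examples sharing a pseudo-label as extra ("semantic") positives in a contrastive objective. The sketch below is only that rough recipe on toy clusters; the k-NN vote, cosine similarity, and data are assumptions, not the paper's implementation.

```python
import numpy as np

def knn_pseudo_labels(emb_unlab, emb_lab, labels_lab, k=5):
    """Assign each unlabelled embedding the majority label of its k nearest
    labelled neighbours (cosine similarity)."""
    a = emb_unlab / np.linalg.norm(emb_unlab, axis=1, keepdims=True)
    b = emb_lab / np.linalg.norm(emb_lab, axis=1, keepdims=True)
    sims = a @ b.T                                  # (n_unlab, n_lab)
    nn_idx = np.argsort(-sims, axis=1)[:, :k]       # top-k labelled neighbours
    nn_labels = labels_lab[nn_idx]                  # (n_unlab, k)
    return np.array([np.bincount(row).argmax() for row in nn_labels])

# Toy data: three labelled clusters plus unlabelled points drawn near them.
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 8))
emb_lab = np.vstack([c + 0.1 * rng.normal(size=(20, 8)) for c in centers])
labels_lab = np.repeat(np.arange(3), 20)
emb_unlab = np.vstack([c + 0.1 * rng.normal(size=(10, 8)) for c in centers])

pseudo = knn_pseudo_labels(emb_unlab, emb_lab, labels_lab)
# Examples sharing a pseudo-label can then serve as additional positives in a
# contrastive loss, alongside the usual augmented views of the same image.
print(pseudo)
```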
no code implementations • 2 Nov 2022 • Aleksandra Kalinowska, Elnaz Davoodi, Florian Strub, Kory W Mathewson, Ivana Kajic, Michael Bowling, Todd D Murphey, Patrick M Pilarski
While it is known that communication facilitates cooperation in multi-agent settings, it is unclear how to design artificial agents that can learn to effectively and efficiently communicate with each other.
1 code implementation • 30 Sep 2022 • Mathieu Rita, Corentin Tallec, Paul Michel, Jean-bastien Grill, Olivier Pietquin, Emmanuel Dupoux, Florian Strub
Lewis signaling games are a class of simple communication games for simulating the emergence of language.
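A toy tabular Lewis game makes the setting concrete: a speaker observes a state and sends a discrete message, a listener guesses the state from the message, and both receive a shared reward on success. The sizes, learning rate, and REINFORCE updates below are arbitrary illustrative choices, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_messages = 5, 5
speaker_logits = np.zeros((n_states, n_messages))   # tabular speaker policy
listener_logits = np.zeros((n_messages, n_states))  # tabular listener policy
lr = 0.5

def sample(logits):
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return rng.choice(len(p), p=p), p

for step in range(3000):
    state = rng.integers(n_states)               # speaker observes a state
    msg, p_s = sample(speaker_logits[state])     # speaker emits a message
    guess, p_l = sample(listener_logits[msg])    # listener reconstructs the state
    reward = 1.0 if guess == state else 0.0      # shared reward on success

    # REINFORCE updates for both agents: reward times grad of log-probability.
    grad_s = -p_s
    grad_s[msg] += 1.0
    grad_l = -p_l
    grad_l[guess] += 1.0
    speaker_logits[state] += lr * reward * grad_s
    listener_logits[msg] += lr * reward * grad_l

# After training, the two policies should (often) agree on a state-message code.
print(speaker_logits.argmax(axis=1), listener_logits.argmax(axis=1))
```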
no code implementations • 22 Sep 2022 • Ian Gemp, Thomas Anthony, Yoram Bachrach, Avishkar Bhoopchand, Kalesha Bullard, Jerome Connor, Vibhavari Dasagi, Bart De Vylder, Edgar Duenez-Guzman, Romuald Elie, Richard Everett, Daniel Hennes, Edward Hughes, Mina Khan, Marc Lanctot, Kate Larson, Guy Lever, SiQi Liu, Luke Marris, Kevin R. McKee, Paul Muller, Julien Perolat, Florian Strub, Andrea Tacchetti, Eugene Tarassov, Zhe Wang, Karl Tuyls
The Game Theory & Multi-Agent team at DeepMind studies several aspects of multi-agent learning, ranging from computing approximations to fundamental concepts in game theory, to simulating social dilemmas in rich spatial environments, to training 3D humanoids in difficult team coordination tasks.
1 code implementation • 30 Jun 2022 • Julien Perolat, Bart De Vylder, Daniel Hennes, Eugene Tarassov, Florian Strub, Vincent de Boer, Paul Muller, Jerome T. Connor, Neil Burch, Thomas Anthony, Stephen McAleer, Romuald Elie, Sarah H. Cen, Zhe Wang, Audrunas Gruslys, Aleksandra Malysheva, Mina Khan, Sherjil Ozair, Finbarr Timbers, Toby Pohlen, Tom Eccles, Mark Rowland, Marc Lanctot, Jean-Baptiste Lespiau, Bilal Piot, Shayegan Omidshafiei, Edward Lockhart, Laurent Sifre, Nathalie Beauguerlange, Remi Munos, David Silver, Satinder Singh, Demis Hassabis, Karl Tuyls
It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of $10^{164}$ nodes).
1 code implementation • ICLR 2022 • Rahma Chaabouni, Florian Strub, Florent Altché, Eugene Tarassov, Corentin Tallec, Elnaz Davoodi, Kory Wallace Mathewson, Olivier Tieleman, Angeliki Lazaridou, Bilal Piot
Emergent communication aims both at a better understanding of human language evolution and at building more efficient representations.
no code implementations • 20 Sep 2021 • Alice Martin Donati, Guillaume Quispe, Charles Ollion, Sylvain Le Corff, Florian Strub, Olivier Pietquin
This paper introduces TRUncated ReinForcement Learning for Language (TrufLL), an original approach to train conditional language models from scratch by only using reinforcement learning (RL).
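To make the idea concrete, the sketch below trains a tiny generation policy from scratch with REINFORCE while restricting each step's actions to a truncated candidate set. The use of a fixed task-agnostic LM score for truncation, the toy reward, and all hyperparameters are assumptions suggested by the method's name, not the paper's implementation (in particular, a real truncation would depend on the generated prefix).

```python
import torch
import torch.nn.functional as F

vocab, hidden, k = 50, 32, 8
policy = torch.nn.GRUCell(vocab, hidden)       # generation policy trained from scratch
head = torch.nn.Linear(hidden, vocab)
generic_lm_logits = torch.randn(vocab)         # stand-in for a task-agnostic LM score
opt = torch.optim.Adam(list(policy.parameters()) + list(head.parameters()), lr=1e-2)

def rollout(max_len=10):
    h, x = torch.zeros(1, hidden), torch.zeros(1, vocab)
    logps, tokens = [], []
    for _ in range(max_len):
        h = policy(x, h)
        logits = head(h).squeeze(0)
        allowed = torch.topk(generic_lm_logits, k).indices   # truncated action set
        mask = torch.ones_like(logits, dtype=torch.bool)
        mask[allowed] = False
        dist = torch.distributions.Categorical(
            logits=logits.masked_fill(mask, float("-inf")))
        tok = dist.sample()
        logps.append(dist.log_prob(tok))
        tokens.append(tok.item())
        x = F.one_hot(tok, vocab).float().unsqueeze(0)
    return tokens, torch.stack(logps)

for step in range(200):
    tokens, logps = rollout()
    reward = float(len(set(tokens)))   # toy reward: number of distinct tokens used
    loss = -(reward * logps.sum())     # REINFORCE, no baseline
    opt.zero_grad(); loss.backward(); opt.step()
```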
1 code implementation • 20 May 2021 • Mathieu Seurin, Florian Strub, Philippe Preux, Olivier Pietquin
Sparse rewards are double-edged training signals in reinforcement learning: easy to design but hard to optimize.
1 code implementation • ICCV 2021 • Adrià Recasens, Pauline Luc, Jean-Baptiste Alayrac, Luyu Wang, Ross Hemsley, Florian Strub, Corentin Tallec, Mateusz Malinowski, Viorica Patraucean, Florent Altché, Michal Valko, Jean-bastien Grill, Aäron van den Oord, Andrew Zisserman
Most successful self-supervised learning methods are trained to align the representations of two independent views from the data.
no code implementations • ICLR Workshop SSL-RL 2021 • Mathieu Seurin, Florian Strub, Philippe Preux, Olivier Pietquin
We evaluate RAM on the procedurally generated MiniGrid environment against state-of-the-art methods.
8 code implementations • NeurIPS 2020 • Jean-bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Remi Munos, Michal Valko
From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view.
3 code implementations • 20 Oct 2020 • Pierre H. Richemond, Jean-bastien Grill, Florent Altché, Corentin Tallec, Florian Strub, Andrew Brock, Samuel Smith, Soham De, Razvan Pascanu, Bilal Piot, Michal Valko
Bootstrap Your Own Latent (BYOL) is a self-supervised learning approach for image representation.
no code implementations • EMNLP 2020 • Yuchen Lu, Soumye Singhal, Florian Strub, Olivier Pietquin, Aaron Courville
Language drift has been one of the major obstacles to training language models through interaction.
no code implementations • 7 Aug 2020 • Mathieu Seurin, Florian Strub, Philippe Preux, Olivier Pietquin
To do so, we cast the speaker recognition task into a sequential decision-making problem that we solve with Reinforcement Learning.
no code implementations • 15 Jul 2020 • Alice Martin, Charles Ollion, Florian Strub, Sylvain Le Corff, Olivier Pietquin
This paper introduces the Sequential Monte Carlo Transformer, an original approach that naturally captures the distribution of the observations within a transformer architecture.
30 code implementations • 13 Jun 2020 • Jean-bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko
From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view.
Ranked #2 on Self-Supervised Person Re-Identification on SYSU-30k
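For reference, a minimal BYOL-style training step on toy data; the projection heads and the symmetrised loss over both view orderings are omitted, and the tiny MLP, noise augmentation, and EMA decay are placeholders rather than the paper's architecture.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))  # online network
predictor = nn.Linear(16, 16)                                             # online predictor
target_encoder = copy.deepcopy(encoder)
for p in target_encoder.parameters():                                     # no gradients
    p.requires_grad_(False)
opt = torch.optim.Adam(list(encoder.parameters()) + list(predictor.parameters()), lr=1e-3)

def augment(x):                        # stand-in for image augmentations
    return x + 0.1 * torch.randn_like(x)

for step in range(100):
    x = torch.randn(8, 32)             # toy "images"
    v1, v2 = augment(x), augment(x)
    p1 = F.normalize(predictor(encoder(v1)), dim=-1)   # online prediction from view 1
    with torch.no_grad():
        z2 = F.normalize(target_encoder(v2), dim=-1)   # target representation of view 2
    loss = (2 - 2 * (p1 * z2).sum(dim=-1)).mean()      # normalized L2 == 2 - 2 * cosine
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                              # EMA update of the target network
        for pt, po in zip(target_encoder.parameters(), encoder.parameters()):
            pt.mul_(0.99).add_(po, alpha=0.01)
```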
no code implementations • ICML 2020 • Yuchen Lu, Soumye Singhal, Florian Strub, Olivier Pietquin, Aaron Courville
At each time step, the teacher is created by copying the student agent, before being finetuned to maximize task completion.
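A toy sketch of that copy-then-finetune loop, with a supervised imitation phase standing in for the student update; the regression task, the number of steps, and the learning rates are placeholders, not the paper's interactive language setup.

```python
import copy
import torch
import torch.nn as nn

task_target = nn.Linear(8, 8)                 # toy stand-in for "task success"
for p in task_target.parameters():
    p.requires_grad_(False)

student = nn.Linear(8, 8)

def finetune_on_task(model, steps=20):
    opt = torch.optim.SGD(model.parameters(), lr=0.05)
    for _ in range(steps):
        x = torch.randn(16, 8)
        loss = nn.functional.mse_loss(model(x), task_target(x))
        opt.zero_grad(); loss.backward(); opt.step()

def imitate(student, teacher, steps=20):
    opt = torch.optim.SGD(student.parameters(), lr=0.05)
    for _ in range(steps):
        x = torch.randn(16, 8)
        loss = nn.functional.mse_loss(student(x), teacher(x).detach())
        opt.zero_grad(); loss.backward(); opt.step()

for generation in range(5):
    teacher = copy.deepcopy(student)   # the teacher starts as a copy of the student
    finetune_on_task(teacher)          # ... and is finetuned to maximise task completion
    imitate(student, teacher)          # the student then imitates the refreshed teacher
```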
no code implementations • 21 Oct 2019 • Geoffrey Cideron, Mathieu Seurin, Florian Strub, Olivier Pietquin
Language creates a compact representation of the world and allows the description of unlimited situations and objectives through compositionality.
1 code implementation • 7 Mar 2019 • Florian Strub, Marie-Agathe Charpagne, Tresa M. Pollock
The quality of the reconstructed maps is critical for studying the spatial distribution of phases and the crystallographic orientation relationships between them, a key interest in materials science.
1 code implementation • 7 Mar 2019 • Marie-Agathe Charpagne, Florian Strub, Tresa M. Pollock
This function is then applied to undistort the EBSD data, and the phase information is inferred using the segmented speckle data.
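As a hedged illustration of the undistortion step, the sketch below resamples a toy phase map on corrected coordinates given a smooth distortion field; the quadratic field here is made up, whereas in the paper the mapping would come from registering the EBSD map against the segmented speckle image.

```python
import numpy as np
from scipy.ndimage import map_coordinates

h, w = 128, 128
# Toy "EBSD" phase map with two alternating phases.
ebsd_phase_map = (np.indices((h, w)).sum(axis=0) // 32 % 2).astype(float)

rows, cols = np.indices((h, w), dtype=float)
# Hypothetical estimated distortion: a small, smoothly varying shift per pixel.
row_shift = 0.002 * (cols - w / 2) ** 2 / w
col_shift = 0.004 * (rows - h / 2) * (cols - w / 2) / w

# Resample the map at the corrected coordinates; nearest-neighbour interpolation
# (order=0) keeps the phase labels discrete.
corrected = map_coordinates(ebsd_phase_map,
                            [rows + row_shift, cols + col_shift],
                            order=0)
print(corrected.shape)
```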
no code implementations • 6 Dec 2018 • Hado van Hasselt, Yotam Doron, Florian Strub, Matteo Hessel, Nicolas Sonnerat, Joseph Modayil
In this work, we investigate the impact of the deadly triad in practice, in the context of a family of popular deep reinforcement learning models (deep Q-networks trained with experience replay), analysing how the components of this system play a role in the emergence of the deadly triad and in the agent's performance.
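For readers unfamiliar with the term, the toy update below marks where the three ingredients of the deadly triad enter a DQN-style learner: function approximation, bootstrapping, and off-policy data from a replay buffer. It is an illustration of the concepts only, with random transitions instead of an environment, not the paper's experimental setup.

```python
import random
import torch
import torch.nn as nn

n_states, n_actions, gamma = 4, 2, 0.99
q_net = nn.Linear(n_states, n_actions)                 # (1) function approximation
opt = torch.optim.SGD(q_net.parameters(), lr=0.1)

replay = [(torch.eye(n_states)[random.randrange(n_states)],   # state (one-hot)
           random.randrange(n_actions),                       # action
           random.random(),                                   # reward
           torch.eye(n_states)[random.randrange(n_states)])   # next state
          for _ in range(256)]                                # (3) off-policy replay data

for _ in range(100):
    s, a, r, s_next = random.choice(replay)
    with torch.no_grad():
        target = r + gamma * q_net(s_next).max()              # (2) bootstrapped target
    loss = (q_net(s)[a] - target) ** 2
    opt.zero_grad(); loss.backward(); opt.step()
```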
1 code implementation • ECCV 2018 • Florian Strub, Mathieu Seurin, Ethan Perez, Harm de Vries, Jérémie Mary, Philippe Preux, Aaron Courville, Olivier Pietquin
Recent breakthroughs in computer vision and natural language processing have spurred interest in challenging multi-modal tasks such as visual question-answering and visual dialogue.
no code implementations • 29 Nov 2017 • Simon Brodeur, Ethan Perez, Ankesh Anand, Florian Golemo, Luca Celotti, Florian Strub, Jean Rouat, Hugo Larochelle, Aaron Courville
We introduce HoME: a Household Multimodal Environment for artificial agents to learn from vision, audio, semantics, physics, and interaction with objects and other agents, all within a realistic context.
6 code implementations • 22 Sep 2017 • Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, Aaron Courville
We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation.
Ranked #3 on Visual Question Answering (VQA) on CLEVR-Humans
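The core FiLM operation is a feature-wise affine transformation whose scale and shift are predicted from a conditioning input (e.g. a question embedding); below is a minimal sketch of just that layer, with arbitrary toy dimensions.

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Feature-wise Linear Modulation: scale and shift each channel of a
    convolutional feature map using parameters predicted from a conditioning
    vector."""
    def __init__(self, cond_dim, n_channels):
        super().__init__()
        self.to_gamma_beta = nn.Linear(cond_dim, 2 * n_channels)

    def forward(self, features, conditioning):
        gamma, beta = self.to_gamma_beta(conditioning).chunk(2, dim=-1)
        # Broadcast (batch, channels) over the spatial dimensions.
        return gamma[:, :, None, None] * features + beta[:, :, None, None]

# Toy usage: modulate a (batch, channels, H, W) feature map with a 32-d vector.
film = FiLM(cond_dim=32, n_channels=16)
feats = torch.randn(4, 16, 8, 8)
cond = torch.randn(4, 32)
print(film(feats, cond).shape)   # torch.Size([4, 16, 8, 8])
```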
2 code implementations • 10 Jul 2017 • Ethan Perez, Harm de Vries, Florian Strub, Vincent Dumoulin, Aaron Courville
Previous work has operated under the assumption that visual reasoning calls for a specialized architecture, but we show that a general architecture with proper conditioning can learn to visually reason effectively.
3 code implementations • NeurIPS 2017 • Harm de Vries, Florian Strub, Jérémie Mary, Hugo Larochelle, Olivier Pietquin, Aaron Courville
It is commonly assumed that language refers to high-level visual concepts while leaving low-level visual processing unaffected.
2 code implementations • 15 Mar 2017 • Florian Strub, Harm de Vries, Jeremie Mary, Bilal Piot, Aaron Courville, Olivier Pietquin
End-to-end design of dialogue systems has recently become a popular research topic thanks to powerful tools such as encoder-decoder architectures for sequence-to-sequence learning.
4 code implementations • CVPR 2017 • Harm de Vries, Florian Strub, Sarath Chandar, Olivier Pietquin, Hugo Larochelle, Aaron Courville
Our key contribution is the collection of a large-scale dataset consisting of 150K human-played games with a total of 800K visual question-answer pairs on 66K images.
4 code implementations • 24 Jun 2016 • Florian Strub, Romaric Gaudel, Jérémie Mary
A standard model for Recommender Systems is the Matrix Completion setting: given a partially known matrix of ratings given by users (rows) to items (columns), infer the unknown ratings.
Ranked #1 on Recommendation Systems on Douban
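A generic sketch of the matrix completion setting described above: fit low-rank user and item factors on the observed entries only, then predict the missing ones. The paper itself studies neural autoencoders for this task; the toy data and plain gradient updates here are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, rank = 30, 40, 4

# Toy ground-truth low-rank ratings and a random mask of observed entries.
true = rng.normal(size=(n_users, rank)) @ rng.normal(size=(rank, n_items))
observed = rng.random((n_users, n_items)) < 0.3

U = 0.1 * rng.normal(size=(n_users, rank))   # user factors
V = 0.1 * rng.normal(size=(n_items, rank))   # item factors
lr = 0.01
for _ in range(2000):
    err = (U @ V.T - true) * observed        # error on known ratings only
    U -= lr * err @ V
    V -= lr * err.T @ U

rmse_heldout = np.sqrt(((U @ V.T - true) ** 2)[~observed].mean())
print(f"RMSE on unobserved entries: {rmse_heldout:.3f}")
```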
1 code implementation • 2 Mar 2016 • Florian Strub, Jeremie Mary, Romaric Gaudel
Such algorithms look for latent variables in a large sparse matrix of ratings.