Search Results for author: Florian Strub

Found 35 papers, 21 papers with code

Learning Natural Language Generation with Truncated Reinforcement Learning

1 code implementation • NAACL 2022 • Alice Martin, Guillaume Quispe, Charles Ollion, Sylvain Le Corff, Florian Strub, Olivier Pietquin

To our knowledge, it is the first approach that successfully learns a language generation policy without pre-training, using only reinforcement learning.

Language Modelling • Question Generation • +4

Language Evolution with Deep Learning

no code implementations • 18 Mar 2024 • Mathieu Rita, Paul Michel, Rahma Chaabouni, Olivier Pietquin, Emmanuel Dupoux, Florian Strub

Computational modeling plays an essential role in the study of language emergence.

Language Model Alignment with Elastic Reset

1 code implementation • NeurIPS 2023 • Michael Noukhovitch, Samuel Lavoie, Florian Strub, Aaron Courville

We periodically reset the online model to an exponentially moving average (EMA) of itself, then reset the EMA model to the initial model.

Chatbot • Language Modelling • +1
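The reset schedule quoted above is simple enough to illustrate directly. Below is a minimal sketch, assuming plain NumPy arrays stand in for model parameters; the decay rate and reset period are illustrative choices, not the paper's settings.

```python
import copy
import numpy as np

def ema_update(ema_params, online_params, decay=0.99):
    """Move the EMA copy of each parameter toward the online parameters."""
    for k in ema_params:
        ema_params[k] = decay * ema_params[k] + (1.0 - decay) * online_params[k]

def elastic_reset(online_params, ema_params, init_params):
    """Periodic reset as described above: online model <- EMA model,
    then EMA model <- initial model."""
    for k in online_params:
        online_params[k] = ema_params[k].copy()
        ema_params[k] = init_params[k].copy()

# Toy usage with a single random "parameter" tensor (illustrative only).
init = {"w": np.random.randn(4, 4)}
online = copy.deepcopy(init)
ema = copy.deepcopy(init)

for step in range(1, 1001):
    online["w"] += 0.01 * np.random.randn(4, 4)  # stand-in for an RL fine-tuning update
    ema_update(ema, online)
    if step % 500 == 0:                          # reset period is an illustrative choice
        elastic_reset(online, ema, init)
```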

The Edge of Orthogonality: A Simple View of What Makes BYOL Tick

no code implementations • 9 Feb 2023 • Pierre H. Richemond, Allison Tam, Yunhao Tang, Florian Strub, Bilal Piot, Felix Hill

With simple linear algebra, we show that, when using a linear predictor, the optimal predictor is close to an orthogonal projection, and we propose a general framework based on orthonormalization that helps interpret and give intuition on why BYOL works.

SemPPL: Predicting pseudo-labels for better contrastive representations

2 code implementations • 12 Jan 2023 • Matko Bošnjak, Pierre H. Richemond, Nenad Tomasev, Florian Strub, Jacob C. Walker, Felix Hill, Lars Holger Buesing, Razvan Pascanu, Charles Blundell, Jovana Mitrovic

We propose a new semi-supervised learning method, Semantic Positives via Pseudo-Labels (SemPPL), that combines labelled and unlabelled data to learn informative representations.

Contrastive Learning • Pseudo Label
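As a rough illustration of the pseudo-labelling idea above (not the paper's pipeline), one can assign a pseudo-label to each unlabelled embedding by a k-nearest-neighbour vote over the labelled embeddings; examples sharing a (pseudo-)label can then serve as additional positives for contrastive learning. All names, shapes, and data below are made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embeddings: 100 labelled points (10 classes) and 500 unlabelled points.
labelled = rng.normal(size=(100, 16))
labels = rng.integers(0, 10, size=100)
unlabelled = rng.normal(size=(500, 16))

def knn_pseudo_labels(queries, keys, key_labels, k=5):
    """Assign each query the majority label of its k nearest labelled neighbours."""
    dists = np.linalg.norm(queries[:, None, :] - keys[None, :, :], axis=-1)
    nearest = np.argsort(dists, axis=1)[:, :k]   # indices of the k closest labelled points
    votes = key_labels[nearest]                  # their labels, shape (n_queries, k)
    return np.array([np.bincount(v).argmax() for v in votes])

pseudo = knn_pseudo_labels(unlabelled, labelled, labels)
print(pseudo[:10])  # pseudo-labels that could define extra positives for contrastive learning
```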

Over-communicate no more: Situated RL agents learn concise communication protocols

no code implementations • 2 Nov 2022 • Aleksandra Kalinowska, Elnaz Davoodi, Florian Strub, Kory W Mathewson, Ivana Kajic, Michael Bowling, Todd D Murphey, Patrick M Pilarski

While it is known that communication facilitates cooperation in multi-agent settings, it is unclear how to design artificial agents that can learn to effectively and efficiently communicate with each other.

Reinforcement Learning (RL)

Emergent Communication: Generalization and Overfitting in Lewis Games

1 code implementation • 30 Sep 2022 • Mathieu Rita, Corentin Tallec, Paul Michel, Jean-Bastien Grill, Olivier Pietquin, Emmanuel Dupoux, Florian Strub

Lewis signaling games are a class of simple communication games for simulating the emergence of language.
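For readers unfamiliar with the setting, a Lewis signaling game can be simulated in a few lines: a speaker observes a target, emits a symbol, and a listener must recover the target from the symbol. The toy below uses fixed random tabular policies purely to illustrate the game's structure; it does not implement the paper's training procedure, and the object and vocabulary sizes are arbitrary.

```python
import random

OBJECTS = list(range(5))   # hypothetical object identities
VOCAB = list(range(5))     # hypothetical message symbols

# Fixed random tabular "policies": speaker maps object -> message, listener maps message -> guess.
speaker = {obj: random.choice(VOCAB) for obj in OBJECTS}
listener = {msg: random.choice(OBJECTS) for msg in VOCAB}

def play_round():
    target = random.choice(OBJECTS)      # the speaker observes a target object
    message = speaker[target]            # and sends a one-symbol message
    guess = listener[message]            # the listener guesses the object from the message
    return 1 if guess == target else 0   # reward 1 on successful communication

success_rate = sum(play_round() for _ in range(1000)) / 1000
print(f"communication success rate: {success_rate:.2f}")
```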

Developing, Evaluating and Scaling Learning Agents in Multi-Agent Environments

no code implementations • 22 Sep 2022 • Ian Gemp, Thomas Anthony, Yoram Bachrach, Avishkar Bhoopchand, Kalesha Bullard, Jerome Connor, Vibhavari Dasagi, Bart De Vylder, Edgar Duenez-Guzman, Romuald Elie, Richard Everett, Daniel Hennes, Edward Hughes, Mina Khan, Marc Lanctot, Kate Larson, Guy Lever, SiQi Liu, Luke Marris, Kevin R. McKee, Paul Muller, Julien Perolat, Florian Strub, Andrea Tacchetti, Eugene Tarassov, Zhe Wang, Karl Tuyls

The Game Theory & Multi-Agent team at DeepMind studies several aspects of multi-agent learning, ranging from computing approximations to fundamental concepts in game theory, to simulating social dilemmas in rich spatial environments, to training 3D humanoids in difficult team coordination tasks.

reinforcement-learning • Reinforcement Learning (RL)

Learning Natural Language Generation from Scratch

no code implementations • 20 Sep 2021 • Alice Martin Donati, Guillaume Quispe, Charles Ollion, Sylvain Le Corff, Florian Strub, Olivier Pietquin

This paper introduces TRUncated ReinForcement Learning for Language (TrufLL), an original approach to train conditional language models from scratch using only reinforcement learning (RL).

Language Modelling • reinforcement-learning • +2

Don't Do What Doesn't Matter: Intrinsic Motivation with Action Usefulness

1 code implementation • 20 May 2021 • Mathieu Seurin, Florian Strub, Philippe Preux, Olivier Pietquin

Sparse rewards are double-edged training signals in reinforcement learning: easy to design but hard to optimize.

A Machine of Few Words -- Interactive Speaker Recognition with Reinforcement Learning

no code implementations • 7 Aug 2020 • Mathieu Seurin, Florian Strub, Philippe Preux, Olivier Pietquin

To do so, we cast the speaker recognition task into a sequential decision-making problem that we solve with Reinforcement Learning.

Decision Making • reinforcement-learning • +3

The Monte Carlo Transformer: a stochastic self-attention model for sequence prediction

no code implementations • 15 Jul 2020 • Alice Martin, Charles Ollion, Florian Strub, Sylvain Le Corff, Olivier Pietquin

This paper introduces the Sequential Monte Carlo Transformer, an original approach that naturally captures the observations distribution in a transformer architecture.

Countering Language Drift with Seeded Iterated Learning

no code implementations • ICML 2020 • Yuchen Lu, Soumye Singhal, Florian Strub, Olivier Pietquin, Aaron Courville

At each time step, the teacher is created by copying the student agent, before being finetuned to maximize task completion.

Translation
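The copy-then-finetune step quoted above is the core of the iterated-learning loop. The toy skeleton below is a sketch under the assumption that each generation alternates an interactive finetuning phase (teacher) with an imitation phase (student); the tabular task and all function names are placeholders, not the paper's code.

```python
import copy
import random

# Toy "agents" are tables mapping inputs to outputs; "task completion" means matching TARGET.
TARGET = {i: i % 3 for i in range(10)}

def finetune(agent, steps):
    """Stand-in for interactive finetuning: nudge random entries toward the task objective."""
    for _ in range(steps):
        key = random.choice(list(agent))
        agent[key] = TARGET[key]

def imitate(teacher, steps):
    """Stand-in for the imitation phase: a freshly seeded student copies part of the teacher."""
    student = {key: random.randint(0, 2) for key in TARGET}
    for _ in range(steps):
        key = random.choice(list(teacher))
        student[key] = teacher[key]
    return student

student = {key: random.randint(0, 2) for key in TARGET}
for generation in range(5):
    teacher = copy.deepcopy(student)     # the teacher is created by copying the student
    finetune(teacher, steps=5)           # then finetuned to maximize task completion
    student = imitate(teacher, steps=8)  # the next student learns from the teacher
```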

HIGhER: Improving instruction following with Hindsight Generation for Experience Replay

no code implementations • 21 Oct 2019 • Geoffrey Cideron, Mathieu Seurin, Florian Strub, Olivier Pietquin

Language creates a compact representation of the world and allows the description of unlimited situations and objectives through compositionality.

Instruction Following • Language Acquisition

Self-Educated Language Agent with Hindsight Experience Replay for Instruction Following

no code implementations • 25 Sep 2019 • Geoffrey Cideron, Mathieu Seurin, Florian Strub, Olivier Pietquin

Language creates a compact representation of the world and allows the description of unlimited situations and objectives through compositionality.

Instruction Following • Language Acquisition

Correction of Electron Back-scattered Diffraction datasets using an evolutionary algorithm

1 code implementation • 7 Mar 2019 • Florian Strub, Marie-Agathe Charpagne, Tresa M. Pollock

The quality of the reconstruction of the maps is critical for studying the spatial distribution of phases and the crystallographic orientation relationships between phases, a key interest in materials science.

Accurate reconstruction of EBSD datasets by a multimodal data approach using an evolutionary algorithm

1 code implementation • 7 Mar 2019 • Marie-Agathe Charpagne, Florian Strub, Tresa M. Pollock

This function is then applied to undistort the EBSD data, and the phase information is inferred using the data of the segmented speckle.

Deep Reinforcement Learning and the Deadly Triad

no code implementations • 6 Dec 2018 • Hado van Hasselt, Yotam Doron, Florian Strub, Matteo Hessel, Nicolas Sonnerat, Joseph Modayil

In this work, we investigate the impact of the deadly triad in practice, in the context of a family of popular deep reinforcement learning models (deep Q-networks trained with experience replay), analysing how the components of this system play a role in the emergence of the deadly triad and in the agent's performance.

Learning Theory • reinforcement-learning • +1

Visual Reasoning with Multi-hop Feature Modulation

1 code implementation • ECCV 2018 • Florian Strub, Mathieu Seurin, Ethan Perez, Harm de Vries, Jérémie Mary, Philippe Preux, Aaron Courville, Olivier Pietquin

Recent breakthroughs in computer vision and natural language processing have spurred interest in challenging multi-modal tasks such as visual question-answering and visual dialogue.

Question Answering • Visual Dialog • +2

HoME: a Household Multimodal Environment

no code implementations • 29 Nov 2017 • Simon Brodeur, Ethan Perez, Ankesh Anand, Florian Golemo, Luca Celotti, Florian Strub, Jean Rouat, Hugo Larochelle, Aaron Courville

We introduce HoME: a Household Multimodal Environment for artificial agents to learn from vision, audio, semantics, physics, and interaction with objects and other agents, all within a realistic context.

OpenAI Gym • reinforcement-learning • +1

Learning Visual Reasoning Without Strong Priors

2 code implementations • 10 Jul 2017 • Ethan Perez, Harm de Vries, Florian Strub, Vincent Dumoulin, Aaron Courville

Previous work has operated under the assumption that visual reasoning calls for a specialized architecture, but we show that a general architecture with proper conditioning can learn to visually reason effectively.

Visual Reasoning
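The "proper conditioning" referred to here is feature-wise modulation of the visual pathway by the language input: per-channel scales and shifts predicted from, e.g., a question embedding. The minimal NumPy sketch below illustrates that mechanism; the shapes and the linear generator are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def film(features, gamma, beta):
    """Feature-wise modulation: scale and shift every channel of the visual features
    with coefficients predicted from the conditioning (e.g. language) input."""
    return gamma[:, :, None, None] * features + beta[:, :, None, None]

batch, channels, height, width, cond_dim = 2, 8, 4, 4, 16
features = rng.normal(size=(batch, channels, height, width))  # visual feature maps
condition = rng.normal(size=(batch, cond_dim))                # e.g. a question embedding

# A linear "modulation generator" mapping the conditioning vector to per-channel (gamma, beta).
W = rng.normal(scale=0.1, size=(cond_dim, 2 * channels))
gamma, beta = np.split(condition @ W, 2, axis=1)

modulated = film(features, 1.0 + gamma, beta)  # 1 + gamma keeps the modulation near identity
print(modulated.shape)                         # (2, 8, 4, 4)
```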

End-to-end optimization of goal-driven and visually grounded dialogue systems

2 code implementations • 15 Mar 2017 • Florian Strub, Harm de Vries, Jeremie Mary, Bilal Piot, Aaron Courville, Olivier Pietquin

End-to-end design of dialogue systems has recently become a popular research topic thanks to powerful tools such as encoder-decoder architectures for sequence-to-sequence learning.

Dialogue Management • Management • +1

GuessWhat?! Visual object discovery through multi-modal dialogue

4 code implementations • CVPR 2017 • Harm de Vries, Florian Strub, Sarath Chandar, Olivier Pietquin, Hugo Larochelle, Aaron Courville

Our key contribution is the collection of a large-scale dataset consisting of 150K human-played games with a total of 800K visual question-answer pairs on 66K images.

Object • Object Discovery

Hybrid Recommender System based on Autoencoders

4 code implementations • 24 Jun 2016 • Florian Strub, Romaric Gaudel, Jérémie Mary

A standard model for Recommender Systems is the Matrix Completion setting: given a partially known matrix of ratings given by users (rows) to items (columns), infer the unknown ratings.

Collaborative Filtering • Matrix Completion • +1
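The matrix-completion setting described above can be made concrete with a tiny example: train an autoencoder on user rating rows with a masked loss, so that only observed ratings contribute to the reconstruction error. The sketch below is a NumPy toy under that assumption, not the paper's model or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ratings matrix: rows = users, columns = items; only half the entries are "observed".
R = rng.integers(1, 6, size=(20, 10)).astype(float)
mask = rng.random(R.shape) < 0.5
R_obs = R * mask

# One-hidden-layer autoencoder on user rows, trained by gradient descent on a masked MSE.
n_items, n_hidden, lr = R.shape[1], 5, 0.01
W1 = rng.normal(scale=0.1, size=(n_items, n_hidden))
W2 = rng.normal(scale=0.1, size=(n_hidden, n_items))

for _ in range(2000):
    H = np.tanh(R_obs @ W1)            # encode observed user rows
    R_hat = H @ W2                     # decode to predicted ratings for all items
    err = (R_hat - R) * mask           # masked error: unknown ratings are ignored
    gW2 = H.T @ err / len(R)
    gW1 = R_obs.T @ ((err @ W2.T) * (1 - H**2)) / len(R)
    W1 -= lr * gW1
    W2 -= lr * gW2

print("masked train RMSE:", np.sqrt((err**2).sum() / mask.sum()))
```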
