no code implementations • 8 Feb 2024 • Tu Anh Nguyen, Benjamin Muller, Bokai Yu, Marta R. Costa-Jussa, Maha Elbayad, Sravya Popuri, Paul-Ambroise Duquenne, Robin Algayres, Ruslan Mavlyutov, Itai Gat, Gabriel Synnaeve, Juan Pino, Benoit Sagot, Emmanuel Dupoux
We introduce SPIRIT-LM, a foundation multimodal language model that freely mixes text and speech.
no code implementations • 8 Oct 2023 • Robin Algayres, Pablo Diego-Simon, Benoit Sagot, Emmanuel Dupoux
Due to the absence of explicit word boundaries in the speech stream, the task of segmenting spoken sentences into word units without text supervision is particularly challenging.
no code implementations • 8 Oct 2023 • Robin Algayres, Yossi Adi, Tu Anh Nguyen, Jade Copet, Gabriel Synnaeve, Benoit Sagot, Emmanuel Dupoux
In NLP, text language models based on words or subwords are known to outperform their character-based counterparts.
no code implementations • 11 Apr 2022 • Robin Algayres, Adel Nabli, Benoit Sagot, Emmanuel Dupoux
We introduce a simple neural encoder architecture that can be trained using an unsupervised contrastive learning objective which gets its positive samples from data-augmented k-Nearest Neighbors search.
no code implementations • 30 Mar 2022 • Tu Anh Nguyen, Eugene Kharitonov, Jade Copet, Yossi Adi, Wei-Ning Hsu, Ali Elkahky, Paden Tomasello, Robin Algayres, Benoit Sagot, Abdelrahman Mohamed, Emmanuel Dupoux
We introduce dGSLM, the first "textless" model able to generate audio samples of naturalistic spoken dialogues.
no code implementations • 11 Mar 2022 • Tu Anh Nguyen, Benoit Sagot, Emmanuel Dupoux
The approach relies first on transforming the audio into a sequence of discrete units (or pseudo-text) and then training a language model directly on such pseudo-text.
no code implementations • 27 Jul 2020 • Robin Algayres, Mohamed Salah Zaiem, Benoit Sagot, Emmanuel Dupoux
However, there is currently no clear methodology to compare or optimise the quality of these embeddings in a task-neutral way.
no code implementations • 1 May 2020 • Benjamin Muller, Benoit Sagot, Djamé Seddah
Building natural language processing systems for non-standardized and low-resource languages is a difficult challenge.
no code implementations • WS 2019 • Benjamin Muller, Benoit Sagot, Djamé Seddah
In this article, focusing on User Generated Content (UGC), we study the ability of BERT to perform lexical normalisation.