Search Results for author: Emmanuel Dupoux

Found 66 papers, 26 papers with code

Speech Sequence Embeddings using Nearest Neighbors Contrastive Learning

no code implementations 11 Apr 2022 Robin Algayres, Adel Nabli, Benoit Sagot, Emmanuel Dupoux

We introduce a simple neural encoder architecture that can be trained using an unsupervised contrastive learning objective which gets its positive samples from data-augmented k-Nearest Neighbors search.

Contrastive Learning

Probing phoneme, language and speaker information in unsupervised speech representations

no code implementations 30 Mar 2022 Maureen de Seyssel, Marvin Lavechin, Yossi Adi, Emmanuel Dupoux, Guillaume Wisniewski

Language information, however, is very salient in the bilingual model only, suggesting CPC models learn to discriminate languages when trained on multiple languages.

Language Modelling

Are discrete units necessary for Spoken Language Modeling?

no code implementations 11 Mar 2022 Tu Anh Nguyen, Benoit Sagot, Emmanuel Dupoux

The approach relies first on transforming the audio into a sequence of discrete units (or pseudo-text) and then training a language model directly on such pseudo-text.

Language Modelling

textless-lib: a Library for Textless Spoken Language Processing

1 code implementation 15 Feb 2022 Eugene Kharitonov, Jade Copet, Kushal Lakhotia, Tu Anh Nguyen, Paden Tomasello, Ann Lee, Ali Elkahky, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi

Textless spoken language processing research aims to extend the applicability of standard NLP toolset onto spoken language and languages with few or no textual resources.


Towards Interactive Language Modeling

no code implementations 14 Dec 2021 Maartje ter Hoeve, Evgeny Kharitonov, Dieuwke Hupkes, Emmanuel Dupoux

As a first contribution we present a road map in which we detail the steps that need to be taken towards interactive language modeling.

Language Acquisition, Language Modelling

Shennong: a Python toolbox for audio speech features extraction

1 code implementation 10 Dec 2021 Mathieu Bernard, Maxime Poli, Julien Karadayi, Emmanuel Dupoux

After describing the Shennong software architecture, its core components and implemented algorithms, this paper illustrates its use on three applications: a comparison of speech features performances on a phones discrimination task, an analysis of a Vocal Tract Length Normalization model as a function of the speech duration used for training and a comparison of pitch estimation algorithms under various noise conditions.

Textless Speech Emotion Conversion using Discrete and Decomposed Representations

no code implementations 14 Nov 2021 Felix Kreuk, Adam Polyak, Jade Copet, Eugene Kharitonov, Tu-Anh Nguyen, Morgane Rivière, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi

We use a decomposition of the speech signal into discrete learned representations, consisting of phonetic-content units, prosodic features, speaker, and emotion.

Text-Free Prosody-Aware Generative Spoken Language Modeling

1 code implementation ACL 2022 Eugene Kharitonov, Ann Lee, Adam Polyak, Yossi Adi, Jade Copet, Kushal Lakhotia, Tu-Anh Nguyen, Morgane Rivière, Abdelrahman Mohamed, Emmanuel Dupoux, Wei-Ning Hsu

Generative Spoken Language Modeling (GSLM) \cite{Lakhotia2021} is the only prior work addressing the generative aspects of speech pre-training, which replaces text with discovered phone-like units for language modeling and shows the ability to generate meaningful novel sentences.

Language Modelling

The Zero Resource Speech Challenge 2021: Spoken language modelling

no code implementations 29 Apr 2021 Ewan Dunbar, Mathieu Bernard, Nicolas Hamilakis, Tu Anh Nguyen, Maureen de Seyssel, Patricia Rozé, Morgane Rivière, Eugene Kharitonov, Emmanuel Dupoux

We present the Zero Resource Speech Challenge 2021, which asks participants to learn a language model directly from audio, without any text or labels.

Language Modelling

Learning spectro-temporal representations of complex sounds with parameterized neural networks

1 code implementation 12 Mar 2021 Rachid Riad, Julien Karadayi, Anne-Catherine Bachoud-Lévi, Emmanuel Dupoux

We found out that models based on Learnable STRFs are on par for all tasks with different toplines, and obtain the best performance for Speech Activity Detection.

Action Detection, Activity Detection +1

Generative Spoken Language Modeling from Raw Audio

2 code implementations 1 Feb 2021 Kushal Lakhotia, Evgeny Kharitonov, Wei-Ning Hsu, Yossi Adi, Adam Polyak, Benjamin Bolte, Tu-Anh Nguyen, Jade Copet, Alexei Baevski, Abdelrahman Mohamed, Emmanuel Dupoux

We introduce Generative Spoken Language Modeling, the task of learning the acoustic and linguistic characteristics of a language from raw audio (no text, no labels), and a set of metrics to automatically evaluate the learned representations at acoustic and linguistic levels for both encoding and generation.

Language Modelling, Resynthesis

The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling

1 code implementation 23 Nov 2020 Tu Anh Nguyen, Maureen de Seyssel, Patricia Rozé, Morgane Rivière, Evgeny Kharitonov, Alexei Baevski, Ewan Dunbar, Emmanuel Dupoux

We introduce a new unsupervised task, spoken language modeling: the learning of linguistic representations from raw audio signals without any labels, along with the Zero Resource Speech Benchmark 2021: a suite of 4 black-box, zero-shot metrics probing for the quality of the learned models at 4 linguistic levels: phonetics, lexicon, syntax and semantics.

Language Modelling, Representation Learning

"LazImpa": Lazy and Impatient neural agents learn to communicate efficiently

no code implementations CONLL 2020 Mathieu Rita, Rahma Chaabouni, Emmanuel Dupoux

Previous work has shown that artificial neural agents naturally develop surprisingly non-efficient codes.

Comparison of Speaker Role Recognition and Speaker Enrollment Protocol for conversational Clinical Interviews

no code implementations 30 Oct 2020 Rachid Riad, Hadrien Titeux, Laurie Lemoine, Justine Montillot, Agnes Sliwinski, Jennifer Hamet Bagnou, Xuan Nga Cao, Anne-Catherine Bachoud-Lévi, Emmanuel Dupoux

Here, we proposed a split of the data that allows conducting a comparative evaluation of speaker role recognition and speaker enrollment methods to solve this task.

The Zero Resource Speech Challenge 2020: Discovering discrete subword and word units

no code implementations 12 Oct 2020 Ewan Dunbar, Julien Karadayi, Mathieu Bernard, Xuan-Nga Cao, Robin Algayres, Lucas Ondel, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux

We present the Zero Resource Speech Challenge 2020, which aims at learning speech representations from raw audio signals without any labels.

Speech Synthesis

Analogies minus analogy test: measuring regularities in word embeddings

1 code implementation CONLL 2020 Louis Fournier, Emmanuel Dupoux, Ewan Dunbar

Vector space models of words have long been claimed to capture linguistic regularities as simple vector translations, but problems have been raised with this claim.

Word Embeddings

"LazImpa": Lazy and Impatient neural agents learn to communicate efficiently

1 code implementation 5 Oct 2020 Mathieu Rita, Rahma Chaabouni, Emmanuel Dupoux

Previous work has shown that artificial neural agents naturally develop surprisingly non-efficient codes.

Evaluating the reliability of acoustic speech embeddings

no code implementations 27 Jul 2020 Robin Algayres, Mohamed Salah Zaiem, Benoit Sagot, Emmanuel Dupoux

However, there is currently no clear methodology to compare or optimise the quality of these embeddings in a task-neutral way.

Information Retrieval

Data Augmenting Contrastive Learning of Speech Representations in the Time Domain

1 code implementation 2 Jul 2020 Eugene Kharitonov, Morgane Rivière, Gabriel Synnaeve, Lior Wolf, Pierre-Emmanuel Mazaré, Matthijs Douze, Emmanuel Dupoux

Contrastive Predictive Coding (CPC), based on predicting future segments of speech from past segments, is emerging as a powerful algorithm for representation learning of speech signal.

Contrastive Learning, Data Augmentation +1

Vocal markers from sustained phonation in Huntington's Disease

1 code implementation 9 Jun 2020 Rachid Riad, Hadrien Titeux, Laurie Lemoine, Justine Montillot, Jennifer Hamet Bagnou, Xuan Nga Cao, Emmanuel Dupoux, Anne-Catherine Bachoud-Lévi

According to our regression results, Phonatory features are suitable for the predictions of clinical performance in Huntington's Disease.

An open-source voice type classifier for child-centered daylong recordings

1 code implementation 26 May 2020 Marvin Lavechin, Ruben Bousbib, Hervé Bredin, Emmanuel Dupoux, Alejandrina Cristia

Spontaneous conversations in real-world settings such as those found in child-centered recordings have been shown to be amongst the most challenging audio files to process.

Language Acquisition

Occlusion resistant learning of intuitive physics from videos

no code implementations 30 Apr 2020 Ronan Riochet, Josef Sivic, Ivan Laptev, Emmanuel Dupoux

In this work we propose a probabilistic formulation of learning intuitive physics in 3D scenes with significant inter-object occlusions.

Compositionality and Generalization in Emergent Languages

1 code implementation ACL 2020 Rahma Chaabouni, Eugene Kharitonov, Diane Bouchacourt, Emmanuel Dupoux, Marco Baroni

Third, while compositionality is not necessary for generalization, it provides an advantage in terms of language transmission: The more compositional a language is, the more easily it will be picked up by new learners, even when the latter differ in architecture from the original agents.


Identification of primary and collateral tracks in stuttered speech

no code implementations LREC 2020 Rachid Riad, Anne-Catherine Bachoud-Lévi, Frank Rudzicz, Emmanuel Dupoux

Here, we introduce a new evaluation framework for disfluency detection inspired by the clinical and NLP perspective together with the theory of performance from \cite{clark1996using} which distinguishes between primary and collateral tracks.

Unsupervised pretraining transfers well across languages

2 code implementations 7 Feb 2020 Morgane Rivière, Armand Joulin, Pierre-Emmanuel Mazaré, Emmanuel Dupoux

Cross-lingual and multi-lingual training of Automatic Speech Recognition (ASR) has been extensively investigated in the supervised setting.

Automatic Speech Recognition

Libri-Light: A Benchmark for ASR with Limited or No Supervision

1 code implementation 17 Dec 2019 Jacob Kahn, Morgane Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdel-rahman Mohamed, Emmanuel Dupoux

Additionally, we provide baseline systems and evaluation metrics working under three settings: (1) the zero resource/unsupervised setting (ABX), (2) the semi-supervised setting (PER, CER) and (3) the distant supervision setting (WER).

Ranked #1 on Speech Recognition on Libri-Light test-other (ABX-across metric)

Speech Recognition

Neural language modeling of free word order argument structure

no code implementations 30 Nov 2019 Charlotte Rochereau, Benoît Sagot, Emmanuel Dupoux

Neural language models trained with a predictive or masked objective have proven successful at capturing short and long distance syntactic dependencies.

Language Modelling

Word-order biases in deep-agent emergent communication

1 code implementation ACL 2019 Rahma Chaabouni, Eugene Kharitonov, Alessandro Lazaric, Emmanuel Dupoux, Marco Baroni

We train models to communicate about paths in a simple gridworld, using miniature languages that reflect or violate various natural language trends, such as the tendency to avoid redundancy or to minimize long-distance dependencies.

Anti-efficient encoding in emergent communication

1 code implementation NeurIPS 2019 Rahma Chaabouni, Eugene Kharitonov, Emmanuel Dupoux, Marco Baroni

Despite renewed interest in emergent language simulations with neural networks, little is known about the basic properties of the induced code, and how they compare to human language.

The Zero Resource Speech Challenge 2019: TTS without T

no code implementations 25 Apr 2019 Ewan Dunbar, Robin Algayres, Julien Karadayi, Mathieu Bernard, Juan Benjumea, Xuan-Nga Cao, Lucie Miskic, Charlotte Dugrain, Lucas Ondel, Alan W. Black, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux

We present the Zero Resource Speech Challenge 2019, which proposes to build a speech synthesizer without any text or phonetic labels: hence, TTS without T (text-to-speech without text).

End-to-End Speech Recognition From the Raw Waveform

1 code implementation 19 Jun 2018 Neil Zeghidour, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert, Emmanuel Dupoux

In this paper, we study end-to-end systems trained directly from the raw waveform, building on two alternatives for trainable replacements of mel-filterbanks that use a convolutional architecture.

Speech Recognition

Sampling strategies in Siamese Networks for unsupervised speech representation learning

2 code implementations 30 Apr 2018 Rachid Riad, Corentin Dancette, Julien Karadayi, Neil Zeghidour, Thomas Schatz, Emmanuel Dupoux

We apply these results to pairs of words discovered using an unsupervised algorithm and show an improvement on state-of-the-art in unsupervised representation learning using siamese networks.

Representation Learning

IntPhys: A Framework and Benchmark for Visual Intuitive Physics Reasoning

1 code implementation 20 Mar 2018 Ronan Riochet, Mario Ynocente Castro, Mathieu Bernard, Adam Lerer, Rob Fergus, Véronique Izard, Emmanuel Dupoux

In order to reach human performance on complex visual tasks, artificial systems need to incorporate a significant amount of understanding of the world in terms of macroscopic objects, movements, forces, etc.


Are words easier to learn from infant- than adult-directed speech? A quantitative corpus-based investigation

no code implementations 23 Dec 2017 Adriana Guevara-Rukoz, Alejandrina Cristia, Bogdan Ludusan, Roland Thiollière, Andrew Martin, Reiko Mazuka, Emmanuel Dupoux

At the acoustic level we show that, as has been documented before for phonemes, the realizations of words are more variable and less discriminable in IDS than in ADS.

Learning Filterbanks from Raw Speech for Phone Recognition

2 code implementations 3 Nov 2017 Neil Zeghidour, Nicolas Usunier, Iasonas Kokkinos, Thomas Schatz, Gabriel Synnaeve, Emmanuel Dupoux

We train a bank of complex filters that operates on the raw waveform and is fed into a convolutional neural network for end-to-end phone recognition.

Learning weakly supervised multimodal phoneme embeddings

no code implementations 23 Apr 2017 Rahma Chaabouni, Ewan Dunbar, Neil Zeghidour, Emmanuel Dupoux

Recent works have explored deep architectures for learning multimodal speech representation (e.g. audio and images, articulation and audio) in a supervised way.

Multi-Task Learning

Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies

5 code implementations TACL 2016 Tal Linzen, Emmanuel Dupoux, Yoav Goldberg

The success of long short-term memory (LSTM) neural networks in language processing is typically attributed to their ability to capture long-distance statistical regularities.

Language Modelling

Cognitive Science in the era of Artificial Intelligence: A roadmap for reverse-engineering the infant language-learner

no code implementations 29 Jul 2016 Emmanuel Dupoux

The project of 'reverse engineering' language development, i.e., of building an effective system that mimics infant's achievements appears therefore to be within reach.

Weakly Supervised Multi-Embeddings Learning of Acoustic Models

no code implementations 20 Dec 2014 Gabriel Synnaeve, Emmanuel Dupoux

We trained a Siamese network with multi-task same/different information on a speech dataset, and found that it was possible to share a network for both tasks without a loss in performance.

Bridging the gap between speech technology and natural language processing: an evaluation toolbox for term discovery systems

no code implementations LREC 2014 Bogdan Ludusan, Maarten Versteegh, Aren Jansen, Guillaume Gravier, Xuan-Nga Cao, Mark Johnson, Emmanuel Dupoux

The unsupervised discovery of linguistic terms from either continuous phoneme transcriptions or from raw speech has seen an increasing interest in the past years both from a theoretical and a practical standpoint.

Language Acquisition
