Search Results for author: Ewan Dunbar

Found 21 papers, 8 papers with code

The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling

2 code implementations · 23 Nov 2020 · Tu Anh Nguyen, Maureen de Seyssel, Patricia Rozé, Morgane Rivière, Evgeny Kharitonov, Alexei Baevski, Ewan Dunbar, Emmanuel Dupoux

We introduce a new unsupervised task, spoken language modeling: the learning of linguistic representations from raw audio signals without any labels, along with the Zero Resource Speech Benchmark 2021: a suite of 4 black-box, zero-shot metrics probing for the quality of the learned models at 4 linguistic levels: phonetics, lexicon, syntax and semantics.

Clustering · Language Modelling · +1
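
For a sense of what a zero-shot metric at one of these levels looks like, the lexical level is typically probed with a "spot-the-word" comparison: the model is scored on how often it assigns a higher pseudo-probability to a real word than to a matched nonword. Below is a minimal sketch of that scoring, assuming some model exposes a pseudo-log-probability function `score(item)`; the function name and the toy items are illustrative assumptions, not the benchmark's actual API.

```python
# Sketch of a zero-shot "spot-the-word" lexical metric: the model gets credit
# whenever it scores a real word above its matched nonword.
def lexical_accuracy(pairs, score):
    """pairs: list of (word_item, nonword_item) tuples;
    score: callable returning a pseudo-log-probability for one item."""
    correct = sum(1 for word, nonword in pairs if score(word) > score(nonword))
    return correct / len(pairs)

# Usage with a placeholder scorer standing in for an actual spoken LM:
toy_pairs = [("brick", "blick"), ("dog", "dag")]            # illustrative only
toy_score = lambda item: float(item in {"brick", "dog"})    # toy "oracle" model
print(lexical_accuracy(toy_pairs, toy_score))               # -> 1.0
```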

RNNs implicitly implement tensor-product representations

1 code implementation · ICLR 2019 · R. Thomas McCoy, Tal Linzen, Ewan Dunbar, Paul Smolensky

Recurrent neural networks (RNNs) can learn continuous vector representations of symbolic structures such as sequences and sentences; these representations often exhibit linear regularities (analogies).

Representation Learning · Sentence
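
For context, the tensor-product representation scheme the paper probes for encodes a symbol sequence as a sum of outer products of filler (symbol) vectors with role (position) vectors, and recovers fillers by unbinding with the corresponding role. The sketch below is a toy illustration of that general scheme, not the paper's trained decomposition; dimensions and vectors are arbitrary.

```python
import numpy as np

# Toy tensor-product representation (TPR):
#   T = sum_i  filler(symbol_i) (outer) role(position_i)
rng = np.random.default_rng(0)
symbols = ["a", "b", "c"]
fillers = {s: rng.normal(size=8) for s in symbols}   # one vector per symbol
roles = rng.normal(size=(4, 8))                      # one vector per position

def encode(seq):
    return sum(np.outer(fillers[s], roles[i]) for i, s in enumerate(seq))

def decode_position(T, i):
    # Unbind with the (pseudo-)inverse role, then pick the nearest filler.
    unbind = np.linalg.pinv(roles)[:, i]
    filler_hat = T @ unbind
    return max(fillers, key=lambda s: fillers[s] @ filler_hat)

T = encode(["b", "a", "c"])
print([decode_position(T, i) for i in range(3)])     # -> ['b', 'a', 'c']
```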

Analogies minus analogy test: measuring regularities in word embeddings

1 code implementation · CoNLL 2020 · Louis Fournier, Emmanuel Dupoux, Ewan Dunbar

Vector space models of words have long been claimed to capture linguistic regularities as simple vector translations, but problems have been raised with this claim.

Word Embeddings

Paraphrases do not explain word analogies

1 code implementation · EACL 2021 · Louis Fournier, Ewan Dunbar

Many types of distributional word embeddings (weakly) encode linguistic regularities as directions (the difference between "jump" and "jumped" will be in a similar direction to that of "walk" and "walked," and so on).

Word Embeddings
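
The "direction" claim in the two preceding entries can be made concrete with a vector-offset check: the offset jumped − jump should point roughly the same way as walked − walk. A minimal sketch follows, assuming `emb` is a lookup from word to numpy vector obtained from some pretrained embedding; the embedding source is left open.

```python
import numpy as np

def offset_similarity(emb, a, a_infl, b, b_infl):
    """Cosine similarity between the offsets (a_infl - a) and (b_infl - b).
    Values near 1 mean the inflection is encoded as a consistent direction."""
    u = emb[a_infl] - emb[a]
    v = emb[b_infl] - emb[b]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# e.g. offset_similarity(emb, "jump", "jumped", "walk", "walked")
# The classic 3CosAdd analogy test instead asks whether
# emb["jumped"] - emb["jump"] + emb["walk"] has emb["walked"] as its
# nearest neighbour in the vocabulary.
```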

Perceptimatic: A human speech perception benchmark for unsupervised subword modelling

1 code implementation · 12 Oct 2020 · Juliette Millet, Ewan Dunbar

In this paper, we present a data set and methods to compare speech processing models and human behaviour on a phone discrimination task.
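
Phone discrimination tasks of this kind are typically scored ABX-style: given representations of stimuli A and B from contrasting phone categories and a probe X from A's category, the model counts as correct when X is closer to A than to B. The sketch below assumes fixed-length vector representations and cosine distance; real evaluations usually compare frame sequences (e.g. with dynamic time warping), so this is illustrative only.

```python
import numpy as np

def cosine_distance(u, v):
    return 1.0 - u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def abx_accuracy(triples):
    """triples: list of (A, X, B) vectors, where A and X share a phone
    category and B comes from a contrasting one. Returns the fraction of
    triples for which X is closer to A than to B."""
    correct = sum(
        1 for a, x, b in triples
        if cosine_distance(x, a) < cosine_distance(x, b)
    )
    return correct / len(triples)
```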

Zero Resource Code-switched Speech Benchmark Using Speech Utterance Pairs For Multiple Spoken Languages

1 code implementation · 4 Oct 2023 · Kuan-Po Huang, Chih-Kai Yang, Yu-Kuan Fu, Ewan Dunbar, Hung-Yi Lee

We introduce a new zero resource code-switched speech benchmark designed to directly assess the code-switching capabilities of self-supervised speech encoders.

Language Modelling

Learning weakly supervised multimodal phoneme embeddings

no code implementations · 23 Apr 2017 · Rahma Chaabouni, Ewan Dunbar, Neil Zeghidour, Emmanuel Dupoux

Recent works have explored deep architectures for learning multimodal speech representation (e.g. audio and images, articulation and audio) in a supervised way.

Multi-Task Learning

RNNs Implicitly Implement Tensor Product Representations

no code implementations · 20 Dec 2018 · R. Thomas McCoy, Tal Linzen, Ewan Dunbar, Paul Smolensky

Recurrent neural networks (RNNs) can learn continuous vector representations of symbolic structures such as sequences and sentences; these representations often exhibit linear regularities (analogies).

Representation Learning · Sentence

The Zero Resource Speech Challenge 2019: TTS without T

no code implementations · 25 Apr 2019 · Ewan Dunbar, Robin Algayres, Julien Karadayi, Mathieu Bernard, Juan Benjumea, Xuan-Nga Cao, Lucie Miskic, Charlotte Dugrain, Lucas Ondel, Alan W. Black, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux

We present the Zero Resource Speech Challenge 2019, which proposes to build a speech synthesizer without any text or phonetic labels: hence, TTS without T (text-to-speech without text).

The Perceptimatic English Benchmark for Speech Perception Models

no code implementations · 7 May 2020 · Juliette Millet, Ewan Dunbar

We show that DeepSpeech, a standard English speech recognizer, is more specialized on English phoneme discrimination than English listeners, and is poorly correlated with their behaviour, even though it yields a low error on the decision task given to humans.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) · +1

The Zero Resource Speech Challenge 2020: Discovering discrete subword and word units

no code implementations · 12 Oct 2020 · Ewan Dunbar, Julien Karadayi, Mathieu Bernard, Xuan-Nga Cao, Robin Algayres, Lucas Ondel, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux

We present the Zero Resource Speech Challenge 2020, which aims at learning speech representations from raw audio signals without any labels.

Speech Synthesis

The Zero Resource Speech Challenge 2021: Spoken language modelling

no code implementations · 29 Apr 2021 · Ewan Dunbar, Mathieu Bernard, Nicolas Hamilakis, Tu Anh Nguyen, Maureen de Seyssel, Patricia Rozé, Morgane Rivière, Eugene Kharitonov, Emmanuel Dupoux

We present the Zero Resource Speech Challenge 2021, which asks participants to learn a language model directly from audio, without any text or labels.

Language Modelling

Do self-supervised speech models develop human-like perception biases?

no code implementations · ACL 2022 · Juliette Millet, Ewan Dunbar

We find that the CPC model shows a small native language effect, but that wav2vec 2.0 and HuBERT seem to develop a universal speech perception space which is not language-specific.

Toward a realistic model of speech processing in the brain with self-supervised learning

no code implementations · 3 Jun 2022 · Juliette Millet, Charlotte Caucheteux, Pierre Orhan, Yves Boubenec, Alexandre Gramfort, Ewan Dunbar, Christophe Pallier, Jean-Remi King

These elements, resulting from the largest neuroimaging benchmark to date, show how self-supervised learning can account for a rich organization of speech processing in the brain, and thus delineate a path to identify the laws of language acquisition which shape the human brain.

Language Acquisition · Self-Supervised Learning

Are word boundaries useful for unsupervised language learning?

no code implementations · 6 Oct 2022 · Tu Anh Nguyen, Maureen de Seyssel, Robin Algayres, Patricia Rozé, Ewan Dunbar, Emmanuel Dupoux

However, word boundary information may be absent or unreliable in the case of speech input (word boundaries are not marked explicitly in the speech stream).

Evaluating context-invariance in unsupervised speech representations

1 code implementation · 27 Oct 2022 · Mark Hallap, Emmanuel Dupoux, Ewan Dunbar

Unsupervised speech representations have taken off, with benchmarks (SUPERB, ZeroSpeech) demonstrating major progress on semi-supervised speech recognition, speech synthesis, and speech-only language modelling.

Language Modelling · Speech Recognition · +2

Self-supervised language learning from raw audio: Lessons from the Zero Resource Speech Challenge

no code implementations · 27 Oct 2022 · Ewan Dunbar, Nicolas Hamilakis, Emmanuel Dupoux

Recent progress in self-supervised or unsupervised machine learning has opened the possibility of building a full speech processing system from raw audio without using any textual representations or expert labels such as phonemes, dictionaries or parse trees.

Acoustic Unit Discovery · Language Modelling · +1

Bigger is not Always Better: The Effect of Context Size on Speech Pre-Training

1 code implementation · 3 Dec 2023 · Sean Robertson, Ewan Dunbar

It has been generally assumed in the automatic speech recognition (ASR) literature that it is better for models to have access to wider context windows.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) · +2
