Search Results for author: Yannick Estève

Found 42 papers, 7 papers with code

Findings of the IWSLT 2022 Evaluation Campaign

no code implementations IWSLT (ACL) 2022 Antonios Anastasopoulos, Loïc Barrault, Luisa Bentivogli, Marcely Zanon Boito, Ondřej Bojar, Roldano Cattoni, Anna Currey, Georgiana Dinu, Kevin Duh, Maha Elbayad, Clara Emmanuel, Yannick Estève, Marcello Federico, Christian Federmann, Souhir Gahbiche, Hongyu Gong, Roman Grundkiewicz, Barry Haddow, Benjamin Hsu, Dávid Javorský, Vĕra Kloudová, Surafel Lakew, Xutai Ma, Prashant Mathur, Paul McNamee, Kenton Murray, Maria Nǎdejde, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, John Ortega, Juan Pino, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Yogesh Virkar, Alexander Waibel, Changhan Wang, Shinji Watanabe

The evaluation campaign of the 19th International Conference on Spoken Language Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Speech to speech translation, (iv) Low-resource speech translation, (v) Multilingual speech translation, (vi) Dialect speech translation, (vii) Formality control for speech translation, (viii) Isometric speech translation.

Speech-to-Speech Translation, Translation

Towards Early Prediction of Self-Supervised Speech Model Performance

no code implementations10 Jan 2025 Ryan Whetten, Lucas Maison, Titouan Parcollet, Marco Dinarelli, Yannick Estève

Results show that, with only one hour of unlabeled audio, measures of cluster quality and rank correlate better with downstream performance than the pre-training loss, reducing the need for GPU hours and labeled data in SSL model evaluation.

GPU, Self-Supervised Learning
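
The exact early-prediction measures are not spelled out in this excerpt, but the idea of scoring an SSL checkpoint from a small batch of unlabeled audio can be sketched as follows. The effective-rank and silhouette metrics below are illustrative stand-ins, assuming frame-level embeddings have already been extracted from the model.

```python
# Illustrative proxies computable from a small sample of SSL frame embeddings:
# an effective-rank measure and a k-means cluster-quality score.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def effective_rank(embeddings: np.ndarray) -> float:
    """Exponential of the entropy of the normalized singular-value spectrum."""
    s = np.linalg.svd(embeddings - embeddings.mean(axis=0), compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))

def cluster_quality(embeddings: np.ndarray, n_clusters: int = 50) -> float:
    """Silhouette score of a k-means partition (higher means tighter, better-separated clusters)."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(embeddings)
    return float(silhouette_score(embeddings, labels))

# Placeholder for real frame embeddings (T x D) extracted from ~1 hour of unlabeled audio.
frames = np.random.randn(2000, 768)
print(effective_rank(frames), cluster_quality(frames))
```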

Automatic Voice Identification after Speech Resynthesis using PPG

no code implementations5 Aug 2024 Thibault Gaudier, Marie Tahon, Anthony Larcher, Yannick Estève

Speech resynthesis is a generic task in which we want to synthesize audio with another audio as input, with applications in media monitoring and journalism. Among the tasks addressed by speech resynthesis, voice conversion preserves the linguistic information while modifying the identity of the speaker, and speech editing preserves the identity of the speaker while modifying some of the words. In both cases, speaker and phonetic contents need to be disentangled in intermediate representations. Phonetic PosteriorGrams (PPG) are a frame-level probabilistic representation of phonemes, usually considered speaker-independent. This paper presents a PPG-based speech resynthesis system. A perceptual evaluation confirms that it produces correct audio quality. We then demonstrate that an automatic speaker verification model is not able to recover the source speaker after resynthesis with PPG, even when the model is trained on synthetic data.

Resynthesis, Speaker Verification +1
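
As a minimal sketch of the two objects discussed above (shapes and embeddings are assumptions for illustration, not the paper's models): a Phonetic PosteriorGram is a frame-by-phoneme matrix of posteriors, and the speaker verification check boils down to a cosine score between source and resynthesized speaker embeddings.

```python
import numpy as np

def to_ppg(frame_logits: np.ndarray) -> np.ndarray:
    """Softmax over phoneme classes for each frame -> (T, n_phonemes); each row sums to 1."""
    e = np.exp(frame_logits - frame_logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def asv_cosine(emb_source: np.ndarray, emb_resynth: np.ndarray) -> float:
    """Cosine similarity between speaker embeddings; a low score means the source speaker is not recovered."""
    return float(emb_source @ emb_resynth /
                 (np.linalg.norm(emb_source) * np.linalg.norm(emb_resynth)))

ppg = to_ppg(np.random.randn(200, 40))                           # 200 frames, 40 phoneme classes (placeholder logits)
score = asv_cosine(np.random.randn(192), np.random.randn(192))   # placeholder x-vector-like embeddings
```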

Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation

no code implementations8 Jul 2024 Jarod Duret, Yannick Estève, Titouan Parcollet

Recent advancements in textless speech-to-speech translation systems have been driven by the adoption of self-supervised learning techniques.

Automatic Speech Recognition, Emotion Recognition +9

Performance Analysis of Speech Encoders for Low-Resource SLU and ASR in Tunisian Dialect

1 code implementation5 Jul 2024 Salima Mdhaffar, Haroun Elleuch, Fethi Bougares, Yannick Estève

In contrast to existing research, this paper contributes by comparing the effectiveness of SSL approaches in the context of (i) the low-resource spoken Tunisian Arabic dialect and (ii) its combination with a low-resource SLU and ASR scenario, where only a few semantic annotations are available for fine-tuning.

Automatic Speech Recognition, Automatic Speech Recognition (ASR) +3

Investigating Low-Cost LLM Annotation for Spoken Dialogue Understanding Datasets

no code implementations19 Jun 2024 Lucas Druart, Valentin Vielzeuf, Yannick Estève

In spoken Task-Oriented Dialogue (TOD) systems, the choice of the semantic representation describing the users' requests is key to a smooth interaction.

Dialogue Understanding, Language Modeling +2

Sonos Voice Control Bias Assessment Dataset: A Methodology for Demographic Bias Assessment in Voice Assistants

no code implementations14 May 2024 Chloé Sekkat, Fanny Leroy, Salima Mdhaffar, Blake Perry Smith, Yannick Estève, Joseph Dureau, Alice Coucke

Recent works demonstrate that voice assistants do not perform equally well for everyone, but research on demographic robustness of speech technologies is still scarce.

Automatic Speech Recognition, Diversity +3

Open Implementation and Study of BEST-RQ for Speech Processing

1 code implementation7 May 2024 Ryan Whetten, Titouan Parcollet, Marco Dinarelli, Yannick Estève

BERT-based Speech pre-Training with Random-projection Quantizer (BEST-RQ) is an SSL method that has shown great performance on Automatic Speech Recognition (ASR) while being simpler than other SSL methods, such as wav2vec 2.0.

Automatic Speech Recognition, Automatic Speech Recognition (ASR) +3
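
The quantizer that gives BEST-RQ its simplicity can be sketched in a few lines. The dimensions and initialization below are assumptions; the real recipe pairs these discrete targets with a masked-prediction loss on a speech encoder.

```python
# Sketch of a random-projection quantizer in the spirit of BEST-RQ: a frozen random
# projection maps each speech frame to an index in a frozen random codebook,
# which serves as the BERT-style prediction target.
import torch

class RandomProjectionQuantizer(torch.nn.Module):
    def __init__(self, input_dim=80, codebook_dim=16, codebook_size=8192, seed=0):
        super().__init__()
        g = torch.Generator().manual_seed(seed)
        # Both the projection and the codebook are randomly initialized and never trained.
        self.register_buffer("projection", torch.randn(input_dim, codebook_dim, generator=g))
        codebook = torch.randn(codebook_size, codebook_dim, generator=g)
        self.register_buffer("codebook", torch.nn.functional.normalize(codebook, dim=-1))

    @torch.no_grad()
    def forward(self, features):                        # features: (batch, time, input_dim)
        proj = torch.nn.functional.normalize(features @ self.projection, dim=-1)
        # Nearest codebook entry by cosine similarity -> discrete targets (batch, time)
        return torch.argmax(proj @ self.codebook.T, dim=-1)

targets = RandomProjectionQuantizer()(torch.randn(2, 100, 80))   # placeholder log-mel frames
```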

Is one brick enough to break the wall of spoken dialogue state tracking?

no code implementations3 Nov 2023 Lucas Druart, Valentin Vielzeuf, Yannick Estève

In Task-Oriented Dialogue (TOD) systems, correctly updating the system's understanding of the user's requests (a.k.a. dialogue state tracking) is key to a smooth interaction.

Dialogue State Tracking

Enhancing expressivity transfer in textless speech-to-speech translation

no code implementations11 Oct 2023 Jarod Duret, Benjamin O'Brien, Yannick Estève, Titouan Parcollet

Textless speech-to-speech translation systems are rapidly advancing, thanks to the integration of self-supervised learning techniques.

Self-Supervised Learning, Speech-to-Speech Translation +1

Acoustic and linguistic representations for speech continuous emotion recognition in call center conversations

no code implementations6 Oct 2023 Manon Macary, Marie Tahon, Yannick Estève, Daniel Luzzati

In the context of telephone conversations, we can break down the audio information into acoustic and linguistic by using the speech signal and its transcription.

Emotion Recognition, Transfer Learning

Semantic enrichment towards efficient speech representations

no code implementations3 Jul 2023 Gaëlle Laperrière, Ha Nguyen, Sahar Ghannay, Bassam Jabaian, Yannick Estève

Over the past few years, self-supervised learned speech representations have emerged as fruitful replacements for conventional surface representations when solving Spoken Language Understanding (SLU) tasks.

Spoken Language Understanding

Some voices are too common: Building fair speech recognition systems using the Common Voice dataset

no code implementations1 Jun 2023 Lucas Maison, Yannick Estève

Automatic speech recognition (ASR) systems are becoming increasingly efficient thanks to new advances in neural network training such as self-supervised learning.

Automatic Speech Recognition, Automatic Speech Recognition (ASR) +3

OLISIA: a Cascade System for Spoken Dialogue State Tracking

1 code implementation20 Apr 2023 Léo Jacqmin, Lucas Druart, Yannick Estève, Benoît Favre, Lina Maria Rojas-Barahona, Valentin Vielzeuf

Though Dialogue State Tracking (DST) is a core component of spoken dialogue systems, recent work on this task mostly deals with chat corpora, disregarding the discrepancies between spoken and written language. In this paper, we propose OLISIA, a cascade system which integrates an Automatic Speech Recognition (ASR) model and a DST model.

Automatic Speech Recognition, Automatic Speech Recognition (ASR) +4
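
As a rough sketch of the cascade idea (not OLISIA's actual models or checkpoints), one can chain an off-the-shelf ASR model with a placeholder DST component:

```python
from transformers import pipeline

# Off-the-shelf ASR front end; OLISIA's own ASR and DST models are not shown here.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

def track_state(dialogue_history, audio_path, dst_model):
    transcript = asr(audio_path)["text"]             # 1) speech -> text
    context = " ".join(dialogue_history + [transcript])
    return dst_model(context)                        # 2) text -> dialogue state (slot/value pairs)

# dst_model stands for any fine-tuned model mapping the dialogue context to slot/value
# pairs, e.g. a sequence-to-sequence model returning {"restaurant-area": "centre", ...}.
```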

Improving Accented Speech Recognition with Multi-Domain Training

no code implementations14 Mar 2023 Lucas Maison, Yannick Estève

Thanks to the rise of self-supervised learning, automatic speech recognition (ASR) systems now achieve near-human performance on a wide variety of datasets.

Accented Speech Recognition, Automatic Speech Recognition +3

Federated Learning for ASR based on Wav2vec 2.0

2 code implementations20 Feb 2023 Tuan Nguyen, Salima Mdhaffar, Natalia Tomashenko, Jean-François Bonastre, Yannick Estève

This paper presents a study on the use of federated learning to train an ASR model based on a wav2vec 2.0 model pre-trained by self-supervision.

Federated Learning, Language Modeling +1
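
A minimal sketch of the aggregation step in such a setup (generic FedAvg, not necessarily the paper's exact training recipe): clients fine-tune local copies of the wav2vec 2.0-based ASR model, and the server averages their weights in proportion to local data size.

```python
import copy
import torch

def federated_average(client_states, client_sizes):
    """FedAvg aggregation: weighted average of client state_dicts by number of local examples."""
    total = float(sum(client_sizes))
    averaged = copy.deepcopy(client_states[0])
    for key in averaged:
        averaged[key] = sum(state[key].float() * (n / total)
                            for state, n in zip(client_states, client_sizes))
    return averaged

# Usage after a communication round (models and datasets are placeholders):
# global_model.load_state_dict(
#     federated_average([m.state_dict() for m in client_models],
#                       [len(d) for d in client_datasets]))
```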

A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems

no code implementations4 Apr 2022 Marcely Zanon Boito, Laurent Besacier, Natalia Tomashenko, Yannick Estève

These models are pre-trained on unlabeled audio data and then used in speech processing downstream tasks such as automatic speech recognition (ASR) or speech translation (ST).

Automatic Speech Recognition, Automatic Speech Recognition (ASR) +3

Retrieving Speaker Information from Personalized Acoustic Models for Speech Recognition

no code implementations7 Nov 2021 Salima Mdhaffar, Jean-François Bonastre, Marc Tommasi, Natalia Tomashenko, Yannick Estève

The widespread availability of powerful personal devices capable of collecting the voice of their users has opened up the opportunity to build speaker-adapted speech recognition (ASR) systems or to participate in collaborative learning of ASR.

Speaker Verification, speech-recognition +1

Privacy attacks for automatic speech recognition acoustic models in a federated learning framework

no code implementations6 Nov 2021 Natalia Tomashenko, Salima Mdhaffar, Marc Tommasi, Yannick Estève, Jean-François Bonastre

This paper investigates methods to effectively retrieve speaker information from the personalized speaker adapted neural network acoustic models (AMs) in automatic speech recognition (ASR).

Automatic Speech Recognition, Automatic Speech Recognition (ASR) +2

Where are we in semantic concept extraction for Spoken Language Understanding?

no code implementations24 Jun 2021 Sahar Ghannay, Antoine Caubrière, Salima Mdhaffar, Gaëlle Laperrière, Bassam Jabaian, Yannick Estève

More recent work on self-supervised training with unlabeled data opens new perspectives in terms of performance for automatic speech recognition and natural language processing.

Automatic Speech Recognition, Automatic Speech Recognition (ASR) +7

Impact of Encoding and Segmentation Strategies on End-to-End Simultaneous Speech Translation

no code implementations29 Apr 2021 Ha Nguyen, Yannick Estève, Laurent Besacier

Boosted by the simultaneous translation shared task at IWSLT 2020, promising end-to-end online speech translation approaches were recently proposed.

Translation

End2End Acoustic to Semantic Transduction

no code implementations1 Feb 2021 Valentin Pelloin, Nathalie Camelin, Antoine Laurent, Renato de Mori, Antoine Caubrière, Yannick Estève, Sylvain Meignier

In this paper, we propose a novel end-to-end sequence-to-sequence spoken language understanding model using an attention mechanism.

Language Modeling, Language Modelling +1

On the use of Self-supervised Pre-trained Acoustic and Linguistic Features for Continuous Speech Emotion Recognition

no code implementations18 Nov 2020 Manon Macary, Marie Tahon, Yannick Estève, Anthony Rousseau

Pre-training for feature extraction is an increasingly studied approach to get better continuous representations of audio and text content.

Speech Emotion Recognition

Leverage Unlabeled Data for Abstractive Speech Summarization with Self-Supervised Learning and Back-Summarization

no code implementations30 Jul 2020 Paul Tardy, Louis de Seynes, François Hernandez, Vincent Nguyen, David Janiszek, Yannick Estève

In order to build a corpus for this task, it is necessary to obtain the (automatic or manual) transcription of each meeting, and then to segment and align it with the corresponding manual report to produce suitable training examples.

Abstractive Text Summarization, Decoder +3

Align then Summarize: Automatic Alignment Methods for Summarization Corpus Creation

2 code implementations LREC 2020 Paul Tardy, David Janiszek, Yannick Estève, Vincent Nguyen

We report automatic alignment and summarization performances on this corpus and show that automatic alignment is relevant for data annotation, since it leads to a large improvement of almost +4 on all ROUGE scores on the summarization task.

Articles, Meeting Summarization +1
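
For reference, ROUGE gains of this kind are typically measured with a scorer along these lines (using the rouge_score package; the paper's exact evaluation setup is not reproduced here, and the texts are made up for illustration).

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
reference_report = "the committee approved the budget after a short debate"
generated_summary = "the budget was approved by the committee after debate"
scores = scorer.score(reference_report, generated_summary)  # reference first, prediction second
print({name: round(s.fmeasure, 3) for name, s in scores.items()})
```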

ON-TRAC Consortium for End-to-End and Simultaneous Speech Translation Challenge Tasks at IWSLT 2020

no code implementations WS 2020 Maha Elbayad, Ha Nguyen, Fethi Bougares, Natalia Tomashenko, Antoine Caubrière, Benjamin Lecouteux, Yannick Estève, Laurent Besacier

This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2020, offline speech translation and simultaneous speech translation.

Data Augmentation, Decoder +2

End-to-end named entity extraction from speech

no code implementations30 May 2018 Sahar Ghannay, Antoine Caubrière, Yannick Estève, Antoine Laurent, Emmanuel Morin

Until now, NER from speech has been performed through a pipeline process that consists in first applying automatic speech recognition (ASR) to the audio and then applying NER to the ASR outputs.

Automatic Speech Recognition, Automatic Speech Recognition (ASR) +5

TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation

3 code implementations12 May 2018 François Hernandez, Vincent Nguyen, Sahar Ghannay, Natalia Tomashenko, Yannick Estève

We present the recent development on Automatic Speech Recognition (ASR) systems in comparison with the two previous releases of the TED-LIUM Corpus from 2012 and 2014.

Automatic Speech Recognition, Automatic Speech Recognition (ASR) +1

ASR error management for improving spoken language understanding

no code implementations26 May 2017 Edwin Simonnet, Sahar Ghannay, Nathalie Camelin, Yannick Estève, Renato de Mori

This paper addresses the problem of automatic speech recognition (ASR) error detection and its use for improving spoken language understanding (SLU) systems.

Automatic Speech Recognition, Automatic Speech Recognition (ASR) +5
