no code implementations • LREC 2022 • Tugtekin Turan, Dietrich Klakow, Emmanuel Vincent, Denis Jouvet
In recent years, voice-controlled personal assistants have revolutionized the interaction with smart devices and mobile applications.
no code implementations • LREC 2022 • Imran Sheikh, Emmanuel Vincent, Irina Illina
Training of LSTM LMs in such limited data scenarios can benefit from alternate uncertain ASR hypotheses, as observed in our recent work.
no code implementations • 22 Dec 2024 • Natalia Tomashenko, Emmanuel Vincent, Marc Tommasi
In this paper, we investigate the impact of speech temporal dynamics in application to automatic speaker verification and speaker voice anonymization tasks.
no code implementations • 29 Oct 2024 • Can Cui, Imran Ahamad Sheikh, Mostafa Sadeghi, Emmanuel Vincent
Distant-microphone meeting transcription is a challenging task.
1 code implementation • 9 Oct 2024 • Natalia Tomashenko, Xiaoxiao Miao, Emmanuel Vincent, Junichi Yamagishi
Results will be presented at the ICASSP 2025 special session to which 5 selected top-ranked participants will be invited to submit and present their challenge systems.
no code implementations • 16 Jul 2024 • Michele Panariello, Natalia Tomashenko, Xin Wang, Xiaoxiao Miao, Pierre Champion, Hubert Nourtel, Massimiliano Todisco, Nicholas Evans, Emmanuel Vincent, Junichi Yamagishi
The VoicePrivacy Challenge promotes the development of voice anonymisation solutions for speech technology.
1 code implementation • 3 Apr 2024 • Natalia Tomashenko, Xiaoxiao Miao, Pierre Champion, Sarina Meyer, Xin Wang, Emmanuel Vincent, Michele Panariello, Nicholas Evans, Junichi Yamagishi, Massimiliano Todisco
The task of the challenge is to develop a voice anonymization system for speech data which conceals the speaker's voice identity while protecting linguistic content and emotional states.
no code implementations • 11 Mar 2024 • Can Cui, Imran Ahamad Sheikh, Mostafa Sadeghi, Emmanuel Vincent
Past studies on end-to-end meeting transcription have focused on model architecture and have mostly been evaluated on simulated meeting data.
no code implementations • 29 Nov 2023 • Can Cui, Imran Ahamad Sheikh, Mostafa Sadeghi, Emmanuel Vincent
We propose two approaches to train an end-to-end joint punctuated and normalized ASR system using limited punctuated data.
Automatic Speech Recognition (ASR)
no code implementations • 16 Oct 2023 • Can Cui, Imran Ahamad Sheikh, Mostafa Sadeghi, Emmanuel Vincent
We present an end-to-end multichannel speaker-attributed automatic speech recognition (MC-SA-ASR) system that combines a Conformer-based encoder with multi-frame cross-channel attention and a speaker-attributed Transformer-based decoder.
1 code implementation • 28 May 2023 • Sewade Ogun, Vincent Colotte, Emmanuel Vincent
Flow-based generative models are widely used in text-to-speech (TTS) systems to learn the distribution of audio features (e.g., Mel-spectrograms) given the input tokens and to sample from this distribution to generate diverse utterances.
1 code implementation • 12 Oct 2022 • Sewade Ogun, Vincent Colotte, Emmanuel Vincent
We show the viability of this approach for training a multi-speaker GlowTTS model on the Common Voice English dataset.
Automatic Speech Recognition (ASR)
1 code implementation • 14 May 2022 • Natalia Tomashenko, Brij Mohan Lal Srivastava, Xin Wang, Emmanuel Vincent, Andreas Nautsch, Junichi Yamagishi, Nicholas Evans, Jose Patino, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco
The VoicePrivacy Challenge aims to promote the development of privacy preservation tools for speech technology by gathering a new community to define the tasks of interest and the evaluation methodology, and benchmarking solutions through a series of challenges.
2 code implementations • 23 Mar 2022 • Natalia Tomashenko, Xin Wang, Xiaoxiao Miao, Hubert Nourtel, Pierre Champion, Massimiliano Todisco, Emmanuel Vincent, Nicholas Evans, Junichi Yamagishi, Jean-François Bonastre
Participants apply their developed anonymization systems, run evaluation scripts and submit objective evaluation results and anonymized speech data to the organizers.
no code implementations • 23 Feb 2022 • Ali Shahin Shamsabadi, Brij Mohan Lal Srivastava, Aurélien Bellet, Nathalie Vauquier, Emmanuel Vincent, Mohamed Maouche, Marc Tommasi, Nicolas Papernot
We remove speaker information from these attributes by introducing differentially private feature extractors based on an autoencoder and an automatic speech recognizer, respectively, trained using noise layers.
Automatic Speech Recognition (ASR)
1 code implementation • 1 Sep 2021 • Natalia Tomashenko, Xin Wang, Emmanuel Vincent, Jose Patino, Brij Mohan Lal Srivastava, Paul-Gauthier Noé, Andreas Nautsch, Nicholas Evans, Junichi Yamagishi, Benjamin O'Brien, Anaïs Chanclu, Jean-François Bonastre, Massimiliano Todisco, Mohamed Maouche
We provide a systematic overview of the challenge design with an analysis of submitted systems and evaluation results.
1 code implementation • 29 Jul 2021 • Prerak Srivastava, Antoine Deleforge, Emmanuel Vincent
Knowing the geometrical and acoustical parameters of a room may benefit applications such as audio augmented reality, speech dereverberation or audio forensics.
no code implementations • 26 Jul 2020 • Md Sahidullah, Achintya Kumar Sarkar, Ville Vestman, Xuechen Liu, Romain Serizel, Tomi Kinnunen, Zheng-Hua Tan, Emmanuel Vincent
Our primary submission to the challenge is the fusion of seven subsystems which yields a normalized minimum detection cost function (minDCF) of 0.072 and an equal error rate (EER) of 2.14% on the evaluation set.
5 code implementations • 22 May 2020 • Joris Cosentino, Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent
Most deep learning-based speech separation models today are benchmarked on it.
Audio and Speech Processing
no code implementations • 18 May 2020 • Brij Mohan Lal Srivastava, Natalia Tomashenko, Xin Wang, Emmanuel Vincent, Junichi Yamagishi, Mohamed Maouche, Aurélien Bellet, Marc Tommasi
The recently proposed x-vector based anonymization scheme converts any input voice into that of a random pseudo-speaker.
no code implementations • 11 May 2020 • Michel Olvera, Emmanuel Vincent, Romain Serizel, Gilles Gasso
Ambient sound scenes typically comprise multiple short events occurring on top of a somewhat stationary background.
3 code implementations • 4 May 2020 • Natalia Tomashenko, Brij Mohan Lal Srivastava, Xin Wang, Emmanuel Vincent, Andreas Nautsch, Junichi Yamagishi, Nicholas Evans, Jose Patino, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco
The VoicePrivacy initiative aims to promote the development of privacy preservation tools for speech technology by gathering a new community to define the tasks of interest and the evaluation methodology, and benchmarking solutions through a series of challenges.
no code implementations • 20 Apr 2020 • Shinji Watanabe, Michael Mandel, Jon Barker, Emmanuel Vincent, Ashish Arora, Xuankai Chang, Sanjeev Khudanpur, Vimal Manohar, Daniel Povey, Desh Raj, David Snyder, Aswin Shanmugam Subramanian, Jan Trmal, Bar Ben Yair, Christoph Boeddeker, Zhaoheng Ni, Yusuke Fujita, Shota Horiguchi, Naoyuki Kanda, Takuya Yoshioka, Neville Ryant
Following the success of the 1st, 2nd, 3rd, 4th and 5th CHiME challenges, we organize the 6th CHiME Speech Separation and Recognition Challenge (CHiME-6).
1 code implementation • 5 Feb 2020 • Nicolas Turpault, Romain Serizel, Emmanuel Vincent
Many datasets and approaches in ambient sound analysis use weakly labeled data. Weak labels are employed because annotating every data sample with a strong label is too expensive. Yet, their impact on the performance in comparison to strong labels remains unclear. Indeed, weak labels must often be dealt with at the same time as other challenges, namely multiple labels per sample, unbalanced classes and/or overlapping events. In this paper, we formulate a supervised learning problem which involves weak labels. We create a dataset that focuses on the difference between strong and weak labels as opposed to other challenges.
no code implementations • 20 Nov 2019 • Guillaume Carbajal, Romain Serizel, Emmanuel Vincent, Eric Humbert
We consider the problem of simultaneous reduction of acoustic echo, reverberation and noise.
no code implementations • 12 Nov 2019 • Brij Mohan Lal Srivastava, Aurélien Bellet, Marc Tommasi, Emmanuel Vincent
In this paper, we focus on the protection of speaker identity and study the extent to which users can be recognized based on the encoded representation of their speech as obtained by a deep encoder-decoder architecture trained for ASR.
Automatic Speech Recognition (ASR)
no code implementations • 10 Nov 2019 • Brij Mohan Lal Srivastava, Nathalie Vauquier, Md Sahidullah, Aurélien Bellet, Marc Tommasi, Emmanuel Vincent
In this paper, we investigate anonymization methods based on voice conversion.
Automatic Speech Recognition (ASR)
no code implementations • 6 Nov 2019 • Md Sahidullah, Jose Patino, Samuele Cornell, Ruiqing Yin, Sunit Sivasankaran, Hervé Bredin, Pavel Korshunov, Alessio Brutti, Romain Serizel, Emmanuel Vincent, Nicholas Evans, Sébastien Marcel, Stefano Squartini, Claude Barras
This paper describes the speaker diarization systems developed for the Second DIHARD Speech Diarization Challenge (DIHARD II) by the Speed team.
2 code implementations • 23 Oct 2019 • Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent
Also, we validate the use of parameterized filterbanks and show that complex-valued representations and masks are beneficial in all conditions.
no code implementations • 16 Oct 2019 • Adrien Dufraux, Emmanuel Vincent, Awni Hannun, Armelle Brun, Matthijs Douze
The transcriptions used to train an Automatic Speech Recognition (ASR) system may contain errors.
Automatic Speech Recognition (ASR)
no code implementations • SEMEVAL 2019 • Johannes Kiesel, Maria Mestre, Rishabh Shukla, Emmanuel Vincent, Payam Adineh, David Corney, Benno Stein, Martin Potthast
Hyperpartisan news is news that takes an extreme left-wing or right-wing standpoint.
no code implementations • 10 May 2019 • Giuseppe Amato, Malte Behrmann, Frédéric Bimbot, Baptiste Caramiaux, Fabrizio Falchi, Ander Garcia, Joost Geurts, Jaume Gibert, Guillaume Gravier, Hadmut Holken, Hartmut Koenitz, Sylvain Lefebvre, Antoine Liutkus, Fabien Lotte, Andrew Perkis, Rafael Redondo, Enrico Turrin, Thierry Vieville, Emmanuel Vincent
Thanks to the Big Data revolution and increasing computing capacities, Artificial Intelligence (AI) has made an impressive revival over the past few years and is now omnipresent in both research and industry.
no code implementations • 3 May 2019 • Manuel Pariente, Antoine Deleforge, Emmanuel Vincent
Recent studies have explored the use of deep generative models of speech spectra based on variational autoencoders (VAEs), combined with unsupervised noise models, to perform speech enhancement.
no code implementations • 15 Feb 2019 • Dayana Ribas, Emmanuel Vincent
So far, different uncertainty propagation methods have been proposed to compensate for noise and reverberation in i-vectors in the context of speaker recognition.
no code implementations • 28 Mar 2018 • Jon Barker, Shinji Watanabe, Emmanuel Vincent, Jan Trmal
The CHiME challenge series aims to advance robust automatic speech recognition (ASR) technology by promoting research at the interface of speech and language processing, signal processing, and machine learning.
Automatic Speech Recognition (ASR)
1 code implementation • 1 Jul 2017 • Ziteng Wang, Emmanuel Vincent, Romain Serizel, Yonghong Yan
Multichannel linear filters, such as the Multichannel Wiener Filter (MWF) and the Generalized Eigenvalue (GEV) beamformer, are popular signal processing techniques that can improve speech recognition performance.