no code implementations • 29 Feb 2024 • Sonal Joshi, Thomas Thebaud, Jesús Villalba, Najim Dehak
In this paper, we propose a method to detect the presence of adversarial examples, i.e., a binary classifier distinguishing between benign and adversarial examples.
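A minimal sketch of such a benign-vs-adversarial detector, assuming precomputed per-example feature vectors; the plain logistic-regression model and synthetic setup below are illustrative stand-ins, not the paper's actual detector:

```python
import numpy as np

def train_detector(feats, labels, lr=0.1, epochs=300):
    """Fit logistic regression: label 1 = adversarial, 0 = benign."""
    w = np.zeros(feats.shape[1])
    b = 0.0
    n = len(labels)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))  # sigmoid probabilities
        grad = p - labels                           # gradient of log-loss w.r.t. logits
        w -= lr * feats.T @ grad / n
        b -= lr * grad.mean()
    return w, b

def detect(feats, w, b, thresh=0.5):
    """Return 1 (adversarial) where predicted probability exceeds thresh."""
    return (1.0 / (1.0 + np.exp(-(feats @ w + b))) > thresh).astype(int)
```

In practice the features would come from the speech system under attack; any separable representation makes this two-class formulation workable.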
no code implementations • 8 Sep 2023 • Saurabhchand Bhati, Jesús Villalba, Laureano Moro-Velazquez, Thomas Thebaud, Najim Dehak
Cascaded SpeechCLIP attempted to generate localized word-level information and utilize both the pretrained image and text encoders.
no code implementations • 12 Apr 2023 • Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak
These representations significantly reduce the amount of labeled data needed for downstream tasks such as automatic speech recognition.
no code implementations • 7 Mar 2023 • Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Thomas Thebaud, Najim Dehak
Speech super-resolution/Bandwidth Extension (BWE) can improve downstream tasks like Automatic Speaker Verification (ASV).
no code implementations • 4 Sep 2022 • Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Piotr Żelasko, Najim Dehak
We show that our bandwidth extension leads to phenomena such as a shift of telephone (test) embeddings towards wideband (train) signals, a negative correlation of perceptual quality with downstream performance, and condition-independent score calibration.
1 code implementation • 10 Aug 2022 • Jaejin Cho, Raghavendra Pappagari, Piotr Żelasko, Laureano Moro-Velazquez, Jesús Villalba, Najim Dehak
This paper applies a non-contrastive self-supervised learning method on an unlabeled speech corpus to learn utterance-level embeddings.
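The core of a non-contrastive objective can be sketched as a negative cosine similarity between two views of the same utterance, with no negative pairs required; the tiny NumPy version below is illustrative of BYOL/DINO-style losses, not the paper's network:

```python
import numpy as np

def non_contrastive_loss(online_pred, target_view):
    """Negative cosine similarity between the online branch's prediction
    and the (stop-gradient) target view. Minimized at -1 when the two
    embeddings point in the same direction."""
    o = online_pred / np.linalg.norm(online_pred)
    t = target_view / np.linalg.norm(target_view)
    return float(-(o @ t))
```

Collapse to a trivial constant embedding is usually prevented architecturally (e.g., a momentum target or stop-gradient), which is why no negatives are needed.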
no code implementations • 30 Mar 2022 • Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Najim Dehak
Then, we propose a two-stage learning solution where we use a pre-trained domain adaptation system for pre-processing in bandwidth extension training.
no code implementations • 5 Oct 2021 • Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak
We overcome this limitation with a segmental contrastive predictive coding (SCPC) framework to model the signal structure at a higher level, e.g., phone level.
no code implementations • 13 Sep 2021 • Raghavendra Pappagari, Piotr Żelasko, Jesús Villalba, Laureano Moro-Velazquez, Najim Dehak
While most of the current approaches focus on inferring emotion from isolated utterances, we argue that this is not sufficient to achieve conversational emotion recognition (CER) which deals with recognizing emotions in conversations.
no code implementations • 9 Jul 2021 • Jesús Villalba, Sonal Joshi, Piotr Żelasko, Najim Dehak
Also, representations trained to classify attacks against speaker identification can also be used to classify attacks against speaker verification and speech recognition.
no code implementations • 3 Jun 2021 • Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak
We overcome this limitation with a segmental contrastive predictive coding (SCPC) framework that can model the signal structure at a higher level, e.g., at the phoneme level.
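The contrastive-predictive-coding component of such a framework scores the true next segment against distractors with an InfoNCE loss; a minimal NumPy sketch, where the plain dot-product scorer is an illustrative simplification of the model's learned predictor:

```python
import numpy as np

def info_nce_loss(context, future, negatives):
    """InfoNCE: the context vector (d,) should score higher against the
    true next segment `future` (d,) than against each distractor row of
    `negatives` (k, d). Lower loss = better prediction of the future."""
    pos = np.exp(context @ future)
    neg = np.exp(negatives @ context).sum()
    return float(-np.log(pos / (pos + neg)))
```

Applying this at the segment level, rather than per frame, is what lets the model capture structure closer to the phoneme scale.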
no code implementations • 3 Apr 2021 • Saurabh Kataria, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velázquez, Najim Dehak
We investigate it for adapting microphone speech to the telephone domain.
no code implementations • 22 Jan 2021 • Sonal Joshi, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velázquez, Najim Dehak
Such attacks pose severe security risks, making it vital to understand how vulnerable state-of-the-art SR systems are to these attacks.
no code implementations • 2 Nov 2020 • Nanxin Chen, Piotr Żelasko, Jesús Villalba, Najim Dehak
This paper introduces a novel method to diagnose the source-target attention in state-of-the-art end-to-end speech recognition models with joint connectionist temporal classification (CTC) and attention training.
no code implementations • 27 Oct 2020 • Raghavendra Pappagari, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak
Data augmentation is a widely used strategy for training robust machine learning models.
no code implementations • 26 Jul 2020 • Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Najim Dehak
We perform segmentation based on the assumption that the frame feature vectors are more similar within a segment than across the segments.
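Under that assumption, boundaries can be hypothesized wherever the similarity between adjacent frame vectors dips; a minimal cosine-similarity sketch, where the fixed threshold rule is illustrative and the paper's actual boundary criterion may differ:

```python
import numpy as np

def segment_boundaries(frames, thresh=0.5):
    """Place a boundary between frames i and i+1 when their cosine
    similarity drops below thresh (low similarity => new segment)."""
    norms = np.linalg.norm(frames, axis=1, keepdims=True)
    unit = frames / np.clip(norms, 1e-8, None)      # unit-normalize each frame
    sims = (unit[:-1] * unit[1:]).sum(axis=1)       # cosine sim of adjacent frames
    return [i + 1 for i, s in enumerate(sims) if s < thresh]
```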
no code implementations • 10 Nov 2019 • Nanxin Chen, Shinji Watanabe, Jesús Villalba, Najim Dehak
In this paper, we study two different non-autoregressive transformer structures for automatic speech recognition (ASR): A-CMLM and A-FMLM.
1 code implementation • 25 Oct 2019 • Phani Sankar Nidadavolu, Saurabh Kataria, Jesús Villalba, Najim Dehak
We experiment with two adaptation tasks: microphone to telephone and a novel reverberant to clean adaptation with the end goal of improving speaker recognition performance.
1 code implementation • 25 Oct 2019 • Saurabh Kataria, Phani Sankar Nidadavolu, Jesús Villalba, Nanxin Chen, Paola García, Najim Dehak
On the BabyTrain corpus, we observe relative gains of 10.38% and 12.40% in minDCF and EER respectively.
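For reference, the EER metric reported here is the operating point where false-accept and false-reject rates cross; a small sketch of its computation, using a simple threshold sweep rather than an interpolated DET-curve estimate:

```python
import numpy as np

def compute_eer(target_scores, nontarget_scores):
    """Equal error rate: sweep candidate thresholds until the
    false-reject rate meets or exceeds the false-accept rate,
    then return their average at that crossover."""
    for t in np.sort(np.concatenate([target_scores, nontarget_scores])):
        far = float((nontarget_scores >= t).mean())  # false-accept rate
        frr = float((target_scores < t).mean())      # false-reject rate
        if frr >= far:
            return (far + frr) / 2.0
    return 0.5
```

minDCF additionally weights the two error types by application-dependent costs and priors, so the two metrics can favor different operating points.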
1 code implementation • 25 Oct 2019 • Phani Sankar Nidadavolu, Saurabh Kataria, Jesús Villalba, Paola García-Perera, Najim Dehak
The approach yielded significant improvements on both real and simulated sets when data augmentation was not used in the speaker verification pipeline, or when augmentation was used only during x-vector training.
3 code implementations • 23 Oct 2019 • Raghavendra Pappagari, Piotr Żelasko, Jesús Villalba, Yishay Carmiel, Najim Dehak
BERT, which stands for Bidirectional Encoder Representations from Transformers, is a recently introduced language representation model based upon the transfer learning paradigm.
1 code implementation • 1 Apr 2019 • Cheng-I Lai, Nanxin Chen, Jesús Villalba, Najim Dehak
We present JHU's system submission to the ASVspoof 2019 Challenge: Anti-Spoofing with Squeeze-Excitation and Residual neTworks (ASSERT).
no code implementations • 20 Nov 2015 • Jesús Villalba
This can be used to adapt SPLDA from one database to another with little development data, or to implement the fully Bayesian recipe.
no code implementations • 20 Nov 2015 • Jesús Villalba
In this document we are going to derive the equations needed to implement a Variational Bayes i-vector extractor.
no code implementations • 20 Nov 2015 • Jesús Villalba
We describe a generative model that produces both sets of data, where the unknown labels are modeled as latent variables.
no code implementations • 20 Nov 2015 • Jesús Villalba
This model was applied in the paper "Handling Recordings Acquired Simultaneously over Multiple Channels with PLDA" published at Interspeech 2013.