no code implementations • 7 Mar 2023 • Martin Sustek, Samik Sadhu, Lukas Burget, Hynek Hermansky, Jesus Villalba, Laureano Moro-Velazquez, Najim Dehak
The JEM training relies on "positive examples" (i.e., examples from the training data set) as well as on "negative examples", which are samples from the modeled distribution $p(x)$ generated by means of Stochastic Gradient Langevin Dynamics (SGLD).
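The SGLD sampler behind those negative examples can be sketched as follows. This is a toy illustration, not the paper's implementation: the Gaussian target, step size, and all function names are assumptions.

```python
import numpy as np

def sgld_negative_sample(grad_log_p, x0, step=0.01, n_steps=200, rng=None):
    """Draw a "negative example" from p(x) with Stochastic Gradient
    Langevin Dynamics: x <- x + (step/2) * grad log p(x) + noise,
    where noise ~ N(0, step * I)."""
    rng = np.random.default_rng(rng)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x + 0.5 * step * grad_log_p(x) + np.sqrt(step) * rng.normal(size=x.shape)
    return x

# Toy target: standard Gaussian, where grad log p(x) = -x.
samples = np.stack([sgld_negative_sample(lambda x: -x, np.zeros(2), rng=i)
                    for i in range(200)])
```

In JEM the gradient of log p(x) comes from backpropagating through the classifier's energy function rather than from a closed form as in this toy example.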
no code implementations • 7 Mar 2023 • Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Thomas Thebaud, Najim Dehak
Speech super-resolution/Bandwidth Extension (BWE) can improve downstream tasks like Automatic Speaker Verification (ASV).
no code implementations • 4 Sep 2022 • Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Piotr Żelasko, Najim Dehak
We show that our bandwidth extension leads to phenomena such as a shift of telephone (test) embeddings towards wideband (train) signals, a negative correlation of perceptual quality with downstream performance, and condition-independent score calibration.
no code implementations • 10 Aug 2022 • Jaejin Cho, Jesús Villalba, Laureano Moro-Velazquez, Najim Dehak
In recent studies, self-supervised pre-trained models tend to outperform supervised pre-trained models in transfer learning.
1 code implementation • 10 Aug 2022 • Jaejin Cho, Raghavendra Pappagari, Piotr Żelasko, Laureano Moro-Velazquez, Jesús Villalba, Najim Dehak
This paper applies a non-contrastive self-supervised learning method on an unlabeled speech corpus to learn utterance-level embeddings.
no code implementations • 8 Apr 2022 • Sonal Joshi, Saurabh Kataria, Jesus Villalba, Najim Dehak
Building on our previous work that used representation learning to classify and detect adversarial attacks, we propose an improvement to it using AdvEst, a method to estimate adversarial perturbation.
no code implementations • 8 Apr 2022 • Sonal Joshi, Saurabh Kataria, Yiwen Shao, Piotr Zelasko, Jesus Villalba, Sanjeev Khudanpur, Najim Dehak
We propose three defenses: a denoiser pre-processor, adversarial fine-tuning of the ASR model, and adversarial fine-tuning of a joint ASR and denoiser model.
Automatic Speech Recognition (ASR)
no code implementations • 30 Mar 2022 • Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Najim Dehak
Then, we propose a two-stage learning solution where we use a pre-trained domain adaptation system for pre-processing in bandwidth extension training.
1 code implementation • 26 Jan 2022 • Piotr Żelasko, Siyuan Feng, Laureano Moro Velazquez, Ali Abavisani, Saurabhchand Bhati, Odette Scharenborg, Mark Hasegawa-Johnson, Najim Dehak
In this paper, we 1) investigate the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language; 2) provide an analysis of which phones transfer well across languages and which do not, in order to understand the limitations of automatic phone inventory creation and areas for further improvement; and 3) present different methods to build a phone inventory of an unseen language in an unsupervised way.
Automatic Speech Recognition (ASR)
no code implementations • 7 Jan 2022 • Amir Hussein, Shammur Absar Chowdhury, Ahmed Abdelali, Najim Dehak, Ahmed Ali, Sanjeev Khudanpur
The pervasiveness of intra-utterance code-switching (CS) in spoken content requires that automatic speech recognition (ASR) systems handle mixed-language input.
no code implementations • 5 Oct 2021 • Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak
We overcome this limitation with a segmental contrastive predictive coding (SCPC) framework to model the signal structure at a higher level, e.g., the phone level.
no code implementations • 28 Sep 2021 • Jaejin Cho, Jesus Villalba, Najim Dehak
This technical report describes the Johns Hopkins University speaker recognition system submitted to the VoxCeleb Speaker Recognition Challenge 2021 Track 3: Self-supervised speaker verification (closed).
no code implementations • 13 Sep 2021 • Raghavendra Pappagari, Piotr Żelasko, Jesús Villalba, Laureano Moro-Velazquez, Najim Dehak
While most of the current approaches focus on inferring emotion from isolated utterances, we argue that this is not sufficient to achieve conversational emotion recognition (CER) which deals with recognizing emotions in conversations.
no code implementations • 13 Sep 2021 • Raghavendra Pappagari, Piotr Żelasko, Agnieszka Mikołajczyk, Piotr Pęzik, Najim Dehak
Further, we show that by training the model in the written text domain and then transfer learning to conversations, we can achieve reasonable performance with less data.
no code implementations • 9 Jul 2021 • Jesús Villalba, Sonal Joshi, Piotr Żelasko, Najim Dehak
Also, representations trained to classify attacks against speaker identification can be reused to classify attacks against speaker verification and speech recognition.
1 code implementation • 5 Jul 2021 • Piotr Żelasko, Raghavendra Pappagari, Najim Dehak
Dialog acts can be interpreted as the atomic units of a conversation, more fine-grained than utterances, characterized by a specific communicative function.
2 code implementations • 17 Jun 2021 • Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, Najim Dehak, William Chan
The model takes an input phoneme sequence, and through an iterative refinement process, generates an audio waveform.
no code implementations • 10 Jun 2021 • Amir Hussein, Shammur Chowdhury, Najim Dehak, Ahmed Ali
In this paper, we exploit the transfer learning approach to design End-to-End (E2E) CS ASR systems for the two low-resourced language pairs using different monolingual speech data and a small set of noisy CS data.
no code implementations • 3 Jun 2021 • Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak
We overcome this limitation with a segmental contrastive predictive coding (SCPC) framework that can model the signal structure at a higher level, e.g., at the phoneme level.
no code implementations • 3 Apr 2021 • Saurabh Kataria, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velázquez, Najim Dehak
We investigate it for adapting microphone speech to the telephone domain.
no code implementations • 31 Mar 2021 • Piotr Żelasko, Sonal Joshi, Yiwen Shao, Jesus Villalba, Jan Trmal, Najim Dehak, Sanjeev Khudanpur
We investigate two threat models: a denial-of-service scenario where fast gradient-sign method (FGSM) or weak projected gradient descent (PGD) attacks are used to degrade the model's word error rate (WER); and a targeted scenario where a more potent imperceptible attack forces the system to recognize a specific phrase.
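The fast gradient-sign method mentioned above is a standard one-step attack; a minimal sketch is below. The quadratic toy loss and the epsilon value are assumptions for illustration only, not the attack configuration used in the paper.

```python
import numpy as np

def fgsm_perturb(x, grad_loss_x, eps=0.1):
    """Fast gradient-sign method: move each input coordinate by eps in
    the direction that increases the loss, i.e. add eps * sign(dL/dx).
    This is a single step under an L-infinity budget of eps; PGD
    iterates this step with projection back into the eps-ball."""
    return x + eps * np.sign(grad_loss_x)

# Toy loss L = 0.5 * ||x||^2, so dL/dx = x: FGSM pushes each nonzero
# coordinate eps further from zero.
x = np.array([0.5, -0.2, 0.0])
x_adv = fgsm_perturb(x, grad_loss_x=x, eps=0.1)
print(x_adv)  # → [0.6, -0.3, 0.0]
```

In an ASR attack the gradient would be taken with respect to the input waveform or features through the full recognition loss, which is what degrades the word error rate.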
no code implementations • 22 Jan 2021 • Sonal Joshi, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velázquez, Najim Dehak
Such attacks pose severe security risks, making it vital to understand how vulnerable state-of-the-art SR systems are to these attacks.
no code implementations • 2 Nov 2020 • Nanxin Chen, Piotr Żelasko, Jesús Villalba, Najim Dehak
This paper introduces a novel method to diagnose the source-target attention in state-of-the-art end-to-end speech recognition models with joint connectionist temporal classification (CTC) and attention training.
no code implementations • 27 Oct 2020 • Raghavendra Pappagari, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak
Data augmentation is a widely used strategy for training robust machine learning models.
1 code implementation • 22 Oct 2020 • Siyuan Feng, Piotr Żelasko, Laureano Moro-Velázquez, Ali Abavisani, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak
Furthermore, we find that a multilingual LM hurts a multilingual ASR system's performance, and retaining only the target language's phonotactic data in LM training is preferable.
Automatic Speech Recognition (ASR)
1 code implementation • 21 Oct 2020 • Jaejin Cho, Piotr Zelasko, Jesus Villalba, Shinji Watanabe, Najim Dehak
TTS with speaker classification loss improved EER by 0.28% and 0.73% absolute over a model using only speaker classification loss, on LibriTTS and VoxCeleb1 respectively.
no code implementations • 26 Jul 2020 • Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Najim Dehak
We perform segmentation based on the assumption that the frame feature vectors are more similar within a segment than across the segments.
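That segmentation assumption can be illustrated with a simple boundary detector: hypothesize a boundary wherever the similarity between consecutive frame feature vectors drops. This is a hedged toy sketch; the cosine-similarity score, the threshold, and the function name are assumptions, not the paper's actual method.

```python
import numpy as np

def boundaries_from_similarity(frames, threshold=0.5):
    """Place a segment boundary wherever the cosine similarity between
    consecutive frame feature vectors falls below a threshold, following
    the assumption that frames are more similar within a segment than
    across segments."""
    f = frames / np.linalg.norm(frames, axis=1, keepdims=True)
    sims = np.sum(f[:-1] * f[1:], axis=1)  # cosine similarity of adjacent frames
    return [i + 1 for i, s in enumerate(sims) if s < threshold]

# Two artificial "segments": constant features that switch at frame 3.
frames = np.array([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3)
print(boundaries_from_similarity(frames))  # → [3]
```

Real frame features are noisy, so in practice the dissimilarity curve is usually smoothed and boundaries taken at local peaks rather than by a fixed threshold.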
no code implementations • 16 May 2020 • Piotr Żelasko, Laureano Moro-Velázquez, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak
Only a handful of the world's languages are abundant with the resources that enable practical applications of speech processing technologies.
Automatic Speech Recognition (ASR)
no code implementations • 13 Apr 2020 • Łukasz Augustyniak, Piotr Szymanski, Mikołaj Morzy, Piotr Żelasko, Adrian Szymczak, Jan Mizgajski, Yishay Carmiel, Najim Dehak
Automatic Speech Recognition (ASR) systems introduce word errors, which often confuse punctuation prediction models, turning punctuation restoration into a challenging task.
Automatic Speech Recognition (ASR)
no code implementations • 12 Feb 2020 • Raghavendra Pappagari, Tianzi Wang, Jesus Villalba, Nanxin Chen, Najim Dehak
Then, we show the effect of emotion on speaker recognition.
1 code implementation • 2 Dec 2019 • Paola Garcia, Jesus Villalba, Herve Bredin, Jun Du, Diego Castan, Alejandrina Cristia, Latane Bullock, Ling Guo, Koji Okabe, Phani Sankar Nidadavolu, Saurabh Kataria, Sizhu Chen, Leo Galmant, Marvin Lavechin, Lei Sun, Marie-Philippe Gill, Bar Ben-Yair, Sajjad Abdoli, Xin Wang, Wassim Bouaziz, Hadrien Titeux, Emmanuel Dupoux, Kong Aik Lee, Najim Dehak
This paper presents the problems and solutions addressed at the JSALT workshop when using a single microphone for speaker detection in adverse scenarios.
Audio and Speech Processing • Sound
no code implementations • 10 Nov 2019 • Nanxin Chen, Shinji Watanabe, Jesús Villalba, Najim Dehak
In this paper, we study two different non-autoregressive transformer structures for automatic speech recognition (ASR): A-CMLM and A-FMLM.
Automatic Speech Recognition (ASR)
1 code implementation • 25 Oct 2019 • Phani Sankar Nidadavolu, Saurabh Kataria, Jesús Villalba, Paola García-Perera, Najim Dehak
The approach yielded significant improvements on both real and simulated sets when data augmentation was not used in the speaker verification pipeline or was used only during x-vector training.
Audio and Speech Processing • Sound
1 code implementation • 25 Oct 2019 • Saurabh Kataria, Phani Sankar Nidadavolu, Jesús Villalba, Nanxin Chen, Paola García, Najim Dehak
On the BabyTrain corpus, we observe relative gains of 10.38% and 12.40% in minDCF and EER, respectively.
1 code implementation • 25 Oct 2019 • Phani Sankar Nidadavolu, Saurabh Kataria, Jesús Villalba, Najim Dehak
We experiment with two adaptation tasks: microphone to telephone and a novel reverberant to clean adaptation with the end goal of improving speaker recognition performance.
Audio and Speech Processing • Sound
3 code implementations • 23 Oct 2019 • Raghavendra Pappagari, Piotr Żelasko, Jesús Villalba, Yishay Carmiel, Najim Dehak
BERT, which stands for Bidirectional Encoder Representations from Transformers, is a recently introduced language representation model based upon the transfer learning paradigm.
3 code implementations • 9 Jun 2019 • Zheng-Hua Tan, Achintya kr. Sarkar, Najim Dehak
In the end, a posteriori SNR weighted energy difference is applied to the extended pitch segments of the denoised speech signal for detecting voice activity.
no code implementations • 26 Apr 2019 • Mohammed Senoussaoui, Patrick Cardinal, Najim Dehak, Alessandro Lameiras Koerich
Automatic measuring of speaker sincerity degree is a novel research problem in computational paralinguistics.
1 code implementation • 1 Apr 2019 • Cheng-I Lai, Nanxin Chen, Jesús Villalba, Najim Dehak
We present JHU's system submission to the ASVspoof 2019 Challenge: Anti-Spoofing with Squeeze-Excitation and Residual neTworks (ASSERT).
no code implementations • 10 Dec 2018 • Matthew Wiesner, Adithya Renduchintala, Shinji Watanabe, Chunxi Liu, Najim Dehak, Sanjeev Khudanpur
Using transcribed speech from nearby languages gives a further 20-30% relative reduction in character error rate.
1 code implementation • 31 Oct 2018 • Cheng-I Lai, Alberto Abad, Korin Richmond, Junichi Yamagishi, Najim Dehak, Simon King
In this work, we propose our replay attacks detection system - Attentive Filtering Network, which is composed of an attention-based filtering mechanism that enhances feature representations in both the frequency and time domains, and a ResNet-based classifier.
no code implementations • 17 Jul 2018 • Chunxi Liu, Matthew Wiesner, Shinji Watanabe, Craig Harman, Jan Trmal, Najim Dehak, Sanjeev Khudanpur
In topic identification (topic ID) on real-world unstructured audio, an audio instance of variable topic shifts is first broken into sequential segments, and each segment is independently classified.
1 code implementation • 17 Jul 2018 • Suwon Shon, Najim Dehak, Douglas Reynolds, James Glass
The Multitarget Challenge aims to assess how well current speech technology is able to determine whether or not a recorded utterance was spoken by one of a large number of 'blacklisted' speakers.
Audio and Speech Processing • Sound
no code implementations • 2 Jul 2018 • Piotr Żelasko, Piotr Szymański, Jan Mizgajski, Adrian Szymczak, Yishay Carmiel, Najim Dehak
The models are trained on the Fisher corpus which includes punctuation annotation.
no code implementations • 23 Feb 2018 • Matthew Wiesner, Chunxi Liu, Lucas Ondel, Craig Harman, Vimal Manohar, Jan Trmal, Zhongqiang Huang, Najim Dehak, Sanjeev Khudanpur
Automatic speech recognition (ASR) systems often need to be developed for extremely low-resource languages to serve end-uses such as audio content categorization and search.
Automatic Speech Recognition (ASR)
no code implementations • 5 Feb 2017 • Chunxi Liu, Jinyi Yang, Ming Sun, Santosh Kesiraju, Alena Rott, Lucas Ondel, Pegah Ghahremani, Najim Dehak, Lukas Burget, Sanjeev Khudanpur
Acoustic unit discovery (AUD) is a process of automatically identifying a categorical acoustic unit inventory from speech and producing corresponding acoustic unit tokenizations.
1 code implementation • 23 Sep 2015 • Ahmed Ali, Najim Dehak, Patrick Cardinal, Sameer Khurana, Sree Harsha Yella, James Glass, Peter Bell, Steve Renals
We used these features in a binary classifier to discriminate between Modern Standard Arabic (MSA) and Dialectal Arabic, with an accuracy of 100%.
no code implementations • 3 Apr 2015 • Fred Richardson, Douglas Reynolds, Najim Dehak
Learned feature representations and sub-phoneme posteriors from Deep Neural Networks (DNNs) have been used separately to produce significant performance gains for speaker and language recognition tasks.