Search Results for author: Najim Dehak

Found 54 papers, 18 papers with code

Unraveling Adversarial Examples against Speaker Identification -- Techniques for Attack Detection and Victim Model Classification

no code implementations • 29 Feb 2024 • Sonal Joshi, Thomas Thebaud, Jesús Villalba, Najim Dehak

In this paper, we propose a method to detect the presence of adversarial examples, i.e., a binary classifier distinguishing between benign and adversarial examples.

Adversarial Attack Classification +1

Time Scale Network: A Shallow Neural Network For Time Series Data

no code implementations • 10 Nov 2023 • Trevor Meyer, Camden Shultz, Najim Dehak, Laureano Moro-Velazquez, Pedro Irazoqui

The network simultaneously learns features at many time scales for sequence classification with significantly reduced parameters and operations.

EEG Seizure prediction +3

DPM-TSE: A Diffusion Probabilistic Model for Target Sound Extraction

1 code implementation • 6 Oct 2023 • Jiarui Hai, Helin Wang, Dongchao Yang, Karan Thakkar, Najim Dehak, Mounya Elhilali

Common target sound extraction (TSE) approaches primarily relied on discriminative approaches in order to separate the target sound while minimizing interference from the unwanted sources, with varying success in separating the target from the background.

Target Sound Extraction

Leveraging Pretrained Image-text Models for Improving Audio-Visual Learning

no code implementations • 8 Sep 2023 • Saurabhchand Bhati, Jesús Villalba, Laureano Moro-Velazquez, Thomas Thebaud, Najim Dehak

Cascaded SpeechCLIP attempted to generate localized word-level information and utilize both the pretrained image and text encoders.

audio-visual learning Quantization +1

DuTa-VC: A Duration-aware Typical-to-atypical Voice Conversion Approach with Diffusion Probabilistic Model

1 code implementation • 18 Jun 2023 • Helin Wang, Thomas Thebaud, Jesus Villalba, Myra Sydnor, Becky Lammers, Najim Dehak, Laureano Moro-Velazquez

We present a novel typical-to-atypical voice conversion approach (DuTa-VC), which (i) can be trained with nonparallel data, (ii) first introduces a diffusion probabilistic model, (iii) preserves the target speaker identity, and (iv) is aware of the phoneme duration of the target speaker.

Data Augmentation speech-recognition +2

Regularizing Contrastive Predictive Coding for Speech Applications

no code implementations • 12 Apr 2023 • Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak

These representations significantly reduce the amount of labeled data needed for downstream task performance, such as automatic speech recognition.

Acoustic Unit Discovery Automatic Speech Recognition +3

Stabilized training of joint energy-based models and their practical applications

no code implementations • 7 Mar 2023 • Martin Sustek, Samik Sadhu, Lukas Burget, Hynek Hermansky, Jesus Villalba, Laureano Moro-Velazquez, Najim Dehak

The JEM training relies on "positive examples" (i.e., examples from the training data set) as well as on "negative examples", which are samples from the modeled distribution $p(x)$ generated by means of Stochastic Gradient Langevin Dynamics (SGLD).

Self-FiLM: Conditioning GANs with self-supervised representations for bandwidth extension based speaker recognition

no code implementations • 7 Mar 2023 • Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Thomas Thebaud, Najim Dehak

Speech super-resolution/Bandwidth Extension (BWE) can improve downstream tasks like Automatic Speaker Verification (ASV).

Bandwidth Extension Speaker Recognition +3

Time-domain speech super-resolution with GAN based modeling for telephony speaker verification

no code implementations • 4 Sep 2022 • Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Piotr Żelasko, Najim Dehak

We show that our bandwidth extension leads to phenomena such as a shift of telephone (test) embeddings towards wideband (train) signals, a negative correlation of perceptual quality with downstream performance, and condition-independent score calibration.

Bandwidth Extension Data Augmentation +3

Non-Contrastive Self-Supervised Learning of Utterance-Level Speech Representations

1 code implementation • 10 Aug 2022 • Jaejin Cho, Raghavendra Pappagari, Piotr Żelasko, Laureano Moro-Velazquez, Jesús Villalba, Najim Dehak

This paper applies a non-contrastive self-supervised learning method on an unlabeled speech corpus to learn utterance-level embeddings.

Emotion Recognition Self-Supervised Learning +1

Non-Contrastive Self-supervised Learning for Utterance-Level Information Extraction from Speech

no code implementations • 10 Aug 2022 • Jaejin Cho, Jesús Villalba, Laureano Moro-Velazquez, Najim Dehak

In recent studies, self-supervised pre-trained models tend to outperform supervised pre-trained models in transfer learning.

Alzheimer's Disease Detection Self-Supervised Learning +3

AdvEst: Adversarial Perturbation Estimation to Classify and Detect Adversarial Attacks against Speaker Identification

no code implementations • 8 Apr 2022 • Sonal Joshi, Saurabh Kataria, Jesus Villalba, Najim Dehak

Building on our previous work that used representation learning to classify and detect adversarial attacks, we propose an improvement to it using AdvEst, a method to estimate adversarial perturbation.

Representation Learning Speaker Identification

Defense against Adversarial Attacks on Hybrid Speech Recognition using Joint Adversarial Fine-tuning with Denoiser

no code implementations • 8 Apr 2022 • Sonal Joshi, Saurabh Kataria, Yiwen Shao, Piotr Zelasko, Jesus Villalba, Sanjeev Khudanpur, Najim Dehak

We propose three defenses--denoiser pre-processor, adversarially fine-tuning ASR model, and adversarially fine-tuning joint model of ASR and denoiser.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Joint domain adaptation and speech bandwidth extension using time-domain GANs for speaker verification

no code implementations • 30 Mar 2022 • Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Najim Dehak

Then, we propose a two-stage learning solution where we use a pre-trained domain adaptation system for pre-processing in bandwidth extension training.

Bandwidth Extension Domain Adaptation +1

Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition

1 code implementation • 26 Jan 2022 • Piotr Żelasko, Siyuan Feng, Laureano Moro Velazquez, Ali Abavisani, Saurabhchand Bhati, Odette Scharenborg, Mark Hasegawa-Johnson, Najim Dehak

In this paper, we 1) investigate the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language; 2) provide an analysis of which phones transfer well across languages and which do not in order to understand the limitations of and areas for further improvement for automatic phone inventory creation; and 3) present different methods to build a phone inventory of an unseen language in an unsupervised way.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Textual Data Augmentation for Arabic-English Code-Switching Speech Recognition

no code implementations • 7 Jan 2022 • Amir Hussein, Shammur Absar Chowdhury, Ahmed Abdelali, Najim Dehak, Ahmed Ali, Sanjeev Khudanpur

The pervasiveness of intra-utterance code-switching (CS) in spoken content requires that speech recognition (ASR) systems handle mixed language.

Language Modelling speech-recognition +5

Unsupervised Speech Segmentation and Variable Rate Representation Learning using Segmental Contrastive Predictive Coding

no code implementations • 5 Oct 2021 • Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak

We overcome this limitation with a segmental contrastive predictive coding (SCPC) framework to model the signal structure at a higher level, e.g., phone level.

Boundary Detection Representation Learning +1

The JHU submission to VoxSRC-21: Track 3

no code implementations • 28 Sep 2021 • Jaejin Cho, Jesus Villalba, Najim Dehak

This technical report describes the Johns Hopkins University speaker recognition system submitted to the VoxCeleb Speaker Recognition Challenge 2021 Track 3: Self-supervised speaker verification (closed).

Clustering Contrastive Learning +2

Beyond Isolated Utterances: Conversational Emotion Recognition

no code implementations • 13 Sep 2021 • Raghavendra Pappagari, Piotr Żelasko, Jesús Villalba, Laureano Moro-Velazquez, Najim Dehak

While most of the current approaches focus on inferring emotion from isolated utterances, we argue that this is not sufficient to achieve conversational emotion recognition (CER) which deals with recognizing emotions in conversations.

Speech Emotion Recognition

Joint prediction of truecasing and punctuation for conversational speech in low-resource scenarios

no code implementations • 13 Sep 2021 • Raghavendra Pappagari, Piotr Żelasko, Agnieszka Mikołajczyk, Piotr Pęzik, Najim Dehak

Further, we show that by training the model in the written text domain and then transfer learning to conversations, we can achieve reasonable performance with less data.

Transfer Learning

Representation Learning to Classify and Detect Adversarial Attacks against Speaker and Speech Recognition Systems

no code implementations • 9 Jul 2021 • Jesús Villalba, Sonal Joshi, Piotr Żelasko, Najim Dehak

Also, representations trained to classify attacks against speaker identification can be used to classify attacks against speaker verification and speech recognition.

Representation Learning Speaker Identification +4

What Helps Transformers Recognize Conversational Structure? Importance of Context, Punctuation, and Labels in Dialog Act Recognition

1 code implementation • 5 Jul 2021 • Piotr Żelasko, Raghavendra Pappagari, Najim Dehak

Dialog acts can be interpreted as the atomic units of a conversation, more fine-grained than utterances, characterized by a specific communicative function.

Segmentation Specificity +1

WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

3 code implementations • 17 Jun 2021 • Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, Najim Dehak, William Chan

The model takes an input phoneme sequence, and through an iterative refinement process, generates an audio waveform.

Speech Synthesis Text-To-Speech Synthesis

Balanced End-to-End Monolingual pre-training for Low-Resourced Indic Languages Code-Switching Speech Recognition

no code implementations • 10 Jun 2021 • Amir Hussein, Shammur Chowdhury, Najim Dehak, Ahmed Ali

In this paper, we exploit the transfer learning approach to design End-to-End (E2E) CS ASR systems for the two low-resourced language pairs using different monolingual speech data and a small set of noisy CS data.

Language Modelling speech-recognition +3

Segmental Contrastive Predictive Coding for Unsupervised Word Segmentation

no code implementations • 3 Jun 2021 • Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak

We overcome this limitation with a segmental contrastive predictive coding (SCPC) framework that can model the signal structure at a higher level, e.g., at the phoneme level.

Deep Feature CycleGANs: Speaker Identity Preserving Non-parallel Microphone-Telephone Domain Adaptation for Speaker Verification

no code implementations • 3 Apr 2021 • Saurabh Kataria, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velázquez, Najim Dehak

We investigate it for adapting microphone speech to the telephone domain.

Domain Adaptation Speaker Verification +1

Adversarial Attacks and Defenses for Speech Recognition Systems

no code implementations • 31 Mar 2021 • Piotr Żelasko, Sonal Joshi, Yiwen Shao, Jesus Villalba, Jan Trmal, Najim Dehak, Sanjeev Khudanpur

We investigate two threat models: a denial-of-service scenario where fast gradient-sign method (FGSM) or weak projected gradient descent (PGD) attacks are used to degrade the model's word error rate (WER); and a targeted scenario where a more potent imperceptible attack forces the system to recognize a specific phrase.

Adversarial Robustness Automatic Speech Recognition +2

Study of Pre-processing Defenses against Adversarial Attacks on State-of-the-art Speaker Recognition Systems

no code implementations • 22 Jan 2021 • Sonal Joshi, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velázquez, Najim Dehak

Such attacks pose severe security risks, making it vital to deep-dive and understand how much the state-of-the-art SR systems are vulnerable to these attacks.

Speaker Recognition

Focus on the present: a regularization method for the ASR source-target attention layer

no code implementations • 2 Nov 2020 • Nanxin Chen, Piotr Żelasko, Jesús Villalba, Najim Dehak

This paper introduces a novel method to diagnose the source-target attention in state-of-the-art end-to-end speech recognition models with joint connectionist temporal classification (CTC) and attention training.

speech-recognition Speech Recognition

CopyPaste: An Augmentation Method for Speech Emotion Recognition

no code implementations • 27 Oct 2020 • Raghavendra Pappagari, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak

Data augmentation is a widely used strategy for training robust machine learning models.

Data Augmentation Speaker Recognition +2

How Phonotactics Affect Multilingual and Zero-shot ASR Performance

1 code implementation • 22 Oct 2020 • Siyuan Feng, Piotr Żelasko, Laureano Moro-Velázquez, Ali Abavisani, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak

Furthermore, we find that a multilingual LM hurts a multilingual ASR system's performance, and retaining only the target language's phonotactic data in LM training is preferable.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Learning Speaker Embedding from Text-to-Speech

1 code implementation • 21 Oct 2020 • Jaejin Cho, Piotr Zelasko, Jesus Villalba, Shinji Watanabe, Najim Dehak

TTS with speaker classification loss improved EER by 0.28% and 0.73% absolute over a model using only speaker classification loss on LibriTTS and VoxCeleb1, respectively.

Classification General Classification +2

Self-Expressing Autoencoders for Unsupervised Spoken Term Discovery

no code implementations • 26 Jul 2020 • Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Najim Dehak

We perform segmentation based on the assumption that the frame feature vectors are more similar within a segment than across the segments.

Segmentation

That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages

no code implementations • 16 May 2020 • Piotr Żelasko, Laureano Moro-Velázquez, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak

Only a handful of the world's languages are abundant with the resources that enable practical applications of speech processing technologies.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Punctuation Prediction in Spontaneous Conversations: Can We Mitigate ASR Errors with Retrofitted Word Embeddings?

no code implementations • 13 Apr 2020 • Łukasz Augustyniak, Piotr Szymanski, Mikołaj Morzy, Piotr Żelasko, Adrian Szymczak, Jan Mizgajski, Yishay Carmiel, Najim Dehak

Automatic Speech Recognition (ASR) systems introduce word errors, which often confuse punctuation prediction models, turning punctuation restoration into a challenging task.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

x-vectors meet emotions: A study on dependencies between emotion and speaker recognition

no code implementations • 12 Feb 2020 • Raghavendra Pappagari, Tianzi Wang, Jesus Villalba, Nanxin Chen, Najim Dehak

Then, we show the effect of emotion on speaker recognition.

Emotion Classification Emotion Recognition +3

Speaker detection in the wild: Lessons learned from JSALT 2019

1 code implementation • 2 Dec 2019 • Paola Garcia, Jesus Villalba, Herve Bredin, Jun Du, Diego Castan, Alejandrina Cristia, Latane Bullock, Ling Guo, Koji Okabe, Phani Sankar Nidadavolu, Saurabh Kataria, Sizhu Chen, Leo Galmant, Marvin Lavechin, Lei Sun, Marie-Philippe Gill, Bar Ben-Yair, Sajjad Abdoli, Xin Wang, Wassim Bouaziz, Hadrien Titeux, Emmanuel Dupoux, Kong Aik Lee, Najim Dehak

This paper presents the problems and solutions addressed at the JSALT workshop when using a single microphone for speaker detection in adverse scenarios.

Audio and Speech Processing Sound

Listen and Fill in the Missing Letters: Non-Autoregressive Transformer for Speech Recognition

no code implementations • 10 Nov 2019 • Nanxin Chen, Shinji Watanabe, Jesús Villalba, Najim Dehak

In this paper, we study two different non-autoregressive transformer structures for automatic speech recognition (ASR): A-CMLM and A-FMLM.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Low-Resource Domain Adaptation for Speaker Recognition Using Cycle-GANs

1 code implementation • 25 Oct 2019 • Phani Sankar Nidadavolu, Saurabh Kataria, Jesús Villalba, Najim Dehak

We experiment with two adaptation tasks: microphone to telephone and a novel reverberant to clean adaptation with the end goal of improving speaker recognition performance.

Audio and Speech Processing Sound

Feature Enhancement with Deep Feature Losses for Speaker Verification

1 code implementation • 25 Oct 2019 • Saurabh Kataria, Phani Sankar Nidadavolu, Jesús Villalba, Nanxin Chen, Paola García, Najim Dehak

On the BabyTrain corpus, we observe relative gains of 10.38% and 12.40% in minDCF and EER, respectively.

Denoising Speaker Verification +1

Unsupervised Feature Enhancement for speaker verification

1 code implementation • 25 Oct 2019 • Phani Sankar Nidadavolu, Saurabh Kataria, Jesús Villalba, Paola García-Perera, Najim Dehak

The approach yielded significant improvements on both real and simulated sets when data augmentation was not used in the speaker verification pipeline or when augmentation was used only during x-vector training.

Audio and Speech Processing Sound

Hierarchical Transformers for Long Document Classification

3 code implementations • 23 Oct 2019 • Raghavendra Pappagari, Piotr Żelasko, Jesús Villalba, Yishay Carmiel, Najim Dehak

BERT, which stands for Bidirectional Encoder Representations from Transformers, is a recently introduced language representation model based upon the transfer learning paradigm.

Classification Document Classification +3

rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method

3 code implementations • 9 Jun 2019 • Zheng-Hua Tan, Achintya kr. Sarkar, Najim Dehak

In the end, a posteriori SNR weighted energy difference is applied to the extended pitch segments of the denoised speech signal for detecting voice activity.

Action Detection Activity Detection +3

Speaker Sincerity Detection based on Covariance Feature Vectors and Ensemble Methods

no code implementations • 26 Apr 2019 • Mohammed Senoussaoui, Patrick Cardinal, Najim Dehak, Alessandro Lameiras Koerich

Automatic measuring of speaker sincerity degree is a novel research problem in computational paralinguistics.

ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual neTworks

1 code implementation • 1 Apr 2019 • Cheng-I Lai, Nanxin Chen, Jesús Villalba, Najim Dehak

We present JHU's system submission to the ASVspoof 2019 Challenge: Anti-Spoofing with Squeeze-Excitation and Residual neTworks (ASSERT).

Feature Engineering Voice Conversion

Pretraining by Backtranslation for End-to-end ASR in Low-Resource Settings

no code implementations • 10 Dec 2018 • Matthew Wiesner, Adithya Renduchintala, Shinji Watanabe, Chunxi Liu, Najim Dehak, Sanjeev Khudanpur

Using transcribed speech from nearby languages gives a further 20-30% relative reduction in character error rate.

Data Augmentation

Attentive Filtering Networks for Audio Replay Attack Detection

1 code implementation • 31 Oct 2018 • Cheng-I Lai, Alberto Abad, Korin Richmond, Junichi Yamagishi, Najim Dehak, Simon King

In this work, we propose our replay attacks detection system - Attentive Filtering Network, which is composed of an attention-based filtering mechanism that enhances feature representations in both the frequency and time domains, and a ResNet-based classifier.

Speaker Verification

MCE 2018: The 1st Multi-target Speaker Detection and Identification Challenge Evaluation (MCE) Plan, Dataset and Baseline System

1 code implementation • 17 Jul 2018 • Suwon Shon, Najim Dehak, Douglas Reynolds, James Glass

The Multitarget Challenge aims to assess how well current speech technology is able to determine whether or not a recorded utterance was spoken by one of a large number of 'blacklisted' speakers.

Audio and Speech Processing Sound

Low-Resource Contextual Topic Identification on Speech

no code implementations • 17 Jul 2018 • Chunxi Liu, Matthew Wiesner, Shinji Watanabe, Craig Harman, Jan Trmal, Najim Dehak, Sanjeev Khudanpur

In topic identification (topic ID) on real-world unstructured audio, an audio instance of variable topic shifts is first broken into sequential segments, and each segment is independently classified.

General Classification Topic Classification +1

Punctuation Prediction Model for Conversational Speech

no code implementations • 2 Jul 2018 • Piotr Żelasko, Piotr Szymański, Jan Mizgajski, Adrian Szymczak, Yishay Carmiel, Najim Dehak

The models are trained on the Fisher corpus which includes punctuation annotation.

Automatic Speech Recognition and Topic Identification for Almost-Zero-Resource Languages

no code implementations • 23 Feb 2018 • Matthew Wiesner, Chunxi Liu, Lucas Ondel, Craig Harman, Vimal Manohar, Jan Trmal, Zhongqiang Huang, Najim Dehak, Sanjeev Khudanpur

Automatic speech recognition (ASR) systems often need to be developed for extremely low-resource languages to serve end-uses such as audio content categorization and search.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

An Empirical Evaluation of Zero Resource Acoustic Unit Discovery

no code implementations • 5 Feb 2017 • Chunxi Liu, Jinyi Yang, Ming Sun, Santosh Kesiraju, Alena Rott, Lucas Ondel, Pegah Ghahremani, Najim Dehak, Lukas Burget, Sanjeev Khudanpur

Acoustic unit discovery (AUD) is a process of automatically identifying a categorical acoustic unit inventory from speech and producing corresponding acoustic unit tokenizations.

Acoustic Unit Discovery

Automatic Dialect Detection in Arabic Broadcast Speech

1 code implementation • 23 Sep 2015 • Ahmed Ali, Najim Dehak, Patrick Cardinal, Sameer Khurana, Sree Harsha Yella, James Glass, Peter Bell, Steve Renals

We used these features in a binary classifier to discriminate between Modern Standard Arabic (MSA) and Dialectal Arabic, with an accuracy of 100%.

Ranked #1 on Spoken language identification on Untranscribed mixed-speech dataset

Dialect Identification speech-recognition +2

A Unified Deep Neural Network for Speaker and Language Recognition

no code implementations • 3 Apr 2015 • Fred Richardson, Douglas Reynolds, Najim Dehak

Learned feature representations and sub-phoneme posteriors from Deep Neural Networks (DNNs) have been used separately to produce significant performance gains for speaker and language recognition tasks.

Domain Adaptation Speaker Recognition
