Search Results for author: Najim Dehak

Found 55 papers, 18 papers with code

Noise-robust Speech Separation with Fast Generative Correction

no code implementations11 Jun 2024 Helin Wang, Jesus Villalba, Laureano Moro-Velazquez, Jiarui Hai, Thomas Thebaud, Najim Dehak

Speech separation, the task of isolating multiple speech sources from a mixed audio signal, remains challenging in noisy environments.

Speech Separation

Unraveling Adversarial Examples against Speaker Identification -- Techniques for Attack Detection and Victim Model Classification

no code implementations29 Feb 2024 Sonal Joshi, Thomas Thebaud, Jesús Villalba, Najim Dehak

In this paper, we propose a method to detect the presence of adversarial examples, i.e., a binary classifier distinguishing between benign and adversarial examples.

Adversarial Attack Classification +1

Time Scale Network: A Shallow Neural Network For Time Series Data

no code implementations10 Nov 2023 Trevor Meyer, Camden Shultz, Najim Dehak, Laureano Moro-Velazquez, Pedro Irazoqui

The network simultaneously learns features at many time scales for sequence classification with significantly reduced parameters and operations.

EEG Seizure prediction +3

DPM-TSE: A Diffusion Probabilistic Model for Target Sound Extraction

1 code implementation6 Oct 2023 Jiarui Hai, Helin Wang, Dongchao Yang, Karan Thakkar, Najim Dehak, Mounya Elhilali

Common target sound extraction (TSE) approaches have primarily relied on discriminative methods to separate the target sound while minimizing interference from unwanted sources, with varying success in separating the target from the background.

Target Sound Extraction

Leveraging Pretrained Image-text Models for Improving Audio-Visual Learning

no code implementations8 Sep 2023 Saurabhchand Bhati, Jesús Villalba, Laureano Moro-Velazquez, Thomas Thebaud, Najim Dehak

Cascaded SpeechCLIP attempted to generate localized word-level information and utilize both the pretrained image and text encoders.

audio-visual learning Quantization +1

DuTa-VC: A Duration-aware Typical-to-atypical Voice Conversion Approach with Diffusion Probabilistic Model

1 code implementation18 Jun 2023 Helin Wang, Thomas Thebaud, Jesus Villalba, Myra Sydnor, Becky Lammers, Najim Dehak, Laureano Moro-Velazquez

We present a novel typical-to-atypical voice conversion approach (DuTa-VC), which (i) can be trained with nonparallel data, (ii) is the first to introduce a diffusion probabilistic model, (iii) preserves the target speaker identity, and (iv) is aware of the phoneme duration of the target speaker.

Data Augmentation Decoder +3

Regularizing Contrastive Predictive Coding for Speech Applications

no code implementations12 Apr 2023 Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak

These representations significantly reduce the amount of labeled data needed for downstream task performance, such as automatic speech recognition.

Acoustic Unit Discovery Automatic Speech Recognition +3
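
Contrastive predictive coding learns representations by scoring a true future frame against negatives. A minimal sketch of the underlying InfoNCE objective, using toy vectors in place of learned encoder outputs (all names and values here are illustrative, not the paper's model):

```python
import numpy as np

# Minimal sketch of the InfoNCE objective behind contrastive predictive
# coding: score a positive (true future) representation against negatives
# drawn elsewhere. Toy vectors stand in for encoder outputs.

def info_nce_loss(context, positive, negatives):
    """-log softmax score of the positive among positive + negatives."""
    candidates = np.vstack([positive, negatives])
    scores = candidates @ context                 # dot-product similarity
    scores = scores - scores.max()                # numerical stability
    log_probs = scores - np.log(np.sum(np.exp(scores)))
    return -log_probs[0]                          # positive is at index 0

rng = np.random.default_rng(0)
context = np.array([1.0, 0.0, 0.0])
positive = np.array([0.9, 0.1, 0.0])             # aligned with the context
negatives = rng.normal(scale=0.1, size=(5, 3))   # unrelated samples
loss = info_nce_loss(context, positive, negatives)
# A well-aligned positive scores better than chance over 6 candidates.
print(loss < np.log(6))  # True
```

Minimizing this loss pushes the context vector toward its true future and away from the negatives, which is what makes the resulting representations useful downstream with little labeled data.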

Stabilized training of joint energy-based models and their practical applications

no code implementations7 Mar 2023 Martin Sustek, Samik Sadhu, Lukas Burget, Hynek Hermansky, Jesus Villalba, Laureano Moro-Velazquez, Najim Dehak

The JEM training relies on "positive examples" (i.e., examples from the training data set) as well as on "negative examples", which are samples from the modeled distribution $p(x)$ generated by means of Stochastic Gradient Langevin Dynamics (SGLD).
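
The SGLD sampler mentioned above can be sketched in a few lines. Here the target is a standard Gaussian, so grad log p(x) = -x has a closed form; a real JEM would instead differentiate a neural network's energy (this is an illustrative sketch, not the paper's training code):

```python
import numpy as np

# Hedged sketch of Stochastic Gradient Langevin Dynamics (SGLD), the
# sampler used to draw "negative examples" from the modeled density p(x).

def sgld_sample(grad_log_p, x0, step_size, n_steps, rng):
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.normal(size=x.shape)
        # Langevin update: half a gradient step plus injected Gaussian noise.
        x = x + 0.5 * step_size * grad_log_p(x) + np.sqrt(step_size) * noise
    return x

rng = np.random.default_rng(0)
samples = np.array([
    sgld_sample(lambda x: -x, x0=[5.0], step_size=0.1, n_steps=500, rng=rng)
    for _ in range(200)
]).ravel()
# Long-run samples should roughly match the target N(0, 1),
# even though every chain starts far away at x0 = 5.
print(samples.mean(), samples.std())
```

The injected noise term is what distinguishes SGLD from plain gradient ascent on log p(x): without it the chain would collapse to a mode instead of sampling the distribution.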

Time-domain speech super-resolution with GAN based modeling for telephony speaker verification

no code implementations4 Sep 2022 Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Piotr Żelasko, Najim Dehak

We show that our bandwidth extension leads to phenomena such as a shift of telephone (test) embeddings towards wideband (train) signals, a negative correlation of perceptual quality with downstream performance, and condition-independent score calibration.

Bandwidth Extension Data Augmentation +3

AdvEst: Adversarial Perturbation Estimation to Classify and Detect Adversarial Attacks against Speaker Identification

no code implementations8 Apr 2022 Sonal Joshi, Saurabh Kataria, Jesus Villalba, Najim Dehak

Building on our previous work that used representation learning to classify and detect adversarial attacks, we propose an improvement to it using AdvEst, a method to estimate adversarial perturbation.

Representation Learning Speaker Identification

Joint domain adaptation and speech bandwidth extension using time-domain GANs for speaker verification

no code implementations30 Mar 2022 Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Najim Dehak

Then, we propose a two-stage learning solution where we use a pre-trained domain adaptation system for pre-processing in bandwidth extension training.

Bandwidth Extension Domain Adaptation +1

Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition

1 code implementation26 Jan 2022 Piotr Żelasko, Siyuan Feng, Laureano Moro Velazquez, Ali Abavisani, Saurabhchand Bhati, Odette Scharenborg, Mark Hasegawa-Johnson, Najim Dehak

In this paper, we 1) investigate the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language; 2) provide an analysis of which phones transfer well across languages and which do not, in order to understand the limitations of, and areas for further improvement in, automatic phone inventory creation; and 3) present different methods to build a phone inventory of an unseen language in an unsupervised way.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Textual Data Augmentation for Arabic-English Code-Switching Speech Recognition

no code implementations7 Jan 2022 Amir Hussein, Shammur Absar Chowdhury, Ahmed Abdelali, Najim Dehak, Ahmed Ali, Sanjeev Khudanpur

The pervasiveness of intra-utterance code-switching (CS) in spoken content requires that automatic speech recognition (ASR) systems handle mixed-language input.

Language Modelling speech-recognition +5

The JHU submission to VoxSRC-21: Track 3

no code implementations28 Sep 2021 Jaejin Cho, Jesus Villalba, Najim Dehak

This technical report describes the Johns Hopkins University speaker recognition system submitted to the VoxCeleb Speaker Recognition Challenge 2021 Track 3: Self-supervised speaker verification (closed).

Clustering Contrastive Learning +2

Beyond Isolated Utterances: Conversational Emotion Recognition

no code implementations13 Sep 2021 Raghavendra Pappagari, Piotr Żelasko, Jesús Villalba, Laureano Moro-Velazquez, Najim Dehak

While most of the current approaches focus on inferring emotion from isolated utterances, we argue that this is not sufficient to achieve conversational emotion recognition (CER) which deals with recognizing emotions in conversations.

Speech Emotion Recognition

Joint prediction of truecasing and punctuation for conversational speech in low-resource scenarios

no code implementations13 Sep 2021 Raghavendra Pappagari, Piotr Żelasko, Agnieszka Mikołajczyk, Piotr Pęzik, Najim Dehak

Further, we show that by training the model in the written-text domain and then transferring it to conversations, we can achieve reasonable performance with less data.

Transfer Learning

Representation Learning to Classify and Detect Adversarial Attacks against Speaker and Speech Recognition Systems

no code implementations9 Jul 2021 Jesús Villalba, Sonal Joshi, Piotr Żelasko, Najim Dehak

Moreover, representations trained to classify attacks against speaker identification can also be used to classify attacks against speaker verification and speech recognition.

Representation Learning Speaker Identification +4

What Helps Transformers Recognize Conversational Structure? Importance of Context, Punctuation, and Labels in Dialog Act Recognition

1 code implementation5 Jul 2021 Piotr Żelasko, Raghavendra Pappagari, Najim Dehak

Dialog acts can be interpreted as the atomic units of a conversation, more fine-grained than utterances, characterized by a specific communicative function.

Segmentation Specificity +1

WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

3 code implementations17 Jun 2021 Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, Najim Dehak, William Chan

The model takes an input phoneme sequence, and through an iterative refinement process, generates an audio waveform.

Speech Synthesis Text-To-Speech Synthesis
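
The iterative-refinement idea behind diffusion vocoders such as WaveGrad can be illustrated in miniature: start from Gaussian noise and repeatedly apply a denoising step. The "denoiser" below is a toy function that pulls toward a fixed target waveform, standing in for a trained, conditioned network (all names and values are illustrative):

```python
import numpy as np

# Hedged sketch of iterative refinement: begin with pure noise and apply
# a denoising step repeatedly. A real model conditions each step on input
# features (e.g., a phoneme sequence); here the step is a toy contraction.

def iterative_refinement(denoise_step, shape, n_steps, rng):
    y = rng.normal(size=shape)      # start from pure noise
    for t in range(n_steps):
        y = denoise_step(y, t)      # each step removes some of the noise
    return y

target = np.sin(np.linspace(0, 2 * np.pi, 64))   # stand-in "waveform"
rng = np.random.default_rng(0)
y = iterative_refinement(lambda y, t: y + 0.3 * (target - y),
                         shape=64, n_steps=50, rng=rng)
# The refined signal converges to the target waveform.
print(round(float(np.max(np.abs(y - target))), 3))  # 0.0
```

Each step here shrinks the residual by a constant factor; in a trained diffusion model the per-step noise removal is learned and follows a prescribed noise schedule instead.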

Balanced End-to-End Monolingual pre-training for Low-Resourced Indic Languages Code-Switching Speech Recognition

no code implementations10 Jun 2021 Amir Hussein, Shammur Chowdhury, Najim Dehak, Ahmed Ali

In this paper, we exploit the transfer learning approach to design End-to-End (E2E) CS ASR systems for the two low-resource language pairs, using different monolingual speech data and a small set of noisy CS data.

Language Modelling speech-recognition +3

Segmental Contrastive Predictive Coding for Unsupervised Word Segmentation

no code implementations3 Jun 2021 Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak

We overcome this limitation with a segmental contrastive predictive coding (SCPC) framework that can model the signal structure at a higher level, e.g., at the phoneme level.

Adversarial Attacks and Defenses for Speech Recognition Systems

no code implementations31 Mar 2021 Piotr Żelasko, Sonal Joshi, Yiwen Shao, Jesus Villalba, Jan Trmal, Najim Dehak, Sanjeev Khudanpur

We investigate two threat models: a denial-of-service scenario where fast gradient-sign method (FGSM) or weak projected gradient descent (PGD) attacks are used to degrade the model's word error rate (WER); and a targeted scenario where a more potent imperceptible attack forces the system to recognize a specific phrase.

Adversarial Robustness Automatic Speech Recognition +3
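
The fast gradient-sign method (FGSM) named in the abstract above has a one-line core: perturb the input in the direction of the sign of the loss gradient, scaled by a budget epsilon. A minimal sketch on a toy logistic-regression "model" with an analytic input gradient (the weights and inputs are hypothetical, not the paper's ASR system):

```python
import numpy as np

# Minimal FGSM sketch. For logistic regression with binary cross-entropy,
# dL/dx = (sigmoid(w.x + b) - y) * w, so no autodiff framework is needed.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_attack(x, y, w, b, epsilon):
    """One-step fast gradient-sign perturbation of input x."""
    p = sigmoid(np.dot(w, x) + b)
    grad_x = (p - y) * w                  # analytic input gradient
    return x + epsilon * np.sign(grad_x)  # move each feature by +/- epsilon

w = np.array([1.0, -2.0, 0.5])
b = 0.1
x = np.array([0.2, 0.4, -0.1])
y = 1.0  # true label

x_adv = fgsm_attack(x, y, w, b, epsilon=0.1)
# Every feature is perturbed by exactly epsilon in magnitude.
print(np.max(np.abs(x_adv - x)))  # 0.1
```

PGD is essentially this step applied iteratively with a projection back into the epsilon-ball, which is why the abstract treats FGSM as the weaker of the two attacks.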

Study of Pre-processing Defenses against Adversarial Attacks on State-of-the-art Speaker Recognition Systems

no code implementations22 Jan 2021 Sonal Joshi, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velázquez, Najim Dehak

Such attacks pose severe security risks, making it vital to understand how vulnerable state-of-the-art SR systems are to them.

Speaker Recognition

Focus on the present: a regularization method for the ASR source-target attention layer

no code implementations2 Nov 2020 Nanxin Chen, Piotr Żelasko, Jesús Villalba, Najim Dehak

This paper introduces a novel method to diagnose the source-target attention in state-of-the-art end-to-end speech recognition models with joint connectionist temporal classification (CTC) and attention training.

Decoder speech-recognition +1

How Phonotactics Affect Multilingual and Zero-shot ASR Performance

1 code implementation22 Oct 2020 Siyuan Feng, Piotr Żelasko, Laureano Moro-Velázquez, Ali Abavisani, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak

Furthermore, we find that a multilingual LM hurts a multilingual ASR system's performance, and retaining only the target language's phonotactic data in LM training is preferable.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Learning Speaker Embedding from Text-to-Speech

1 code implementation21 Oct 2020 Jaejin Cho, Piotr Zelasko, Jesus Villalba, Shinji Watanabe, Najim Dehak

TTS with speaker classification loss improved EER by 0.28% and 0.73% absolute over a model using only speaker classification loss on LibriTTS and VoxCeleb1, respectively.

Classification Decoder +3

Self-Expressing Autoencoders for Unsupervised Spoken Term Discovery

no code implementations26 Jul 2020 Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Najim Dehak

We perform segmentation based on the assumption that the frame feature vectors are more similar within a segment than across the segments.
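
The segmentation assumption above can be illustrated with a simple threshold rule: place a boundary wherever adjacent frame feature vectors are dissimilar. This is an illustrative sketch of the assumption only, not the paper's self-expressing autoencoder:

```python
import numpy as np

# Toy sketch: segment boundaries fall where cosine similarity between
# consecutive frame feature vectors drops below a threshold.

def detect_boundaries(frames, threshold=0.5):
    """Return frame indices that start a new segment."""
    f = frames / np.linalg.norm(frames, axis=1, keepdims=True)
    sim = np.sum(f[:-1] * f[1:], axis=1)      # cosine of adjacent pairs
    return [i + 1 for i, s in enumerate(sim) if s < threshold]

# Two artificial "segments" with a clear change point at frame 3.
frames = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.1],
                   [0.0, 1.0], [0.1, 0.9]])
print(detect_boundaries(frames))  # [3]
```

Frames within each artificial segment point in nearly the same direction, so only the transition between the two groups falls below the similarity threshold.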


Low-Resource Domain Adaptation for Speaker Recognition Using Cycle-GANs

1 code implementation25 Oct 2019 Phani Sankar Nidadavolu, Saurabh Kataria, Jesús Villalba, Najim Dehak

We experiment with two adaptation tasks: microphone-to-telephone and a novel reverberant-to-clean adaptation, with the end goal of improving speaker recognition performance.

Audio and Speech Processing Sound

Unsupervised Feature Enhancement for speaker verification

1 code implementation25 Oct 2019 Phani Sankar Nidadavolu, Saurabh Kataria, Jesús Villalba, Paola García-Perera, Najim Dehak

The approach yielded significant improvements on both real and simulated sets when data augmentation was not used in the speaker verification pipeline, or when augmentation was used only during x-vector training.

Audio and Speech Processing Sound

Hierarchical Transformers for Long Document Classification

3 code implementations23 Oct 2019 Raghavendra Pappagari, Piotr Żelasko, Jesús Villalba, Yishay Carmiel, Najim Dehak

BERT, which stands for Bidirectional Encoder Representations from Transformers, is a recently introduced language representation model based upon the transfer learning paradigm.

Classification Document Classification +3

rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method

3 code implementations9 Jun 2019 Zheng-Hua Tan, Achintya kr. Sarkar, Najim Dehak

Finally, an a posteriori SNR weighted energy difference is applied to the extended pitch segments of the denoised speech signal to detect voice activity.

Action Detection Activity Detection +3

Speaker Sincerity Detection based on Covariance Feature Vectors and Ensemble Methods

no code implementations26 Apr 2019 Mohammed Senoussaoui, Patrick Cardinal, Najim Dehak, Alessandro Lameiras Koerich

Automatically measuring the degree of speaker sincerity is a novel research problem in computational paralinguistics.

ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual neTworks

1 code implementation1 Apr 2019 Cheng-I Lai, Nanxin Chen, Jesús Villalba, Najim Dehak

We present JHU's system submission to the ASVspoof 2019 Challenge: Anti-Spoofing with Squeeze-Excitation and Residual neTworks (ASSERT).

Feature Engineering Voice Conversion

Attentive Filtering Networks for Audio Replay Attack Detection

1 code implementation31 Oct 2018 Cheng-I Lai, Alberto Abad, Korin Richmond, Junichi Yamagishi, Najim Dehak, Simon King

In this work, we propose our replay attack detection system, Attentive Filtering Network, which is composed of an attention-based filtering mechanism that enhances feature representations in both the frequency and time domains, and a ResNet-based classifier.

Speaker Verification

MCE 2018: The 1st Multi-target Speaker Detection and Identification Challenge Evaluation (MCE) Plan, Dataset and Baseline System

1 code implementation17 Jul 2018 Suwon Shon, Najim Dehak, Douglas Reynolds, James Glass

The Multitarget Challenge aims to assess how well current speech technology is able to determine whether or not a recorded utterance was spoken by one of a large number of 'blacklisted' speakers.

Audio and Speech Processing Sound

Low-Resource Contextual Topic Identification on Speech

no code implementations17 Jul 2018 Chunxi Liu, Matthew Wiesner, Shinji Watanabe, Craig Harman, Jan Trmal, Najim Dehak, Sanjeev Khudanpur

In topic identification (topic ID) on real-world unstructured audio, an audio instance with variable topic shifts is first broken into sequential segments, and each segment is independently classified.

General Classification Topic Classification +1

Automatic Speech Recognition and Topic Identification for Almost-Zero-Resource Languages

no code implementations23 Feb 2018 Matthew Wiesner, Chunxi Liu, Lucas Ondel, Craig Harman, Vimal Manohar, Jan Trmal, Zhongqiang Huang, Najim Dehak, Sanjeev Khudanpur

Automatic speech recognition (ASR) systems often need to be developed for extremely low-resource languages to serve end uses such as audio content categorization and search.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

An Empirical Evaluation of Zero Resource Acoustic Unit Discovery

no code implementations5 Feb 2017 Chunxi Liu, Jinyi Yang, Ming Sun, Santosh Kesiraju, Alena Rott, Lucas Ondel, Pegah Ghahremani, Najim Dehak, Lukas Burget, Sanjeev Khudanpur

Acoustic unit discovery (AUD) is a process of automatically identifying a categorical acoustic unit inventory from speech and producing corresponding acoustic unit tokenizations.

Acoustic Unit Discovery

A Unified Deep Neural Network for Speaker and Language Recognition

no code implementations3 Apr 2015 Fred Richardson, Douglas Reynolds, Najim Dehak

Learned feature representations and sub-phoneme posteriors from Deep Neural Networks (DNNs) have been used separately to produce significant performance gains for speaker and language recognition tasks.

Domain Adaptation Speaker Recognition
