no code implementations • 29 Feb 2024 • Sonal Joshi, Thomas Thebaud, Jesús Villalba, Najim Dehak
In this paper, we propose a method to detect the presence of adversarial examples, i.e., a binary classifier distinguishing between benign and adversarial examples.
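A minimal sketch of such a benign-vs-adversarial detector, assuming precomputed per-example feature vectors; the plain logistic-regression model and synthetic setup below are illustrative stand-ins, not the paper's actual detector:

```python
import numpy as np

def train_detector(feats, labels, lr=0.1, epochs=300):
    """Fit logistic regression: label 1 = adversarial, 0 = benign."""
    w = np.zeros(feats.shape[1])
    b = 0.0
    n = len(labels)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))  # sigmoid probabilities
        grad = p - labels                           # gradient of log-loss w.r.t. logits
        w -= lr * feats.T @ grad / n
        b -= lr * grad.mean()
    return w, b

def detect(feats, w, b, thresh=0.5):
    """Return 1 (adversarial) where predicted probability exceeds thresh."""
    return (1.0 / (1.0 + np.exp(-(feats @ w + b))) > thresh).astype(int)
```

In practice the features would come from the speech system under attack; any separable representation makes this two-class formulation workable.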
no code implementations • 8 Sep 2023 • Saurabhchand Bhati, Jesús Villalba, Laureano Moro-Velazquez, Thomas Thebaud, Najim Dehak
Cascaded SpeechCLIP attempted to generate localized word-level information and utilize both the pretrained image and text encoders.
no code implementations • 12 Apr 2023 • Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak
These representations significantly reduce the amount of labeled data needed for downstream tasks such as automatic speech recognition.
no code implementations • 7 Mar 2023 • Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Thomas Thebaud, Najim Dehak
Speech super-resolution/Bandwidth Extension (BWE) can improve downstream tasks like Automatic Speaker Verification (ASV).
no code implementations • 4 Sep 2022 • Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Piotr Żelasko, Najim Dehak
We show that our bandwidth extension leads to phenomena such as a shift of telephone (test) embeddings towards wideband (train) signals, a negative correlation of perceptual quality with downstream performance, and condition-independent score calibration.
1 code implementation • 10 Aug 2022 • Jaejin Cho, Raghavendra Pappagari, Piotr Żelasko, Laureano Moro-Velazquez, Jesús Villalba, Najim Dehak
This paper applies a non-contrastive self-supervised learning method on an unlabeled speech corpus to learn utterance-level embeddings.
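The core of a non-contrastive objective can be sketched as a negative cosine similarity between two views of the same utterance, with no negative pairs required; the tiny NumPy version below is illustrative of BYOL/DINO-style losses, not the paper's network:

```python
import numpy as np

def non_contrastive_loss(online_pred, target_view):
    """Negative cosine similarity between the online branch's prediction
    and the (stop-gradient) target view. Minimized at -1 when the two
    embeddings point in the same direction."""
    o = online_pred / np.linalg.norm(online_pred)
    t = target_view / np.linalg.norm(target_view)
    return float(-(o @ t))
```

Collapse to a trivial constant embedding is usually prevented architecturally (e.g., a momentum target or stop-gradient), which is why no negatives are needed.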
no code implementations • 30 Mar 2022 • Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Najim Dehak
Then, we propose a two-stage learning solution where we use a pre-trained domain adaptation system for pre-processing in bandwidth extension training.
no code implementations • 5 Oct 2021 • Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak
We overcome this limitation with a segmental contrastive predictive coding (SCPC) framework to model the signal structure at a higher level, e.g., phone level.
no code implementations • 13 Sep 2021 • Raghavendra Pappagari, Piotr Żelasko, Jesús Villalba, Laureano Moro-Velazquez, Najim Dehak
While most of the current approaches focus on inferring emotion from isolated utterances, we argue that this is not sufficient to achieve conversational emotion recognition (CER) which deals with recognizing emotions in conversations.
no code implementations • 9 Jul 2021 • Jesús Villalba, Sonal Joshi, Piotr Żelasko, Najim Dehak
Also, representations trained to classify attacks against speaker identification can also be used to classify attacks against speaker verification and speech recognition.
no code implementations • 3 Jun 2021 • Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak
We overcome this limitation with a segmental contrastive predictive coding (SCPC) framework that can model the signal structure at a higher level, e.g., at the phoneme level.
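The contrastive-predictive-coding component of such a framework scores the true next segment against distractors with an InfoNCE loss; a minimal NumPy sketch, where the plain dot-product scorer is an illustrative simplification of the model's learned predictor:

```python
import numpy as np

def info_nce_loss(context, future, negatives):
    """InfoNCE: the context vector (d,) should score higher against the
    true next segment `future` (d,) than against each distractor row of
    `negatives` (k, d). Lower loss = better prediction of the future."""
    pos = np.exp(context @ future)
    neg = np.exp(negatives @ context).sum()
    return float(-np.log(pos / (pos + neg)))
```

Applying this at the segment level, rather than per frame, is what lets the model capture structure closer to the phoneme scale.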
no code implementations • 3 Apr 2021 • Saurabh Kataria, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velázquez, Najim Dehak
We investigate it for adapting microphone speech to the telephone domain.
no code implementations • 22 Jan 2021 • Sonal Joshi, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velázquez, Najim Dehak
Such attacks pose severe security risks, making it vital to understand how vulnerable state-of-the-art SR systems are to these attacks.
no code implementations • 2 Nov 2020 • Nanxin Chen, Piotr Żelasko, Jesús Villalba, Najim Dehak
This paper introduces a novel method to diagnose the source-target attention in state-of-the-art end-to-end speech recognition models with joint connectionist temporal classification (CTC) and attention training.
no code implementations • 27 Oct 2020 • Raghavendra Pappagari, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak
Data augmentation is a widely used strategy for training robust machine learning models.
no code implementations • 26 Jul 2020 • Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Najim Dehak
We perform segmentation based on the assumption that the frame feature vectors are more similar within a segment than across the segments.
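Under that assumption, boundaries can be hypothesized wherever the similarity between adjacent frame vectors dips; a minimal cosine-similarity sketch, where the fixed threshold rule is illustrative and the paper's actual boundary criterion may differ:

```python
import numpy as np

def segment_boundaries(frames, thresh=0.5):
    """Place a boundary between frames i and i+1 when their cosine
    similarity drops below thresh (low similarity => new segment)."""
    norms = np.linalg.norm(frames, axis=1, keepdims=True)
    unit = frames / np.clip(norms, 1e-8, None)      # unit-normalize each frame
    sims = (unit[:-1] * unit[1:]).sum(axis=1)       # cosine sim of adjacent frames
    return [i + 1 for i, s in enumerate(sims) if s < thresh]
```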
no code implementations • 10 Nov 2019 • Nanxin Chen, Shinji Watanabe, Jesús Villalba, Najim Dehak
In this paper, we study two different non-autoregressive transformer structures for automatic speech recognition (ASR): A-CMLM and A-FMLM.
1 code implementation • 25 Oct 2019 • Phani Sankar Nidadavolu, Saurabh Kataria, Jesús Villalba, Najim Dehak
We experiment with two adaptation tasks: microphone to telephone and a novel reverberant to clean adaptation with the end goal of improving speaker recognition performance.
1 code implementation • 25 Oct 2019 • Saurabh Kataria, Phani Sankar Nidadavolu, Jesús Villalba, Nanxin Chen, Paola García, Najim Dehak
On the BabyTrain corpus, we observe relative gains of 10.38% and 12.40% in minDCF and EER respectively.
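For reference, the EER metric reported here is the operating point where false-accept and false-reject rates cross; a small sketch of its computation, using a simple threshold sweep rather than an interpolated DET-curve estimate:

```python
import numpy as np

def compute_eer(target_scores, nontarget_scores):
    """Equal error rate: sweep candidate thresholds until the
    false-reject rate meets or exceeds the false-accept rate,
    then return their average at that crossover."""
    for t in np.sort(np.concatenate([target_scores, nontarget_scores])):
        far = float((nontarget_scores >= t).mean())  # false-accept rate
        frr = float((target_scores < t).mean())      # false-reject rate
        if frr >= far:
            return (far + frr) / 2.0
    return 0.5
```

minDCF additionally weights the two error types by application-dependent costs and priors, so the two metrics can favor different operating points.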
1 code implementation • 25 Oct 2019 • Phani Sankar Nidadavolu, Saurabh Kataria, Jesús Villalba, Paola García-Perera, Najim Dehak
The approach yielded significant improvements on both real and simulated sets when data augmentation was not used in the speaker verification pipeline, or when augmentation was used only during x-vector training.
3 code implementations • 23 Oct 2019 • Raghavendra Pappagari, Piotr Żelasko, Jesús Villalba, Yishay Carmiel, Najim Dehak
BERT, which stands for Bidirectional Encoder Representations from Transformers, is a recently introduced language representation model based upon the transfer learning paradigm.
1 code implementation • 1 Apr 2019 • Cheng-I Lai, Nanxin Chen, Jesús Villalba, Najim Dehak
We present JHU's system submission to the ASVspoof 2019 Challenge: Anti-Spoofing with Squeeze-Excitation and Residual neTworks (ASSERT).
no code implementations • 20 Nov 2015 • Jesús Villalba
This can be used to adapt SPLDA from one database to another with little development data, or to implement the fully Bayesian recipe.
no code implementations • 20 Nov 2015 • Jesús Villalba
In this document we are going to derive the equations needed to implement a Variational Bayes i-vector extractor.
no code implementations • 20 Nov 2015 • Jesús Villalba
We describe a generative model that produces both sets of data, where the unknown labels are modeled as latent variables.
no code implementations • 20 Nov 2015 • Jesús Villalba
This model was applied in the paper "Handling Recordings Acquired Simultaneously over Multiple Channels with PLDA" published at Interspeech 2013.