Search Results for author: Jesus Villalba

Found 9 papers, 3 papers with code

DuTa-VC: A Duration-aware Typical-to-atypical Voice Conversion Approach with Diffusion Probabilistic Model

1 code implementation18 Jun 2023 Helin Wang, Thomas Thebaud, Jesus Villalba, Myra Sydnor, Becky Lammers, Najim Dehak, Laureano Moro-Velazquez

We present a novel typical-to-atypical voice conversion approach (DuTa-VC), which (i) can be trained with nonparallel data (ii) first introduces diffusion probabilistic model (iii) preserves the target speaker identity (iv) is aware of the phoneme duration of the target speaker.

Data Augmentation speech-recognition +2

Stabilized training of joint energy-based models and their practical applications

no code implementations7 Mar 2023 Martin Sustek, Samik Sadhu, Lukas Burget, Hynek Hermansky, Jesus Villalba, Laureano Moro-Velazquez, Najim Dehak

The JEM training relies on "positive examples" (i. e. examples from the training data set) as well as on "negative examples", which are samples from the modeled distribution $p(x)$ generated by means of Stochastic Gradient Langevin Dynamics (SGLD).

AdvEst: Adversarial Perturbation Estimation to Classify and Detect Adversarial Attacks against Speaker Identification

no code implementations8 Apr 2022 Sonal Joshi, Saurabh Kataria, Jesus Villalba, Najim Dehak

Building on our previous work that used representation learning to classify and detect adversarial attacks, we propose an improvement to it using AdvEst, a method to estimate adversarial perturbation.

Representation Learning Speaker Identification

The JHU submission to VoxSRC-21: Track 3

no code implementations28 Sep 2021 Jejin Cho, Jesus Villalba, Najim Dehak

This technical report describes Johns Hopkins University speaker recognition system submitted to Voxceleb Speaker Recognition Challenge 2021 Track 3: Self-supervised speaker verification (closed).

Clustering Contrastive Learning +2

Adversarial Attacks and Defenses for Speech Recognition Systems

no code implementations31 Mar 2021 Piotr Żelasko, Sonal Joshi, Yiwen Shao, Jesus Villalba, Jan Trmal, Najim Dehak, Sanjeev Khudanpur

We investigate two threat models: a denial-of-service scenario where fast gradient-sign method (FGSM) or weak projected gradient descent (PGD) attacks are used to degrade the model's word error rate (WER); and a targeted scenario where a more potent imperceptible attack forces the system to recognize a specific phrase.

Adversarial Robustness Automatic Speech Recognition +2

Learning Speaker Embedding from Text-to-Speech

1 code implementation21 Oct 2020 Jaejin Cho, Piotr Zelasko, Jesus Villalba, Shinji Watanabe, Najim Dehak

TTS with speaker classification loss improved EER by 0. 28\% and 0. 73\% absolutely from a model using only speaker classification loss in LibriTTS and Voxceleb1 respectively.

Classification General Classification +2

Cannot find the paper you are looking for? You can Submit a new open access paper.