no code implementations • 7 Mar 2023 • Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Thomas Thebaud, Najim Dehak
Speech super-resolution/Bandwidth Extension (BWE) can improve downstream tasks like Automatic Speaker Verification (ASV).
no code implementations • 4 Sep 2022 • Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Piotr Żelasko, Najim Dehak
We show that our bandwidth extension leads to phenomena such as a shift of telephone (test) embeddings towards wideband (train) signals, a negative correlation of perceptual quality with downstream performance, and condition-independent score calibration.
no code implementations • 8 Apr 2022 • Sonal Joshi, Saurabh Kataria, Yiwen Shao, Piotr Zelasko, Jesus Villalba, Sanjeev Khudanpur, Najim Dehak
We propose three defenses--denoiser pre-processor, adversarially fine-tuning ASR model, and adversarially fine-tuning joint model of ASR and denoiser.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
no code implementations • 8 Apr 2022 • Sonal Joshi, Saurabh Kataria, Jesus Villalba, Najim Dehak
Building on our previous work that used representation learning to classify and detect adversarial attacks, we propose an improvement to it using AdvEst, a method to estimate adversarial perturbation.
no code implementations • 30 Mar 2022 • Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Najim Dehak
Then, we propose a two-stage learning solution where we use a pre-trained domain adaptation system for pre-processing in bandwidth extension training.
no code implementations • 3 Apr 2021 • Saurabh Kataria, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velázquez, Najim Dehak
We investigate it for adapt microphone speech to the telephone domain.
no code implementations • 23 Oct 2020 • Saurabh Kataria, Shi-Xiong Zhang, Dong Yu
We find the improvements from speaker-dependent directional features more consistent in multi-talker conditions than clean.
1 code implementation • 2 Dec 2019 • Paola Garcia, Jesus Villalba, Herve Bredin, Jun Du, Diego Castan, Alejandrina Cristia, Latane Bullock, Ling Guo, Koji Okabe, Phani Sankar Nidadavolu, Saurabh Kataria, Sizhu Chen, Leo Galmant, Marvin Lavechin, Lei Sun, Marie-Philippe Gill, Bar Ben-Yair, Sajjad Abdoli, Xin Wang, Wassim Bouaziz, Hadrien Titeux, Emmanuel Dupoux, Kong Aik Lee, Najim Dehak
This paper presents the problems and solutions addressed at the JSALT workshop when using a single microphone for speaker detection in adverse scenarios.
Audio and Speech Processing Sound
1 code implementation • 25 Oct 2019 • Phani Sankar Nidadavolu, Saurabh Kataria, Jesús Villalba, Najim Dehak
We experiment with two adaptation tasks: microphone to telephone and a novel reverberant to clean adaptation with the end goal of improving speaker recognition performance.
Audio and Speech Processing Sound
1 code implementation • 25 Oct 2019 • Saurabh Kataria, Phani Sankar Nidadavolu, Jesús Villalba, Nanxin Chen, Paola García, Najim Dehak
On BabyTrain corpus, we observe relative gains of 10. 38% and 12. 40% in minDCF and EER respectively.
1 code implementation • 25 Oct 2019 • Phani Sankar Nidadavolu, Saurabh Kataria, Jesús Villalba, Paola García-Perera, Najim Dehak
The approach yielded significant improvements on both real and simulated sets when data augmentation was not used in speaker verification pipeline or augmentation was used only during x-vector training.
Audio and Speech Processing Sound
no code implementations • 6 Jul 2017 • Fenglong Ma, Radha Chitta, Saurabh Kataria, Jing Zhou, Palghat Ramesh, Tong Sun, Jing Gao
Question answering is an important and difficult task in the natural language processing domain, because many basic natural language processing tasks can be cast into a question answering task.
no code implementations • 14 Dec 2016 • Clément Gaultier, Saurabh Kataria, Antoine Deleforge
This paper introduces a new paradigm for sound source lo-calization referred to as virtual acoustic space traveling (VAST) and presents a first dataset designed for this purpose.
no code implementations • 24 Mar 2016 • Saurabh Kataria
We consider the problem of learning distributed representations for tags from their associated content for the task of tag recommendation.
no code implementations • 25 Apr 2014 • Arvind Agarwal, Saurabh Kataria
Our method learns multiple models, one model for each label sequence.