no code implementations • 3 May 2023 • Iván López-Espejo, Santi Prieto, Alfonso Ortega, Eduardo Lleida
Despite the maturity of modern speaker verification technology, its performance still significantly degrades when facing non-neutrally-phonated (e.g., shouted and whispered) speech.
no code implementations • 6 Nov 2021 • Victoria Mingote, Antonio Miguel, Alfonso Ortega, Eduardo Lleida
This paper explores three novel approaches to improve the performance of speaker verification (SV) systems based on deep neural networks (DNN) using Multi-head Self-Attention (MSA) mechanisms and memory layers.
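As a rough illustration of the attention-based pooling idea mentioned here (not the authors' implementation), the following numpy sketch applies multi-head self-attention over frame-level features and pools the attended frames into a fixed-size utterance embedding; the projection matrices are random stand-ins for learned parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def msa_pooling(frames, num_heads=4, rng=None):
    """Multi-head self-attention pooling over frame-level features.

    frames: (T, d) array of frame embeddings; returns a (d,) utterance vector.
    The Wq/Wk/Wv projections are random stand-ins for learned parameters.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    T, d = frames.shape
    assert d % num_heads == 0
    dh = d // num_heads
    heads = []
    for _ in range(num_heads):
        Wq = rng.standard_normal((d, dh)) / np.sqrt(d)
        Wk = rng.standard_normal((d, dh)) / np.sqrt(d)
        Wv = rng.standard_normal((d, dh)) / np.sqrt(d)
        Q, K, V = frames @ Wq, frames @ Wk, frames @ Wv
        A = softmax(Q @ K.T / np.sqrt(dh))   # (T, T) attention weights
        heads.append((A @ V).mean(axis=0))   # pool attended frames per head
    return np.concatenate(heads)             # (d,) utterance embedding

emb = msa_pooling(np.random.default_rng(1).standard_normal((50, 32)))
```

In a trained system the projections would be learned jointly with the front-end, and the pooled embedding fed to a speaker classifier or scoring back-end.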
no code implementations • 27 Oct 2021 • Pablo Gimeno, Victoria Mingote, Alfonso Ortega, Antonio Miguel, Eduardo Lleida
Area under the ROC curve (AUC) optimisation techniques developed for neural networks have recently demonstrated their capabilities in different audio- and speech-related tasks.
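The core idea behind AUC optimisation for neural networks can be sketched as follows (a minimal illustration, not the paper's specific objective): AUC equals the probability that a target-trial score exceeds a non-target-trial score, and the non-differentiable step function in that comparison is replaced by a steep sigmoid so the objective can be maximised by gradient ascent.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def soft_auc(target_scores, nontarget_scores, steepness=20.0):
    """Differentiable AUC surrogate.

    True AUC = P(score_target > score_nontarget), estimated over all
    target/non-target score pairs; the indicator function is smoothed
    with a steep sigmoid so gradients can flow to the network.
    """
    diffs = target_scores[:, None] - nontarget_scores[None, :]
    return sigmoid(steepness * diffs).mean()

# Perfectly separated scores give a surrogate AUC close to 1.
tar = np.array([0.9, 0.8, 0.7])
non = np.array([0.2, 0.4])
auc = soft_auc(tar, non)
```

In practice the surrogate is computed over mini-batches of trial pairs and the `steepness` hyperparameter trades gradient smoothness against fidelity to the true AUC.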
no code implementations • 6 Aug 2020 • Santi Prieto, Alfonso Ortega, Iván López-Espejo, Eduardo Lleida
These compensation techniques are borrowed from the area of robustness for automatic speech recognition and, in this work, we apply them to compensate the mismatch between shouted and normal conditions in speaker verification.
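One of the simplest compensation techniques from the ASR robustness literature is per-utterance feature standardisation (cepstral mean and variance normalisation); a minimal sketch is shown below as an illustration of the general idea, not as the specific compensation methods evaluated in the paper.

```python
import numpy as np

def cmvn(feats):
    """Cepstral mean and variance normalisation.

    feats: (T, d) matrix of per-frame cepstral features. Standardising
    each dimension over the utterance removes global spectral offsets,
    which helps reduce the mismatch between shouted and normal speech.
    """
    return (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)

# Simulated features with an arbitrary offset and scale.
feats = np.random.default_rng(0).standard_normal((100, 20)) * 3.0 + 5.0
norm = cmvn(feats)
```

After normalisation each feature dimension has (approximately) zero mean and unit variance within the utterance, regardless of the recording condition.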
no code implementations • 31 Jan 2019 • Victoria Mingote, Antonio Miguel, Alfonso Ortega, Eduardo Lleida
This paper explores two techniques to improve the performance of text-dependent speaker verification systems based on deep neural networks.
no code implementations • 27 Dec 2018 • Antonio Miguel, Jorge Llombart, Alfonso Ortega, Eduardo Lleida
As in Joint Factor Analysis, the model uses tied hidden variables to model speaker and session variability, together with MAP adaptation of some of the model's parameters.
no code implementations • 22 Dec 2018 • Victoria Mingote, Antonio Miguel, Alfonso Ortega, Eduardo Lleida
Moreover, a convolutional neural network can be applied as front-end and, because the alignment process is differentiable, the whole network can be trained end-to-end to produce, for each utterance, a supervector that is discriminative with respect to the speaker and the phrase simultaneously.
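A differentiable alignment of this kind can be sketched with soft assignments (a hypothetical numpy illustration, not the paper's architecture): frames are softly assigned to a set of anchor states via a softmax, and the per-state weighted means are stacked into a supervector. Because the assignments are soft rather than hard, gradients can flow back through the alignment to a front-end network.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def soft_align_supervector(frames, anchors):
    """Differentiable soft alignment producing a supervector.

    frames:  (T, d) frame-level features from a front-end network.
    anchors: (K, d) anchor states (e.g., learned phonetic units).
    Returns the (K * d,) concatenation of per-state weighted means.
    """
    post = softmax(frames @ anchors.T, axis=1)          # (T, K) soft posteriors
    weights = post / (post.sum(axis=0, keepdims=True) + 1e-8)
    state_means = weights.T @ frames                    # (K, d) aligned means
    return state_means.reshape(-1)                      # supervector

rng = np.random.default_rng(0)
sv = soft_align_supervector(rng.standard_normal((40, 16)),
                            rng.standard_normal((5, 16)))
```

Hard alignment (argmax over states) would yield the same kind of supervector but would block gradients; the softmax relaxation is what lets the front-end and the alignment be trained jointly.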