Speaker verification is the task of verifying the identity of a person from characteristics of their voice.
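A typical verification pipeline compares a fixed-dimensional embedding of the test utterance against the enrolled speaker's embedding and accepts the identity claim if their similarity exceeds a decision threshold. A minimal pure-Python sketch (the toy embeddings and the 0.7 threshold are illustrative assumptions, not values from any particular system):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def verify(test_embedding, enrolled_embedding, threshold=0.7):
    """Accept the claim if similarity to the enrolled speaker is high enough."""
    return cosine_similarity(test_embedding, enrolled_embedding) >= threshold

# Toy embeddings: a genuine trial is close to the enrollment, an impostor is not.
enrolled = [0.9, 0.1, 0.2]
genuine  = [0.8, 0.2, 0.1]
impostor = [0.1, 0.9, 0.3]
print(verify(genuine, enrolled))   # True
print(verify(impostor, enrolled))  # False
```

In practice the threshold is tuned on a development set to trade off false acceptances against false rejections (e.g. at the equal error rate).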
(Image credit: Contrastive-Predictive-Coding-PyTorch)
In this paper, we propose a new loss function called generalized end-to-end (GE2E) loss, which makes the training of speaker verification models more efficient than our previous tuple-based end-to-end (TE2E) loss function.
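The GE2E loss builds a similarity matrix between every utterance embedding in a batch and every speaker centroid, then applies a softmax loss that pulls each embedding toward its own speaker's centroid and away from the others. A minimal pure-Python sketch under stated assumptions: the scale `w` and bias `b` are learnable parameters in the paper but are fixed here for illustration, and the leave-one-out centroid for the true speaker follows the paper's stabilization trick.

```python
import math

def cos(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def centroid(vectors):
    """Element-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def ge2e_loss(batch, w=10.0, b=-5.0):
    """GE2E softmax loss, summed over all embeddings.

    batch: list of speakers, each a list of utterance embeddings
           (at least two utterances per speaker).
    """
    loss = 0.0
    for j, utts in enumerate(batch):
        for i, e in enumerate(utts):
            sims = []
            for k, other in enumerate(batch):
                if k == j:
                    # Leave-one-out centroid: exclude e itself for the true speaker.
                    rest = [u for m, u in enumerate(utts) if m != i]
                    c = centroid(rest)
                else:
                    c = centroid(other)
                sims.append(w * cos(e, c) + b)
            # Softmax loss: high similarity to own centroid, low to all others.
            loss += -sims[j] + math.log(sum(math.exp(s) for s in sims))
    return loss
```

A batch whose speakers are well separated in embedding space yields a lower loss than one where speakers overlap, which is exactly the gradient signal the embedding network trains against.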
In this work we present Ludwig, a flexible, extensible and easy to use toolbox which allows users to train deep learning models and use them for obtaining predictions without writing code.
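Ludwig is driven by a declarative configuration rather than hand-written training code. A hedged sketch of such a config (the `input_features`/`output_features` structure follows Ludwig's documented schema, but the column names here are hypothetical):

```yaml
# Hypothetical Ludwig config: predict a categorical label from a text column.
input_features:
  - name: review_text
    type: text
output_features:
  - name: sentiment
    type: category
```

Training would then be launched from the command line, e.g. `ludwig train --config config.yaml --dataset reviews.csv` (the exact flag names can vary across Ludwig versions).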
We propose the use of a coupled 3D Convolutional Neural Network (3D-CNN) architecture that can map both modalities into a representation space to evaluate the correspondence of audio-visual streams using the learned multimodal features.
In this paper we present DELTA, a deep learning based language technology platform.
Rather than employing standard hand-crafted features, these CNNs learn low-level speech representations directly from waveforms, potentially allowing the network to better capture important narrow-band speaker characteristics such as pitch and formants.
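One way to learn from raw waveforms while keeping the first layer interpretable (as in SincNet-style models) is to parametrize each convolution kernel as a band-pass filter defined by just two learnable cutoff frequencies, rather than learning every filter tap freely. A generic windowed-sinc sketch in pure Python, not the paper's exact implementation (kernel length, sample rate, and window choice are illustrative assumptions):

```python
import math

def sinc(x):
    """Unnormalized sinc: sin(x)/x, with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(x) / x

def bandpass_kernel(f1, f2, length=101, fs=16000):
    """Ideal band-pass FIR kernel parametrized by two cutoff frequencies (Hz),
    built as the difference of two low-pass sinc filters. In a SincNet-style
    layer, only f1 and f2 would be trained, not the individual taps."""
    assert 0 <= f1 < f2 <= fs / 2
    half = length // 2
    kernel = []
    for n in range(-half, half + 1):
        t = n / fs
        kernel.append(2 * f2 * sinc(2 * math.pi * f2 * t)
                      - 2 * f1 * sinc(2 * math.pi * f1 * t))
    # Hamming window to reduce ripple from truncating the ideal filter.
    return [k * (0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1)))
            for i, k in enumerate(kernel)]

# A kernel passing roughly the telephone band, 300-3400 Hz.
kernel = bandpass_kernel(300.0, 3400.0)
```

Because each filter is fully described by two frequencies, the layer has far fewer parameters than a free convolution and its learned filters can be read off directly as frequency bands.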
In our paper, we propose adaptive feature learning that uses 3D-CNNs for direct speaker model creation: in both the development and enrollment phases, an identical number of spoken utterances per speaker is fed to the network to represent the speakers' utterances and create the speaker model.
To explore this issue, we propose employing Mockingjay, a self-supervised learning-based model, to protect anti-spoofing models against adversarial attacks in the black-box scenario.
In this paper we present a data-driven, integrated approach to speaker verification, which maps a test utterance and a few reference utterances directly to a single score for verification and jointly optimizes the system's components using the same evaluation protocol and metric as at test time.
This thesis describes our ongoing work on Contrastive Predictive Coding (CPC) features for speaker verification.