
Speaker Verification

13 papers with code · Speech

Speaker verification is the task of verifying a person's claimed identity from characteristics of their voice.
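In embedding-based systems, the verification decision often reduces to comparing a fixed-dimensional enrollment embedding against a test embedding. A minimal sketch using cosine scoring (the threshold value and function names are illustrative, not from any specific system):

```python
import numpy as np

def cosine_score(emb_a, emb_b):
    """Cosine similarity between two fixed-dimensional speaker embeddings."""
    a = emb_a / np.linalg.norm(emb_a)
    b = emb_b / np.linalg.norm(emb_b)
    return float(np.dot(a, b))

def verify(enroll_emb, test_emb, threshold=0.7):
    """Accept the claimed identity if the score clears a tuned threshold."""
    return cosine_score(enroll_emb, test_emb) >= threshold
```

In practice the threshold is tuned on a development set to trade off false acceptances against false rejections (e.g. at the equal error rate).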

State-of-the-art leaderboards

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Greatest papers with code

3D Convolutional Neural Networks for Cross Audio-Visual Matching Recognition

18 Jun 2017 · astorfi/lip-reading-deeplearning

Audio-visual recognition (AVR) has been considered a solution for speech recognition tasks when the audio is corrupted, as well as a visual recognition method used for speaker verification in multi-speaker scenarios. We propose the use of a coupled 3D Convolutional Neural Network (3D-CNN) architecture that can map both modalities into a representation space to evaluate the correspondence of audio-visual streams using the learned multimodal features.

SPEAKER VERIFICATION SPEECH RECOGNITION

Text-Independent Speaker Verification Using 3D Convolutional Neural Networks

26 May 2017 · astorfi/3D-convolutional-speaker-recognition

In our paper, we propose adaptive feature learning using 3D-CNNs for direct speaker model creation, in which, for both the development and enrollment phases, an identical number of spoken utterances per speaker is fed to the network to represent the speakers' utterances and create the speaker model. This leads to simultaneously capturing the speaker-related information and building a more robust system to cope with within-speaker variation.

TEXT-INDEPENDENT SPEAKER VERIFICATION

Speaker Recognition from Raw Waveform with SincNet

29 Jul 2018 · mravanelli/SincNet

Deep learning is progressively gaining popularity as a viable alternative to i-vectors for speaker recognition. Rather than employing standard hand-crafted features, CNNs fed with raw speech samples learn low-level speech representations directly from waveforms, potentially allowing the network to better capture important narrow-band speaker characteristics such as pitch and formants.

SPEAKER IDENTIFICATION SPEAKER RECOGNITION SPEAKER VERIFICATION
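SincNet's first layer replaces ordinary learned convolution kernels with band-pass filters parameterised only by two cutoff frequencies, built as the difference of two windowed low-pass sinc functions. A hedged sketch (Hamming windowing follows the paper; the kernel size and cutoffs here are illustrative):

```python
import numpy as np

def sinc_bandpass(f1, f2, kernel_size=101, fs=16000):
    """Band-pass FIR kernel parameterised only by its cutoffs f1 < f2 (Hz),
    in the spirit of SincNet's learnable sinc filters (a sketch)."""
    assert 0 < f1 < f2 < fs / 2
    n = np.arange(kernel_size) - (kernel_size - 1) / 2

    def lowpass(fc):
        # Ideal low-pass impulse response; np.sinc(x) = sin(pi x) / (pi x).
        return 2 * fc / fs * np.sinc(2 * fc / fs * n)

    # Difference of two low-pass filters yields a band-pass response.
    kernel = lowpass(f2) - lowpass(f1)
    return kernel * np.hamming(kernel_size)
```

In the actual model, f1 and f2 are the trainable parameters of each filter, so the layer has far fewer parameters than a standard convolution of the same length.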

Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data

NeurIPS 2017 · wnhsu/FactorizedHierarchicalVAE

We present a factorized hierarchical variational autoencoder, which learns disentangled and interpretable representations from sequential data without supervision. Specifically, we exploit the multi-scale nature of information in sequential data by formulating it explicitly within a factorized hierarchical graphical model that imposes sequence-dependent priors and sequence-independent priors to different sets of latent variables.

SPEAKER VERIFICATION
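A sketch of the factorized prior structure described above, in my own notation (the paper's exact parameterization may differ): segment-level latents $z_2$ share a per-sequence mean, giving them a sequence-dependent prior, while $z_1$ has a global, sequence-independent prior:

```latex
\mu_2^{(i)} \sim \mathcal{N}\!\left(0,\ \sigma_{\mu_2}^2 I\right)
  \quad\text{(one mean per sequence } i\text{)} \\
z_2^{(i,n)} \sim \mathcal{N}\!\left(\mu_2^{(i)},\ \sigma_{z_2}^2 I\right)
  \quad\text{(sequence-dependent prior)} \\
z_1^{(i,n)} \sim \mathcal{N}\!\left(0,\ \sigma_{z_1}^2 I\right)
  \quad\text{(sequence-independent prior)} \\
x^{(i,n)} \sim \mathcal{N}\!\left(f_\mu\!\left(z_1^{(i,n)}, z_2^{(i,n)}\right),\
  f_\sigma\!\left(z_1^{(i,n)}, z_2^{(i,n)}\right)\right)
```

Sequence-level attributes such as speaker identity are encouraged to land in $z_2$, which is what makes the representation useful for verification.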

Generalized End-to-End Loss for Speaker Verification

28 Oct 2017 · HarryVolek/PyTorch_Speaker_Verification

In this paper, we propose a new loss function called generalized end-to-end (GE2E) loss, which makes the training of speaker verification models more efficient than our previous tuple-based end-to-end (TE2E) loss function. Unlike TE2E, the GE2E loss function updates the network in a way that emphasizes examples that are difficult to verify at each step of the training process.

DOMAIN ADAPTATION SPEAKER VERIFICATION
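The softmax variant of the GE2E loss can be sketched in NumPy as follows; the batch is assumed to be already embedded as N speakers × M utterances, and `w`, `b` stand in for the paper's learnable scale and bias (fixed here for illustration):

```python
import numpy as np

def ge2e_softmax_loss(emb, w=10.0, b=-5.0):
    """GE2E loss, softmax variant (a sketch, not the reference code).
    emb: array of shape (N speakers, M utterances, D)."""
    N, M, _ = emb.shape
    centroids = emb.mean(axis=1)  # one centroid per speaker, shape (N, D)
    total = 0.0
    for j in range(N):
        for i in range(M):
            e = emb[j, i]
            sims = np.empty(N)
            for k in range(N):
                if k == j:
                    # Exclude the utterance itself from its own centroid,
                    # as in the paper, for training stability.
                    c = (centroids[j] * M - e) / (M - 1)
                else:
                    c = centroids[k]
                cos = np.dot(e, c) / (np.linalg.norm(e) * np.linalg.norm(c))
                sims[k] = w * cos + b
            # Pull toward the own-speaker centroid, push from all others.
            total += -sims[j] + np.log(np.sum(np.exp(sims)))
    return total / (N * M)
```

Because every utterance is scored against every speaker centroid in the batch, hard negatives contribute large gradients automatically, which is the efficiency gain over the tuple-based TE2E construction.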

End-to-End Text-Dependent Speaker Verification

27 Sep 2015 · JanhHyun/Speaker_Verification

In this paper we present a data-driven, integrated approach to speaker verification, which maps a test utterance and a few reference utterances directly to a single score for verification and jointly optimizes the system's components using the same evaluation protocol and metric as at test time. Such an approach will result in simple and efficient systems, requiring little domain-specific knowledge and making few model assumptions.

TEXT-DEPENDENT SPEAKER VERIFICATION

Scalable Factorized Hierarchical Variational Autoencoder Training

9 Apr 2018 · wnhsu/ScalableFHVAE

Deep generative models have achieved great success in unsupervised learning with the ability to capture complex nonlinear relationships between latent generating factors and observations. Among them, a factorized hierarchical variational autoencoder (FHVAE) is a variational inference-based model that formulates a hierarchical generative process for sequential data.

ROBUST SPEECH RECOGNITION SPEAKER VERIFICATION

Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis

NeurIPS 2018 · Suhee05/Text-Independent-Speaker-Verification

We describe a neural network-based system for text-to-speech (TTS) synthesis that is able to generate speech audio in the voice of many different speakers, including those unseen during training. Our system consists of three independently trained components: (1) a speaker encoder network, trained on a speaker verification task using an independent dataset of noisy speech from thousands of speakers without transcripts, to generate a fixed-dimensional embedding vector from seconds of reference speech from a target speaker; (2) a sequence-to-sequence synthesis network based on Tacotron 2, which generates a mel spectrogram from text, conditioned on the speaker embedding; (3) an auto-regressive WaveNet-based vocoder that converts the mel spectrogram into a sequence of time domain waveform samples.

SPEAKER VERIFICATION SPEECH SYNTHESIS TEXT-TO-SPEECH SYNTHESIS TRANSFER LEARNING
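The three-stage data flow can be sketched with hypothetical stubs; none of these functions are the paper's models, and all shapes, names, and defaults are illustrative only:

```python
import numpy as np

def speaker_encoder(ref_wave, emb_dim=256):
    """Stub: collapse reference audio into a fixed-dimensional embedding."""
    rng = np.random.default_rng(abs(hash(ref_wave.tobytes())) % (2**32))
    e = rng.standard_normal(emb_dim)
    return e / np.linalg.norm(e)  # unit-norm, as in d-vector systems

def synthesizer(text, speaker_emb, n_mels=80):
    """Stub: text + speaker embedding -> mel spectrogram (frames x mels)."""
    n_frames = 10 * len(text)  # placeholder duration model
    return np.zeros((n_frames, n_mels)) + speaker_emb[:n_mels].mean()

def vocoder(mel, hop=256):
    """Stub: mel spectrogram -> time-domain waveform samples."""
    return np.zeros(mel.shape[0] * hop)

# End-to-end flow: reference audio -> embedding -> mel -> waveform.
wave = vocoder(synthesizer("hello", speaker_encoder(np.zeros(16000))))
```

The key design point is that the three components are trained independently, so the speaker encoder can be trained on large untranscribed verification data while the synthesizer and vocoder use smaller transcribed corpora.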

Attentive Filtering Networks for Audio Replay Attack Detection

31 Oct 2018 · jefflai108/Attentive-Filtering-Network

An attacker may use a variety of techniques to fool an automatic speaker verification system into accepting them as a genuine user. In this work, we propose our replay attack detection system, the Attentive Filtering Network, which is composed of an attention-based filtering mechanism that enhances feature representations in both the frequency and time domains, and a ResNet-based classifier.

SPEAKER VERIFICATION

Multiobjective Optimization Training of PLDA for Speaker Verification

25 Aug 2018 · sanphiee/MOT-sGPLDA-SRE14

Most current state-of-the-art text-independent speaker verification systems take probabilistic linear discriminant analysis (PLDA) as their backend classifiers. The parameters of PLDA are often estimated by maximizing the objective function, which focuses on increasing the value of log-likelihood function, but ignoring the distinction between speakers.

MULTIOBJECTIVE OPTIMIZATION TEXT-INDEPENDENT SPEAKER VERIFICATION
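For intuition about PLDA as a backend, a scalar two-covariance verification score can be sketched as a log-likelihood ratio between the same-speaker and different-speaker hypotheses; the variances here are illustrative placeholders, not trained parameters:

```python
import numpy as np

def plda_llr(x1, x2, var_b=1.0, var_w=0.5):
    """Two-covariance PLDA score for scalar features (a sketch).
    var_b: between-speaker variance, var_w: within-speaker variance.
    Returns log p(x1,x2 | same speaker) - log p(x1,x2 | different)."""
    x = np.array([x1, x2])
    tot = var_b + var_w
    # Same speaker: both observations share one latent identity, so they
    # are correlated with covariance var_b. Different: independent.
    cov_same = np.array([[tot, var_b], [var_b, tot]])
    cov_diff = np.array([[tot, 0.0], [0.0, tot]])

    def log_gauss(v, cov):
        _, logdet = np.linalg.slogdet(cov)
        return -0.5 * (len(v) * np.log(2 * np.pi) + logdet
                       + v @ np.linalg.solve(cov, v))

    return log_gauss(x, cov_same) - log_gauss(x, cov_diff)
```

Training PLDA by maximum likelihood optimizes how well this generative model fits the data, which, as the paper notes, does not directly optimize the discrimination between speakers that verification actually needs.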