Speaker Recognition

90 papers with code • 1 benchmark • 6 datasets

Speaker Recognition is the process of identifying or confirming the identity of a person from segments of their speech.

Source: Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition
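To make the identification/verification distinction concrete, here is a minimal sketch that assumes each utterance has already been mapped to a fixed-length speaker embedding by some model. The embedding values, speaker names and the 0.7 threshold are hypothetical, and cosine scoring is one common choice, not a method prescribed by the source.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def verify(test_emb, claimed_emb, threshold=0.7):
    """Verification: accept the identity claim if the similarity reaches the (hypothetical) threshold."""
    return cosine(test_emb, claimed_emb) >= threshold


def identify(test_emb, enrolled):
    """Identification: return the enrolled speaker whose embedding is most similar."""
    return max(enrolled, key=lambda name: cosine(test_emb, enrolled[name]))


# Toy 3-dimensional embeddings (hypothetical values, for illustration only).
enrolled = {"alice": [0.9, 0.1, 0.0], "bob": [0.0, 0.2, 0.95]}
probe = [0.8, 0.2, 0.1]
print(identify(probe, enrolled))         # alice
print(verify(probe, enrolled["alice"]))  # True
```

Identification compares a probe against every enrolled speaker, while verification compares it only against the claimed identity, which is why the latter needs a decision threshold.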

Most implemented papers

Attention-Based Models for Text-Dependent Speaker Verification

liyongze/lstm_speaker_verification 28 Oct 2017

Attention-based models have recently shown great performance on a range of tasks, such as speech recognition, machine translation, and image captioning, due to their ability to summarize relevant information that spans the entire length of an input sequence.
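A minimal sketch of attention pooling, assuming a simple dot-product scoring vector `w` (hypothetical here; in a model it would be learned). It shows how an attention layer can summarize a variable-length sequence of frame vectors into one vector; it is not claimed to be the exact attention variant evaluated in the paper.

```python
import math


def attention_pool(frames, w):
    """Summarize a variable-length sequence of frame vectors as one vector.

    Each frame h_t gets a score e_t = w . h_t; a softmax over time turns the
    scores into weights a_t, and the output is sum_t a_t * h_t.
    """
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in frames]
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


# Hypothetical 2-dimensional frame features and scoring vector.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(frames, w=[2.0, 0.0])
print(weights)  # frames with a large first component receive more weight
print(pooled)
```

Because the weights always sum to one, the output has the same dimension as a single frame no matter how long the input sequence is.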

VoxCeleb2: Deep Speaker Recognition

a-nagrani/VGGVox 14 Jun 2018

The objective of this paper is speaker recognition under noisy and unconstrained conditions.

Speech and Speaker Recognition from Raw Waveform with SincNet

mravanelli/SincNet 13 Dec 2018

Deep neural networks can learn complex and abstract representations, which are progressively obtained by combining simpler ones.
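SincNet constrains the filters of its first layer to be band-pass filters defined only by two cutoff frequencies: each kernel is the difference of two sinc low-pass filters, multiplied by a window. The sketch below builds one such kernel with fixed cutoffs (in a trained SincNet only the cutoffs would be learned); the function names, the kernel length and the cutoff values are illustrative assumptions.

```python
import cmath
import math


def sinc(x):
    """Unnormalized sinc: sin(x) / x, with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(x) / x


def sinc_bandpass(f1, f2, length=101):
    """Windowed band-pass kernel with cutoffs f1 < f2 in cycles per sample (< 0.5).

    g[n] = 2*f2*sinc(2*pi*f2*n) - 2*f1*sinc(2*pi*f1*n), i.e. an ideal low-pass
    at f2 minus an ideal low-pass at f1, multiplied by a Hamming window.
    """
    if length % 2 == 0:
        raise ValueError("length must be odd so the kernel is centred on n = 0")
    half = (length - 1) // 2
    kernel = []
    for i, n in enumerate(range(-half, half + 1)):
        g = 2 * f2 * sinc(2 * math.pi * f2 * n) - 2 * f1 * sinc(2 * math.pi * f1 * n)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
        kernel.append(g * window)
    return kernel


def gain(kernel, f):
    """Magnitude of the kernel's frequency response at f (cycles per sample)."""
    return abs(sum(h * cmath.exp(-2j * math.pi * f * k) for k, h in enumerate(kernel)))


kernel = sinc_bandpass(0.1, 0.2)
print(round(gain(kernel, 0.15), 3))  # close to 1 inside the pass band
print(round(gain(kernel, 0.40), 3))  # close to 0 outside it
```

Parameterizing each filter by two numbers, rather than by every tap, is what keeps this first layer compact.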

Personal VAD: Speaker-Conditioned Voice Activity Detection

pirxus/personalVAD 12 Aug 2019

In this paper, we propose "personal VAD", a system to detect the voice activity of a target speaker at the frame level.
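The three-way frame labelling behind personal VAD (non-speech, target speaker speech, non-target speaker speech) can be illustrated with a simplified, hypothetical decision rule: combine a per-frame speech probability from an ordinary VAD with the cosine similarity between a per-frame speaker embedding and the enrolled target embedding. The paper's system is instead a trained neural network conditioned on the target speaker; the thresholds and inputs below are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def personal_vad_labels(speech_probs, frame_embs, target_emb,
                        vad_threshold=0.5, spk_threshold=0.6):
    """Label each frame as non-speech ('ns'), target speaker speech ('tss'),
    or non-target speaker speech ('ntss') with a hand-set, hypothetical rule."""
    labels = []
    for p, emb in zip(speech_probs, frame_embs):
        if p < vad_threshold:
            labels.append("ns")  # an ordinary VAD says there is no speech
        elif cosine(emb, target_emb) >= spk_threshold:
            labels.append("tss")  # speech that resembles the enrolled speaker
        else:
            labels.append("ntss")  # speech from someone else
    return labels


# Hypothetical per-frame inputs for three frames.
print(personal_vad_labels([0.1, 0.9, 0.8], [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]], [1.0, 0.0]))
# ['ns', 'tss', 'ntss']
```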

Filterbank design for end-to-end speech separation

mpariente/AsSteroid 23 Oct 2019

We also validate the use of parameterized filterbanks and show that complex-valued representations and masks are beneficial in all conditions.
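One way to see why complex-valued masks can help: a real-valued mask can only rescale each time-frequency bin of the mixture, while a complex mask can also rotate it and so correct the phase. The toy sketch below uses hypothetical bin values and oracle masks; it does not reproduce the paper's parameterized filterbanks or its experiments.

```python
# Toy time-frequency bins (hypothetical values): target source and interference.
target = [1 + 1j, 0.5 - 0.2j, -0.3 + 0.8j]
noise = [0.4 - 0.6j, -0.1 + 0.3j, 0.2 + 0.1j]
mixture = [s + n for s, n in zip(target, noise)]


def apply_mask(mix, mask):
    """Multiply each mixture bin by its mask value."""
    return [m * y for m, y in zip(mask, mix)]


def squared_error(estimate):
    """Total squared distance between an estimate and the target bins."""
    return sum(abs(e - s) ** 2 for e, s in zip(estimate, target))


# Oracle masks. A real mask can only rescale each bin (the mixture phase is kept);
# a complex mask can also rotate it, so it can correct the phase.
real_mask = [abs(s) / abs(y) for s, y in zip(target, mixture)]
complex_mask = [s / y for s, y in zip(target, mixture)]

print(squared_error(apply_mask(mixture, real_mask)))     # clearly above 0
print(squared_error(apply_mask(mixture, complex_mask)))  # 0 up to rounding error
```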

CN-CELEB: a challenging Chinese speaker recognition dataset

zhaoyi2/xvector-cnceleb 31 Oct 2019

These datasets tend to deliver overly optimistic performance and do not meet the needs of research on speaker recognition in unconstrained conditions.

Speech2Phone: A Novel and Efficient Method for Training Speaker Recognition Models

Edresson/Speech2Phone 25 Feb 2020

We compare the three best architectures trained with our method and select the best of them, which is the one with a shallow architecture.

Crossed-Time Delay Neural Network for Speaker Recognition

chenllliang/ctdnn 31 May 2020

The Time Delay Neural Network (TDNN) is a well-performing architecture for DNN-based speaker recognition systems.
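A minimal sketch of a single standard TDNN layer, assuming a context of frame offsets such as {-2, 0, +2}: each output frame is a ReLU of a linear combination of the input frames at those offsets, which is equivalent to a dilated 1-D convolution over time. This is the plain TDNN layer, not the crossed variant proposed in the paper; names and shapes are illustrative.

```python
def tdnn_layer(frames, weights, bias, context=(-2, 0, 2)):
    """One TDNN layer over a list of feature vectors (one vector per frame).

    Output frame t = ReLU(bias + sum_k W_k @ frames[t + context[k]]).
    weights[k] is an out_dim x in_dim matrix (list of rows) for offset context[k].
    Only frames whose whole context lies inside the input are produced (no padding).
    """
    lo = max(0, -min(context))
    hi = max(0, max(context))
    out = []
    for t in range(lo, len(frames) - hi):
        y = list(bias)
        for W, c in zip(weights, context):
            x = frames[t + c]
            for o, row in enumerate(W):
                y[o] += sum(w * xi for w, xi in zip(row, x))
        out.append([max(0.0, v) for v in y])  # ReLU
    return out


frames = [[float(i)] for i in range(1, 7)]  # six 1-dimensional frames: 1.0 .. 6.0
weights = [[[1.0]], [[1.0]], [[1.0]]]       # one 1x1 weight matrix per context offset
print(tdnn_layer(frames, weights, bias=[0.0], context=(-1, 0, 1)))
# [[6.0], [9.0], [12.0], [15.0]]
```

Stacking such layers with widening offsets lets later layers see a long temporal context while each layer stays small.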

Speaker anonymisation using the McAdams coefficient

josepatino/Voice-Privacy-Challenge-2020 2 Nov 2020

Anonymisation aims to manipulate speech signals in order to degrade the reliability of automatic speaker recognition approaches, while preserving other aspects of speech, such as those relating to intelligibility and naturalness.
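The McAdams approach modifies the angles of the poles of a linear predictive coding (LPC) filter: a complex pole at angle φ is moved to angle φ^α, where α is the McAdams coefficient, which shifts formant positions while keeping pole magnitudes. The sketch below shows only that pole transformation; the per-frame LPC analysis and resynthesis are omitted, and α = 0.8 and the example pole are illustrative values.

```python
import cmath
import math


def mcadams_shift(poles, alpha=0.8):
    """Apply the McAdams transformation to a list of LPC poles.

    A complex pole r*exp(j*phi) is moved to r*exp(j*sign(phi)*|phi|**alpha), so
    conjugate pairs stay conjugate and the filter stays real. Angles are in
    radians and clipped to pi. Real poles are left unchanged.
    """
    shifted = []
    for p in poles:
        if abs(p.imag) < 1e-12:
            shifted.append(p)  # real pole: no formant to move
            continue
        r, phi = abs(p), cmath.phase(p)
        new_phi = min(abs(phi) ** alpha, math.pi)
        shifted.append(cmath.rect(r, math.copysign(new_phi, phi)))
    return shifted


pole = cmath.rect(0.95, 0.5)  # hypothetical pole: radius 0.95, angle 0.5 rad
poles = [pole, pole.conjugate(), 0.3 + 0j]
for p in mcadams_shift(poles, alpha=0.8):
    print(round(abs(p), 3), round(cmath.phase(p), 3))  # angle 0.5 becomes 0.5 ** 0.8, about 0.574
```

With α < 1, angles below 1 rad move up and angles above 1 rad move down, so the formant structure is warped without changing pole magnitudes.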

Pushing the limits of raw waveform speaker recognition

clovaai/voxceleb_trainer 16 Mar 2022

Our best model achieves an equal error rate of 0.89%, which is competitive with the state-of-the-art models based on handcrafted features, and outperforms the best model based on raw waveform inputs by a large margin.
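The equal error rate (EER) is the operating point at which the false acceptance rate equals the false rejection rate. A minimal sketch that approximates it by sweeping the decision threshold over the observed scores, assuming higher scores mean "same speaker"; the trial scores below are hypothetical.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping the threshold over all observed scores.

    FAR = fraction of non-target (impostor) trials accepted (score >= threshold),
    FRR = fraction of target (genuine) trials rejected (score < threshold).
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    best = None
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]


genuine = [0.9, 0.4]   # hypothetical same-speaker trial scores
impostor = [0.5, 0.1]  # hypothetical different-speaker trial scores
print(equal_error_rate(genuine, impostor))  # 0.5
```

For instance, an EER of 0.89% means that, at the chosen threshold, roughly 0.89% of impostor trials are accepted and roughly 0.89% of genuine trials are rejected.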