Speaker Recognition
90 papers with code • 1 benchmark • 6 datasets
Speaker Recognition is the process of identifying or confirming the identity of a person from their speech segments.
Source: Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition
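The two modes named in the definition, identifying a speaker and confirming a claimed identity, can be illustrated with a minimal sketch. It assumes each speech segment has already been mapped to a fixed-length embedding vector and that embeddings are compared by cosine similarity, a common choice but not the only one. The function names, the toy vectors, and the 0.7 threshold are all hypothetical and chosen only for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(test_emb, enrolled_emb, threshold=0.7):
    """Verification: confirm or reject a claimed identity (1:1 decision).

    The threshold is an assumed value; in practice it is tuned on held-out data.
    """
    return cosine_similarity(test_emb, enrolled_emb) >= threshold

def identify(test_emb, enrolled):
    """Identification: pick the best-matching enrolled speaker (1:N decision)."""
    return max(enrolled, key=lambda name: cosine_similarity(test_emb, enrolled[name]))

# Toy 3-dimensional embeddings (assumed); real systems use far more dimensions.
enrolled = {"alice": [0.9, 0.1, 0.0], "bob": [0.0, 0.2, 0.9]}
test_emb = [0.8, 0.2, 0.1]

best_match = identify(test_emb, enrolled)        # closest enrolled speaker
accepted = verify(test_emb, enrolled["alice"])   # is the claim "alice" accepted?
```

Verification compares against a single enrolled speaker and needs a threshold, while identification ranks all enrolled speakers and needs none, which is why the two tasks are usually evaluated with different metrics.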
Most implemented papers
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit
PaddleSpeech is an open-source all-in-one speech toolkit.
Toroidal Probabilistic Spherical Discriminant Analysis
It extends PSDA with the ability to model within- and between-speaker variabilities in toroidal submanifolds of the hypersphere.
TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement
We propose an objective for perceptual quality based on temporal acoustic parameters.
ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models
First, we provide an open-source platform for researchers in the speaker recognition community to effortlessly build models.
Unified Hypersphere Embedding for Speaker Recognition
Incremental improvements in accuracy of Convolutional Neural Networks are usually achieved through use of deeper and more complex models trained on larger datasets.
Frame-level speaker embeddings for text-independent speaker recognition and analysis of end-to-end model
In this paper, we propose a Convolutional Neural Network (CNN) based speaker recognition model for extracting robust speaker embeddings.
Additive Margin SincNet for Speaker Recognition
The Softmax loss function is widely used in deep learning methods, but it is not the best choice for all kinds of problems.
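To make the alternative named in the title concrete, here is a minimal sketch of the general additive margin softmax idea for a single training example. It is not the paper's implementation: the function name is hypothetical, and the scale of 30 and margin of 0.35 are assumed illustrative defaults. The loss subtracts a fixed margin from the target-class cosine before scaling, so the true speaker must win by a gap rather than by any amount.

```python
import math

def am_softmax_loss(cosines, target, scale=30.0, margin=0.35):
    """Additive margin softmax loss for one example (illustrative sketch).

    cosines: cosine similarity between the embedding and each class weight.
    target:  index of the true speaker.
    scale, margin: assumed hyperparameters, not taken from the listed paper.
    """
    # Penalize only the target class by the margin, then scale all logits.
    logits = [
        scale * (c - margin) if j == target else scale * c
        for j, c in enumerate(cosines)
    ]
    # Log-sum-exp with max subtraction for numerical stability.
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    # Cross-entropy: negative log-probability of the target class.
    return log_norm - logits[target]

# Toy cosines for three speakers; the true speaker is index 0.
cosines = [0.8, 0.3, 0.1]
loss_with_margin = am_softmax_loss(cosines, target=0)
loss_without_margin = am_softmax_loss(cosines, target=0, margin=0.0)
```

Setting `margin=0.0` reduces this to an ordinary softmax cross-entropy over scaled cosines, so the same correctly classified example incurs a larger loss once the margin is applied.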
BERTphone: Phonetically-Aware Encoder Representations for Utterance-Level Speaker and Language Recognition
We introduce BERTphone, a Transformer encoder trained on large speech corpora that outputs phonetically-aware contextual representation vectors that can be used for both speaker and language recognition.
Latent space representation for multi-target speaker detection and identification with a sparse dataset using Triplet neural networks
When the training data is reduced to the train set only, our method results in 309 confusions for the multi-target speaker identification task, which is 46% better than the baseline model.
Delving into VoxCeleb: environment invariant speaker recognition
Research in speaker recognition has recently seen significant progress due to the application of neural network models and the availability of new large-scale datasets.