Speaker Recognition
87 papers with code • 1 benchmark • 6 datasets
Speaker Recognition is the process of identifying or confirming the identity of a person from their speech segments.
Source: Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition
Most implemented papers
Speaker Recognition from Raw Waveform with SincNet
Rather than employing standard hand-crafted features, CNNs fed directly with raw speech samples learn low-level speech representations from waveforms, potentially allowing the network to better capture important narrow-band speaker characteristics such as pitch and formants.
Deep Speaker: an End-to-End Neural Speaker Embedding System
We present Deep Speaker, a neural speaker embedding system that maps utterances to a hypersphere where speaker similarity is measured by cosine similarity.
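As a rough illustration of cosine scoring on the hypersphere, the sketch below length-normalizes two embeddings and compares them. It is not taken from the Deep Speaker code: the function names and the `threshold` default are hypothetical, and a real system would tune the threshold on held-out trials.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so it lies on the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity; for unit-length vectors this is the dot product."""
    a_unit, b_unit = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_unit, b_unit))


def same_speaker(emb_a, emb_b, threshold=0.7):
    """Accept a verification trial when the score reaches the threshold.

    The 0.7 default is an arbitrary placeholder, not a published value.
    """
    return cosine_similarity(emb_a, emb_b) >= threshold
```

Because both embeddings are normalized first, the score depends only on the angle between them, not on their magnitudes.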
Utterance-level Aggregation For Speaker Recognition In The Wild
The objective of this paper is speaker recognition "in the wild", where utterances may be of variable length and also contain irrelevant signals.
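To make the variable-length problem concrete, here is a minimal sketch of the simplest aggregation, temporal average pooling, which turns any number of frame-level feature vectors into one fixed-size utterance vector. This is a stand-in baseline chosen for brevity, not the aggregation method proposed in the paper, and `average_pool` is a hypothetical name.

```python
def average_pool(frames):
    """Collapse a list of equal-length frame vectors into one vector.

    `frames` has one entry per time step; the output length equals the
    feature dimension regardless of how many frames were given.
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    if any(len(frame) != dim for frame in frames):
        raise ValueError("all frames must have the same dimension")
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]
```

Averaging weights every frame equally, so silence or noise frames influence the result as much as speech frames do, which is one reason learned aggregation is attractive in this setting.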
Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders
We present Mockingjay as a new speech representation learning approach, where bidirectional Transformer encoders are pre-trained on a large amount of unlabeled speech.
TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech
We present a large-scale comparison of various self-supervised models.
VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking
In this paper, we present a novel system that separates the voice of a target speaker from multi-speaker signals, by making use of a reference signal from the target speaker.
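The masking step named in the title can be pictured as an element-wise product between a magnitude spectrogram and a soft mask. The sketch below assumes the mask has already been predicted by some speaker-conditioned network, which is not shown; `apply_soft_mask` and its argument layout (a list of frames, each a list of frequency bins) are hypothetical.

```python
def apply_soft_mask(magnitudes, mask):
    """Multiply each time-frequency bin by a mask value in [0, 1].

    Values near 1 keep a bin (assumed to belong to the target speaker),
    values near 0 suppress it.
    """
    if len(magnitudes) != len(mask):
        raise ValueError("spectrogram and mask must have the same number of frames")
    masked = []
    for frame, frame_mask in zip(magnitudes, mask):
        if len(frame) != len(frame_mask):
            raise ValueError("spectrogram and mask must have the same number of bins")
        if any(m < 0.0 or m > 1.0 for m in frame_mask):
            raise ValueError("mask values must lie in [0, 1]")
        masked.append([x * m for x, m in zip(frame, frame_mask)])
    return masked
```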
AM-MobileNet1D: A Portable Model for Speaker Recognition
We propose a portable model called Additive Margin MobileNet1D (AM-MobileNet1D) for speaker identification on mobile devices.
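The "additive margin" in the model name refers to a softmax variant that subtracts a fixed margin from the target-class cosine before scaling. The sketch below computes that loss for a single example, assuming the cosines between the embedding and each class weight are already available. The `scale` and `margin` defaults are illustrative assumptions, not values confirmed for AM-MobileNet1D.

```python
import math


def am_softmax_loss(cosines, target, scale=30.0, margin=0.35):
    """Additive-margin softmax loss for one example.

    cosines: cosine similarity between the embedding and each class weight.
    target:  index of the true speaker.
    """
    # Subtract the margin only from the target-class cosine, then scale.
    logits = [
        scale * (c - margin) if j == target else scale * c
        for j, c in enumerate(cosines)
    ]
    # Log-sum-exp with the max subtracted for numerical stability.
    max_logit = max(logits)
    log_sum = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum - logits[target]
```

With `margin=0.0` this reduces to ordinary scaled softmax cross-entropy over cosines; a positive margin raises the loss for the same input, pushing the target cosine to exceed the others by at least the margin.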
AutoSpeech: Neural Architecture Search for Speaker Recognition
Speaker recognition systems based on Convolutional Neural Networks (CNNs) are often built with off-the-shelf backbones such as VGG-Net or ResNet.
HLT-NUS SUBMISSION FOR 2020 NIST Conversational Telephone Speech SRE
This work provides a brief description of the system submitted by the Human Language Technology (HLT) Laboratory, National University of Singapore (NUS), to the 2020 NIST conversational telephone speech (CTS) speaker recognition evaluation (SRE).
Probabilistic Spherical Discriminant Analysis: An Alternative to PLDA for length-normalized embeddings
In speaker recognition, where speech segments are mapped to embeddings on the unit hypersphere, two scoring backends are commonly used, namely cosine scoring and PLDA.