Rather than employing standard hand-crafted features, such CNNs learn low-level speech representations directly from raw waveforms, potentially allowing the network to better capture important narrow-band speaker characteristics such as pitch and formants.
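As a rough illustration of this raw-waveform approach, the PyTorch sketch below feeds waveform samples directly into a 1D CNN encoder. The layer sizes and the `RawWaveformEncoder` name are illustrative assumptions, not the architecture of any cited system; the wide first-layer kernel simply gives the network room to learn filters that resolve narrow-band cues.

```python
# Minimal sketch (assumptions: layer sizes, 16 kHz input, 256-dim embedding).
import torch
import torch.nn as nn

class RawWaveformEncoder(nn.Module):
    def __init__(self, embedding_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            # Wide first-layer kernel so learned filters can resolve
            # narrow-band cues such as pitch and formants.
            nn.Conv1d(1, 80, kernel_size=251, stride=5), nn.ReLU(),
            nn.MaxPool1d(3),
            nn.Conv1d(80, 128, kernel_size=5), nn.ReLU(),
            nn.MaxPool1d(3),
            nn.Conv1d(128, 128, kernel_size=5), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # utterance-level pooling
        )
        self.proj = nn.Linear(128, embedding_dim)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples)
        h = self.net(waveform.unsqueeze(1)).squeeze(-1)
        return self.proj(h)

encoder = RawWaveformEncoder()
emb = encoder(torch.randn(4, 16000))  # four 1-second clips at 16 kHz
print(emb.shape)                      # torch.Size([4, 256])
```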
Learning meaningful and general representations from unannotated speech that are applicable to a wide range of tasks remains challenging.
Speaker recognition systems based on Convolutional Neural Networks (CNNs) are often built with off-the-shelf backbones such as VGG-Net or ResNet.
Ranked #1 on Speaker Identification on VoxCeleb1
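The sketch below shows one common way such off-the-shelf backbones are repurposed: a torchvision ResNet adapted to 1-channel log-mel spectrograms, with the ImageNet classifier head swapped for an embedding projection. The channel count, embedding size, and spectrogram shape are illustrative assumptions, not the exact recipe of any cited system.

```python
# Minimal sketch (assumes torchvision >= 0.13 for the `weights` argument).
import torch
import torch.nn as nn
from torchvision.models import resnet34

backbone = resnet34(weights=None)
# Spectrograms have one channel, not three RGB channels.
backbone.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
# Replace the ImageNet classifier head with a speaker-embedding projection.
backbone.fc = nn.Linear(backbone.fc.in_features, 256)

spec = torch.randn(8, 1, 64, 300)  # (batch, channel, mel bins, frames)
embeddings = backbone(spec)
print(embeddings.shape)            # torch.Size([8, 256])
```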
By combining these two learning schemes, our model outperforms existing state-of-the-art speaker verification models trained with a standard supervised learning framework on short utterances (1-2 seconds) on the VoxCeleb datasets.
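A common metric-learning objective in this short-utterance line of work is the prototypical episode loss, sketched below. The episode layout (speakers × shots) is an assumption for illustration, not necessarily the exact objective of the cited model.

```python
# Minimal sketch of a prototypical (metric-learning) episode loss.
import torch
import torch.nn.functional as F

def prototypical_loss(support: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
    """support: (speakers, shots, dim); query: (speakers, dim)."""
    prototypes = support.mean(dim=1)          # one centroid per speaker
    # Negative squared distance to each prototype acts as the logit.
    logits = -torch.cdist(query, prototypes) ** 2
    labels = torch.arange(query.size(0))      # query i belongs to speaker i
    return F.cross_entropy(logits, labels)

loss = prototypical_loss(torch.randn(5, 3, 256), torch.randn(5, 256))
print(loss.item())
```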
While applications of transfer learning are common in the fields of computer vision and natural language processing, audio and speech processing surprisingly lack readily available, transferable models.
In this work, we propose Speech2Phone and compare several embedding models for open-set speaker identification, as well as traditional closed-set models.
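The difference between the two settings can be sketched in a few lines: closed-set identification picks the best match among trained speakers, whereas open-set identification must also be able to answer "unknown". The threshold value and the embedding source below are illustrative assumptions.

```python
# Minimal sketch of open-set scoring over enrolled speaker centroids.
import torch
import torch.nn.functional as F

def identify(embedding, enrolled, names, threshold=0.7):
    # enrolled: (num_speakers, dim) L2-normalized centroids
    scores = F.cosine_similarity(embedding.unsqueeze(0), enrolled)
    best = scores.argmax().item()
    # Accept the best match only above the threshold; otherwise reject.
    return names[best] if scores[best] >= threshold else "unknown"

enrolled = F.normalize(torch.randn(3, 256), dim=1)
print(identify(torch.randn(256), enrolled, ["alice", "bob", "carol"]))
```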
To address this demand, we propose a portable model called Additive Margin MobileNet1D (AM-MobileNet1D) for speaker identification on mobile devices.
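The "additive margin" in the name refers to the AM-softmax loss, sketched below; the margin and scale values are common defaults assumed here, not taken from the paper.

```python
# Minimal sketch of the additive-margin (AM) softmax loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AMSoftmax(nn.Module):
    def __init__(self, dim: int, num_speakers: int,
                 margin: float = 0.2, scale: float = 30.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_speakers, dim))
        self.margin, self.scale = margin, scale

    def forward(self, x: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Cosine similarity between normalized embeddings and class weights.
        cos = F.linear(F.normalize(x), F.normalize(self.weight))
        # Subtract the margin only from the target-class cosine.
        target = F.one_hot(labels, cos.size(1)).float()
        logits = self.scale * (cos - self.margin * target)
        return F.cross_entropy(logits, labels)

loss = AMSoftmax(256, 1000)(torch.randn(8, 256), torch.randint(0, 1000, (8,)))
print(loss.item())
```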
Mutual Information (MI) or similar measures of statistical dependence are promising tools for learning these representations in an unsupervised way.
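One widely used estimator in this family is InfoNCE, a lower bound on mutual information that trains representations by contrasting paired views against in-batch negatives. The pairing scheme and temperature below are illustrative assumptions.

```python
# Minimal sketch of the InfoNCE contrastive objective.
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor,
             temperature: float = 0.1) -> torch.Tensor:
    """z1[i] and z2[i] are two views of the same utterance (positives);
    all other pairs in the batch serve as negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature   # (batch, batch) similarity matrix
    labels = torch.arange(z1.size(0))    # diagonal entries are positives
    return F.cross_entropy(logits, labels)

loss = info_nce(torch.randn(16, 256), torch.randn(16, 256))
print(loss.item())
```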