Speaker Identification
47 papers with code • 4 benchmarks • 5 datasets
Most implemented papers
Speaker Recognition from Raw Waveform with SincNet
Rather than employing standard hand-crafted features, CNNs fed directly with raw speech samples learn low-level speech representations from waveforms, potentially allowing the network to better capture important narrow-band speaker characteristics such as pitch and formants.
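In SincNet, the first convolution uses band-pass filters whose only free parameters are two cutoff frequencies, rather than arbitrary learned kernels. The NumPy code below builds one such kernel as the difference of two sinc low-pass filters, multiplied by a Hamming window. It is a minimal sketch: the kernel length, the 16 kHz sample rate, and the use of fixed (not trained) cutoffs are assumptions for illustration.

```python
import numpy as np

def sinc_bandpass(f1_hz, f2_hz, kernel_size=251, sample_rate=16000):
    """Build one SincNet-style band-pass FIR kernel (illustrative sketch).

    The kernel is the difference of two ideal low-pass sinc filters with
    cutoffs f2 > f1, multiplied by a Hamming window to reduce ripple.
    kernel_size and sample_rate are assumed values, not taken from the paper.
    """
    if not 0 <= f1_hz < f2_hz <= sample_rate / 2:
        raise ValueError("need 0 <= f1 < f2 <= Nyquist")
    # Normalized cutoffs in cycles per sample.
    f1 = f1_hz / sample_rate
    f2 = f2_hz / sample_rate
    # Symmetric time axis centred on zero, in samples.
    n = np.arange(kernel_size) - (kernel_size - 1) / 2
    # np.sinc(x) = sin(pi x) / (pi x), so 2f * np.sinc(2f n) is an ideal low-pass.
    band = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    return band * np.hamming(kernel_size)

kernel = sinc_bandpass(300, 3400)
```

To filter a waveform with this kernel, call `np.convolve(signal, kernel, mode="same")`. In SincNet itself the two cutoffs are trained by backpropagation. This NumPy sketch does not attempt that.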
Deep Speaker: an End-to-End Neural Speaker Embedding System
We present Deep Speaker, a neural speaker embedding system that maps utterances to a hypersphere where speaker similarity is measured by cosine similarity.
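Once embeddings are projected onto the unit hypersphere, cosine similarity is just the dot product of the normalized vectors. The sketch below assumes the embeddings are plain NumPy vectors. The function names and the 0.7 decision threshold are hypothetical.

```python
import numpy as np

def to_hypersphere(embeddings):
    """L2-normalize along the last axis so each embedding has unit length."""
    embeddings = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(embeddings, axis=-1, keepdims=True)
    return embeddings / np.maximum(norms, 1e-12)  # guard against zero vectors

def cosine_similarity(a, b):
    """For unit-length vectors, cosine similarity is a plain dot product."""
    return float(np.dot(to_hypersphere(a), to_hypersphere(b)))

def same_speaker(a, b, threshold=0.7):
    # Hypothetical threshold; in practice it is tuned on held-out verification trials.
    return cosine_similarity(a, b) >= threshold
```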
AM-MobileNet1D: A Portable Model for Speaker Recognition
To address the demand for lightweight models, we propose a portable model called Additive Margin MobileNet1D (AM-MobileNet1D) for Speaker Identification on mobile devices.
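"Additive Margin" refers to the additive margin softmax (AM-Softmax) loss. This loss subtracts a fixed margin m from the cosine between an embedding and its true class weight, then scales every cosine by s before the softmax. The sketch below covers the loss only and omits the MobileNet1D backbone. The defaults s = 30 and m = 0.35 are assumed values and are not necessarily those used in the paper.

```python
import numpy as np

def am_softmax_loss(features, weights, labels, s=30.0, m=0.35):
    """Additive-margin softmax loss averaged over a batch (sketch).

    features: (batch, dim) speaker embeddings
    weights:  (dim, num_speakers) class weight vectors
    labels:   (batch,) integer speaker ids
    s, m:     assumed scale and margin
    """
    x = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos = x @ w                                          # cosine to every class weight
    rows = np.arange(len(labels))
    logits = s * cos
    logits[rows, labels] = s * (cos[rows, labels] - m)   # margin on the true class only
    # Numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[rows, labels].mean())
```

The margin lowers only the true-class logit, so for any batch the loss with m > 0 is strictly larger than with m = 0. This pushes training toward tighter same-speaker clusters on the hypersphere.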
AutoSpeech: Neural Architecture Search for Speaker Recognition
Speaker recognition systems based on Convolutional Neural Networks (CNNs) are often built with off-the-shelf backbones such as VGG-Net or ResNet.
Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio Representation
We evaluate the representations on two downstream tasks: speaker identification and phoneme classification.
Learning Speaker Representations with Mutual Information
Mutual Information (MI) or similar measures of statistical dependence are promising tools for learning speaker representations in an unsupervised way.
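The quantity being maximized is I(X; Y) = Σ p(x, y) log(p(x, y) / (p(x) p(y))). Neural MI estimators approximate it for continuous embeddings. This sketch only evaluates the textbook definition exactly for a small discrete joint table. It is not the paper's estimator.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information (in nats) of a discrete joint distribution p(x, y)."""
    p = np.asarray(joint, dtype=float)
    p = p / p.sum()                      # normalize counts to probabilities
    px = p.sum(axis=1, keepdims=True)    # marginal p(x), shape (nx, 1)
    py = p.sum(axis=0, keepdims=True)    # marginal p(y), shape (1, ny)
    mask = p > 0                         # treat 0 * log 0 as 0
    return float(np.sum(p[mask] * np.log(p[mask] / (px @ py)[mask])))
```

Independent variables give 0 nats. Two perfectly correlated binary variables give log 2 ≈ 0.693 nats.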
Word-level Embeddings for Cross-Task Transfer Learning in Speech Processing
Recent breakthroughs in deep learning often rely on representation learning and knowledge transfer.
Generative Pre-Training for Speech with Autoregressive Predictive Coding
Learning meaningful and general representations from unannotated speech that are applicable to a wide range of tasks remains challenging.
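Autoregressive Predictive Coding (APC) trains a model to predict the acoustic frame a fixed number of steps ahead from the frames seen so far. The prediction is scored with an L1 loss. The sketch below computes only that objective. `predict` is a hypothetical stand-in for the autoregressive network (in practice an RNN or Transformer), and the 3-frame shift is an assumed default.

```python
import numpy as np

def apc_l1_loss(frames, predict, shift=3):
    """APC-style objective: mean L1 error when predicting `shift` frames ahead.

    frames:  (T, D) array of acoustic features, e.g. log-Mel frames
    predict: hypothetical callable mapping frames[:t+1] (the past) to a D-dim guess
    shift:   how many frames ahead to predict (assumed value)
    """
    if len(frames) <= shift:
        raise ValueError("need more frames than the prediction shift")
    errors = []
    for t in range(len(frames) - shift):
        guess = predict(frames[: t + 1])    # model sees only past and present frames
        target = frames[t + shift]          # frame `shift` steps in the future
        errors.append(np.abs(guess - target).sum())
    return float(np.mean(errors))

# Trivial baseline predictor: repeat the most recent frame.
repeat_last = lambda past: past[-1]
```

Predicting several frames ahead, rather than the next frame, discourages the model from simply copying local smoothness. It has to encode more global structure of the signal.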
Contrastive Learning of General-Purpose Audio Representations
We introduce COLA, a self-supervised pre-training approach for learning a general-purpose representation of audio.
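COLA trains an encoder so that two segments cut from the same clip score higher than segments from different clips. The other clips in the batch serve as negatives. This sketch assumes a bilinear similarity aᵀWp, following our reading of the COLA paper. It also assumes the encoder outputs are already given as `anchors` and `positives`, and it omits the encoder and training loop.

```python
import numpy as np

def cola_style_loss(anchors, positives, W):
    """Batch contrastive loss in the style of COLA (sketch).

    anchors, positives: (B, D) embeddings of two segments from each of B clips;
        row i of `positives` is the positive for row i of `anchors`, and the
        other rows act as negatives.
    W: (D, D) bilinear similarity matrix (learnable in a real model).
    """
    sim = anchors @ W @ positives.T              # sim[i, j] = a_i^T W p_j
    sim = sim - sim.max(axis=1, keepdims=True)   # stabilize the softmax
    log_probs = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    # Multi-class cross-entropy: the correct "class" for anchor i is positive i.
    return float(-np.mean(np.diag(log_probs)))
```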
FoolHD: Fooling speaker identification by Highly imperceptible adversarial Disturbances
Speaker identification models are vulnerable to carefully designed adversarial perturbations of their input signals that induce misclassification.
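The following code is not FoolHD's method. It is the classic fast gradient sign method (FGSM) of Goodfellow et al., used here to show how a small input perturbation can flip a classifier's decision. The classifier is a hypothetical linear softmax model over a feature vector, not a real speaker-identification network.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # stabilize exponentials
    e = np.exp(z)
    return e / e.sum()

def fgsm_linear(x, W, b, label, eps):
    """One FGSM step against a linear softmax classifier with logits = W @ x + b.

    Returns x + eps * sign(d loss / d x), where loss is the cross-entropy for
    `label`. Each input element changes by at most eps (L-infinity bound).
    """
    p = softmax(W @ x + b)
    onehot = np.zeros_like(p)
    onehot[label] = 1.0
    grad_x = W.T @ (p - onehot)      # gradient of cross-entropy w.r.t. the input
    return x + eps * np.sign(grad_x)
```

With an identity weight matrix and input [0.6, 0.4], the model predicts class 0. An FGSM step with eps = 0.15 moves the input to [0.45, 0.55], which flips the prediction to class 1.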