47 papers with code • 4 benchmarks • 5 datasets
Rather than employing standard hand-crafted features, such CNNs learn low-level speech representations directly from waveforms, potentially allowing the network to better capture important narrow-band speaker characteristics such as pitch and formants.
We present Deep Speaker, a neural speaker embedding system that maps utterances to a hypersphere where speaker similarity is measured by cosine similarity.
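The embedding scheme described here can be illustrated with a minimal sketch: L2-normalize each utterance embedding so it lies on the unit hypersphere, then score speaker similarity with cosine similarity (which reduces to a dot product for unit vectors). The example embeddings below are made up for illustration and are not from the Deep Speaker model itself.

```python
import numpy as np

def to_hypersphere(e):
    # L2-normalize an embedding so it lies on the unit hypersphere.
    return e / np.linalg.norm(e)

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings (assumed, for illustration): e1 and e2 stand in for two
# utterances from the same speaker, e3 for a different speaker.
e1 = to_hypersphere(np.array([1.0, 2.0, 3.0]))
e2 = to_hypersphere(np.array([1.0, 2.0, 3.1]))
e3 = to_hypersphere(np.array([-3.0, 0.5, 1.0]))

# For unit vectors, cosine similarity is just the dot product.
same_speaker_score = cosine_similarity(e1, e2)
diff_speaker_score = cosine_similarity(e1, e3)
print(same_speaker_score > diff_speaker_score)
```

After normalization, verification is a single threshold on the dot product, which is why hypersphere embeddings are convenient at scoring time.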
To address this demand, we propose a portable model called Additive Margin MobileNet1D (AM-MobileNet1D) for Speaker Identification on mobile devices.
Speaker recognition systems based on Convolutional Neural Networks (CNNs) are often built with off-the-shelf backbones such as VGG-Net or ResNet.
Mutual Information (MI) or similar measures of statistical dependence are promising tools for learning these representations in an unsupervised way.
Learning meaningful and general representations from unannotated speech that are applicable to a wide range of tasks remains challenging.
Speaker identification models are vulnerable to carefully designed adversarial perturbations of their input signals that induce misclassification.
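One widely used way to construct such perturbations is the Fast Gradient Sign Method (FGSM), sketched below; the abstract above does not name a specific attack, so FGSM is only a representative example, and the gradient here is a made-up placeholder rather than one computed from a real model.

```python
import numpy as np

def fgsm_perturb(x, grad, eps=0.01):
    # FGSM: shift each input sample by eps in the sign direction of the
    # loss gradient, producing a small perturbation that can flip the
    # model's predicted speaker while remaining hard to perceive.
    return x + eps * np.sign(grad)

# Placeholder waveform and gradient (illustrative values only).
x = np.zeros(3)
grad = np.array([1.0, -2.0, 0.5])
x_adv = fgsm_perturb(x, grad, eps=0.1)
print(x_adv)
```

The perturbation magnitude is bounded by `eps` per sample, which is what keeps the adversarial input close to the original signal.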