52 papers with code • 1 benchmarks • 5 datasets
Speaker Recognition is the process of identifying or confirming the identity of a person given his speech segments.
Also, we validate the use of parameterized filterbanks and show that complex-valued representations and masks are beneficial in all conditions.
Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders
We present Mockingjay as a new speech representation learning approach, where bidirectional Transformer encoders are pre-trained on a large amount of unlabeled speech.
Deep neural networks can learn complex and abstract representations, that are progressively obtained by combining simpler ones.
Rather than employing standard hand-crafted features, the latter CNNs learn low-level speech representations from waveforms, potentially allowing the network to better capture important narrow-band speaker characteristics such as pitch and formants.
We present Deep Speaker, a neural speaker embedding system that maps utterances to a hypersphere where speaker similarity is measured by cosine similarity.
This report describes our submission to the VoxCeleb Speaker Recognition Challenge (VoxSRC) at Interspeech 2020.
The objective of this paper is speaker recognition "in the wild"-where utterances may be of variable length and also contain irrelevant signals.
Speaker recognition systems based on Convolutional Neural Networks (CNNs) are often built with off-the-shelf backbones such as VGG-Net or ResNet.
Ranked #1 on Speaker Identification on VoxCeleb1