Speaker Identification
61 papers with code • 4 benchmarks • 4 datasets
Most implemented papers
On Learning Associations of Faces and Voices
We computationally model the overlapping information between faces and voices and show that the learned cross-modal representation contains enough information to identify matching faces and voices with performance similar to that of humans.
Latent space representation for multi-target speaker detection and identification with a sparse dataset using Triplet neural networks
When the training data is restricted to the train set alone, our method results in 309 confusions for the multi-target speaker identification task, which is 46% better than the baseline model.
Delving into VoxCeleb: environment invariant speaker recognition
Research in speaker recognition has recently seen significant progress due to the application of neural network models and the availability of new large-scale datasets.
Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam
First, we propose a time-domain implementation of SpeakerBeam similar to that proposed for a time-domain audio separation network (TasNet), which has achieved state-of-the-art performance for speech separation.
Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs
By combining these two learning schemes, our model outperforms existing state-of-the-art speaker verification models trained with a standard supervised learning framework on short utterances (1-2 seconds) on the VoxCeleb datasets.
Identify Speakers in Cocktail Parties with End-to-End Attention
In scenarios where multiple speakers talk at the same time, it is important to be able to identify the talkers accurately.
audino: A Modern Annotation Tool for Audio and Speech
The tool allows audio data and their corresponding annotations to be uploaded and assigned to a user through a key-based API.
Investigation of End-To-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings
However, the model required prior knowledge of speaker profiles to perform speaker identification, which significantly limited the application of the model.
Sum-Product Networks for Robust Automatic Speaker Identification
Though current SPN toolkits and learning algorithms are in their infancy, we aim to show that SPNs have the potential to become a useful tool for robust speech processing in the future.
Compositional embedding models for speaker identification and diarization with simultaneous speech from 2+ speakers
We propose a new method for speaker diarization that can handle overlapping speech with 2+ people.