59 papers with code • 12 benchmarks • 10 datasets
Speaker Diarization is the task of segmenting and co-indexing audio recordings by speaker. The way the task is commonly defined, the goal is not to identify known speakers, but to co-index segments that are attributed to the same speaker; in other words, diarization implies finding speaker boundaries and grouping segments that belong to the same speaker, and, as a by-product, determining the number of distinct speakers. In combination with speech recognition, diarization enables speaker-attributed speech-to-text transcription.
Source: Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm
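The description above (find speaker boundaries, group segments from the same speaker, and obtain the speaker count as a by-product) can be illustrated with a toy sketch. This is not the method of any paper listed on this page. It assumes each segment already comes with a speaker embedding, and it swaps in a simple greedy cosine-similarity clustering; the names `diarize`, `cosine`, and `threshold`, and the 2-D embeddings, are all hypothetical and chosen only for illustration.

```python
import math
from typing import List, Tuple

# (start_sec, end_sec, embedding); the embeddings are made-up 2-D toy vectors.
Segment = Tuple[float, float, List[float]]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def diarize(segments: List[Segment], threshold: float = 0.8) -> List[Tuple[float, float, int]]:
    """Greedy clustering: assign each segment to the most similar existing
    speaker, or open a new speaker if no similarity reaches the threshold."""
    centroids: List[List[float]] = []  # running sum of embeddings per speaker
    labels: List[int] = []
    for _, _, emb in segments:
        scores = [cosine(emb, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            centroids[best] = [c + e for c, e in zip(centroids[best], emb)]
            labels.append(best)
        else:
            centroids.append(list(emb))
            labels.append(len(centroids) - 1)

    # Merge adjacent segments with the same label into speaker turns.
    turns: List[Tuple[float, float, int]] = []
    for (start, end, _), label in zip(segments, labels):
        if turns and turns[-1][2] == label and abs(turns[-1][1] - start) < 1e-9:
            turns[-1] = (turns[-1][0], end, label)
        else:
            turns.append((start, end, label))
    return turns


segments = [
    (0.0, 1.5, [1.0, 0.1]),
    (1.5, 3.0, [0.9, 0.2]),
    (3.0, 4.0, [0.1, 1.0]),
    (4.0, 5.5, [1.0, 0.0]),
    (5.5, 6.0, [0.2, 0.9]),
]
turns = diarize(segments)
num_speakers = len({label for _, _, label in turns})
for start, end, label in turns:
    print(f"{start:.1f}-{end:.1f}s  speaker_{label}")
```

Note that the output labels are anonymous indices (`speaker_0`, `speaker_1`), not identities, which matches the task definition: segments are co-indexed rather than matched to known speakers. The number of speakers falls out of the clustering rather than being supplied in advance.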
These leaderboards are used to track progress in Speaker Diarization
Libraries
Use these libraries to find Speaker Diarization models and implementations
Most implemented papers
Speaker Diarization with LSTM
For many years, i-vector based audio embedding techniques were the dominant approach for speaker verification and speaker diarization applications.
pyannote.audio: neural building blocks for speaker diarization
We introduce pyannote.audio, an open-source toolkit written in Python for speaker diarization.
AVA-AVD: Audio-Visual Speaker Diarization in the Wild
Audio-visual speaker diarization aims at detecting "who spoke when" using both auditory and visual signals.
End-to-End Neural Speaker Diarization with Self-attention
Our method was even better than that of the state-of-the-art x-vector clustering-based method.
Auto-Tuning Spectral Clustering for Speaker Diarization Using Normalized Maximum Eigengap
In this study, we propose a new spectral clustering framework that can auto-tune the parameters of the clustering algorithm in the context of speaker diarization.
Speech Recognition and Multi-Speaker Diarization of Long Conversations
Speech recognition (ASR) and speaker diarization (SD) models have traditionally been trained separately to produce rich conversation transcripts with speaker labels.
End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors
End-to-end speaker diarization for an unknown number of speakers is addressed in this paper.
VoxLingua107: a Dataset for Spoken Language Recognition
Speech activity detection and speaker diarization are used to extract segments from the videos that contain speech.
A Comprehensive Evaluation of Incremental Speech Recognition and Diarization for Conversational AI
Automatic Speech Recognition (ASR) systems are increasingly powerful and more accurate, but also more numerous with several options existing currently as a service (e.g. Google, IBM, and Microsoft).
The Third DIHARD Diarization Challenge
DIHARD III was the third in a series of speaker diarization challenges intended to improve the robustness of diarization systems to variability in recording equipment, noise conditions, and conversational domain.