Speaker Diarization

74 papers with code • 12 benchmarks • 11 datasets

Speaker Diarization is the task of segmenting and co-indexing audio recordings by speaker. The way the task is commonly defined, the goal is not to identify known speakers, but to co-index segments that are attributed to the same speaker; in other words, diarization implies finding speaker boundaries and grouping segments that belong to the same speaker, and, as a by-product, determining the number of distinct speakers. In combination with speech recognition, diarization enables speaker-attributed speech-to-text transcription.

Source: Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm
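
The definition above can be made concrete with a small sketch. It is illustrative only: the `Segment` and `Word` types, the `spk0`/`spk1` labels, and the maximum-overlap rule for attaching words to speakers are assumptions made for this example, not the method of any particular paper or library listed on this page. It shows that diarization output carries anonymous speaker indices rather than identities, that the speaker count falls out as a by-product, and how the output could be combined with time-stamped speech recognition words to form a speaker-attributed transcript.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # seconds
    end: float     # seconds
    speaker: str   # anonymous index such as "spk0", not a known identity

@dataclass
class Word:
    start: float
    end: float
    text: str

def count_speakers(segments):
    """Number of distinct speaker indices, a by-product of diarization."""
    return len({s.speaker for s in segments})

def attribute_words(words, segments):
    """Give each word the speaker whose segment overlaps it most (assumed rule)."""
    attributed = []
    for w in words:
        best, best_overlap = None, 0.0
        for s in segments:
            overlap = min(w.end, s.end) - max(w.start, s.start)
            if overlap > best_overlap:
                best, best_overlap = s.speaker, overlap
        attributed.append((best, w.text))
    return attributed

def to_transcript(attributed):
    """Merge consecutive words from the same speaker into one turn."""
    turns = []
    for speaker, text in attributed:
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(text)
        else:
            turns.append((speaker, [text]))
    return [f"{speaker}: {' '.join(ws)}" for speaker, ws in turns]

# Hypothetical diarization output: who spoke when, with co-indexed labels.
segments = [
    Segment(0.0, 2.0, "spk0"),
    Segment(2.0, 3.5, "spk1"),
    Segment(3.5, 5.0, "spk0"),
]
# Hypothetical time-stamped words from a speech recognizer.
words = [
    Word(0.2, 0.6, "hello"), Word(0.7, 1.1, "there"),
    Word(2.1, 2.5, "hi"),
    Word(3.6, 4.0, "how"), Word(4.1, 4.4, "are"), Word(4.5, 4.9, "you"),
]

num_speakers = count_speakers(segments)
transcript = to_transcript(attribute_words(words, segments))
```

Here the first and third segments share the index `spk0`, which is what co-indexing means: the system asserts they come from the same speaker without saying who that speaker is.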

Benchmarks

These leaderboards are used to track progress in Speaker Diarization.

Dataset	Best Model
CALLHOME	TOLD
NIST-SRE 2000	x-vector (MCGAN)
AMI Lapel	TitaNet-M (NME-SC)
AMI MixHeadset	TitaNet-L (NME-SC)
CH109	TitaNet-S (NME-SC)
DIHARD	pyannote (waveform)
ETAPE	pyannote (waveform)
CALLHOME-109	titanet-s
AMI	pyannote (waveform)
Hub5'00 CallHome	UIS-RNN
DIHARD II	UIS-RNN-SML
AliMeeting	SOND

Libraries

Use these libraries to find Speaker Diarization models and implementations.

Library	Papers	Stars
hitachi-speech/EEND	5	347
pyannote/pyannote-audio	3	5,013
alibaba-damo-academy/FunASR	3	3,284
wq2012/SpectralCluster	3	490

Datasets

Most implemented papers

AVA-AVD: Audio-Visual Speaker Diarization in the Wild

zcxu-eric/ava-avd • 29 Nov 2021

Audio-visual speaker diarization aims at detecting "who spoke when" using both auditory and visual signals.

Speaker Diarization with LSTM

wq2012/SpectralCluster • 28 Oct 2017

For many years, i-vector based audio embedding techniques were the dominant approach for speaker verification and speaker diarization applications.

pyannote.audio: neural building blocks for speaker diarization

pyannote/pyannote-audio • 4 Nov 2019

We introduce pyannote.audio, an open-source toolkit written in Python for speaker diarization.

Speech Recognition and Multi-Speaker Diarization of Long Conversations

calclavia/tal-asrd • 16 May 2020

Speech recognition (ASR) and speaker diarization (SD) models have traditionally been trained separately to produce rich conversation transcripts with speaker labels.

End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors

hitachi-speech/EEND • 20 May 2020

End-to-end speaker diarization for an unknown number of speakers is addressed in this paper.

The Third DIHARD Diarization Challenge

dihardchallenge/dihard3_baseline • 2 Dec 2020

DIHARD III was the third in a series of speaker diarization challenges intended to improve the robustness of diarization systems to variability in recording equipment, noise conditions, and conversational domain.

End-to-End Neural Speaker Diarization with Self-attention

hitachi-speech/EEND • 13 Sep 2019

Our method was even better than that of the state-of-the-art x-vector clustering-based method.

VoxLingua107: a Dataset for Spoken Language Recognition

alumae/torch-xvectors-wav • 25 Nov 2020

Speech activity detection and speaker diarization are used to extract segments from the videos that contain speech.

A Comprehensive Evaluation of Incremental Speech Recognition and Diarization for Conversational AI

wallscope-research/incremental-asr-processing • COLING 2020

Automatic Speech Recognition (ASR) systems are increasingly powerful and more accurate, but also more numerous with several options existing currently as a service (e.g. Google, IBM, and Microsoft).

End-to-end speaker segmentation for overlap-aware resegmentation

pyannote/segmentation • 8 Apr 2021

Experiments on multiple speaker diarization datasets conclude that our model can be used with great success on both voice activity detection and overlapped speech detection.
