Speaker Diarization

74 papers with code • 12 benchmarks • 11 datasets

Speaker Diarization is the task of segmenting and co-indexing audio recordings by speaker. The way the task is commonly defined, the goal is not to identify known speakers, but to co-index segments that are attributed to the same speaker; in other words, diarization implies finding speaker boundaries and grouping segments that belong to the same speaker, and, as a by-product, determining the number of distinct speakers. In combination with speech recognition, diarization enables speaker-attributed speech-to-text transcription.

Source: Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm
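
As a rough illustration of the definition above, the grouping step can be sketched in plain Python. This is a toy example, not the method of any paper or library listed on this page: the function name `co_index`, the greedy centroid clustering, the 0.8 cosine threshold, and the 2-D embeddings are all assumptions made for illustration. The sketch assumes that speaker boundaries and per-segment embeddings are already available. It assigns each segment a speaker index, merges adjacent segments that share an index, and reports the number of distinct speakers as a by-product.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def co_index(segments, threshold=0.8):
    """Greedy co-indexing of segments (hypothetical helper, illustration only).

    segments: list of (start, end, embedding) tuples in time order.
    Returns (turns, n_speakers), where turns is a list of
    (start, end, speaker_index) with adjacent same-speaker segments merged.
    """
    centroids = []  # one running sum vector per discovered speaker
    labels = []
    for _, _, emb in segments:
        best, best_sim = None, threshold
        for k, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim >= best_sim:
                best, best_sim = k, sim
        if best is None:
            # No existing speaker is similar enough: open a new one.
            centroids.append(list(emb))
            best = len(centroids) - 1
        else:
            # Cosine similarity ignores scale, so a running sum acts as a mean.
            centroids[best] = [c + e for c, e in zip(centroids[best], emb)]
        labels.append(best)

    turns = []
    for (start, end, _), spk in zip(segments, labels):
        if turns and turns[-1][2] == spk and turns[-1][1] == start:
            turns[-1] = (turns[-1][0], end, spk)  # extend the previous turn
        else:
            turns.append((start, end, spk))
    return turns, len(centroids)


# Toy input: four one-second segments with made-up 2-D embeddings.
toy_segments = [
    (0.0, 1.0, [1.0, 0.1]),
    (1.0, 2.0, [0.9, 0.2]),
    (2.0, 3.0, [0.1, 1.0]),
    (3.0, 4.0, [1.0, 0.0]),
]
turns, n_speakers = co_index(toy_segments)
print(turns, n_speakers)
```

The speaker indices are arbitrary labels: they say which segments belong together, not who the speakers are, which matches the task definition quoted above.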

Benchmarks

These leaderboards track progress in Speaker Diarization.

Dataset	Best Model
CALLHOME	TOLD
NIST-SRE 2000	x-vector (MCGAN)
AMI Lapel	TitaNet-M (NME-SC)
AMI MixHeadset	TitaNet-L (NME-SC)
CH109	TitaNet-S (NME-SC)
DIHARD	pyannote (waveform)
ETAPE	pyannote (waveform)
CALLHOME-109	titanet-s
AMI	pyannote (waveform)
Hub5'00 CallHome	UIS-RNN
DIHARD II	UIS-RNN-SML
AliMeeting	SOND

Libraries

Use these libraries to find Speaker Diarization models and implementations.

hitachi-speech/EEND • 5 papers • 350 stars

pyannote/pyannote-audio • 3 papers • 5,090 stars

alibaba-damo-academy/FunASR • 3 papers • 3,417 stars

wq2012/SpectralCluster • 3 papers • 490 stars

See all 5 libraries.

Datasets

Most implemented papers

The EURECOM Submission to the First DIHARD Challenge

josepatino/pyBK • 6 Sep 2018

The first DIHARD challenge aims to promote speaker diarization research and to foster progress in domain robustness.

Fully Supervised Speaker Diarization

google/uis-rnn • 10 Oct 2018

In this paper, we propose a fully supervised speaker diarization approach, named unbounded interleaved-state recurrent neural networks (UIS-RNN).

CountNet: Estimating the Number of Concurrent Speakers Using Supervised Learning Speaker Count Estimation

faroit/CountNet • IEEE/ACM Transactions on Audio, Speech, and Language Processing 2018

Estimating the maximum number of concurrent speakers from single-channel mixtures is a challenging problem and an essential first step to address various audio-based tasks such as blind source separation, speaker diarization, and audio surveillance.

AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection

github-zbx/ava_datasets • 5 Jan 2019

The dataset contains temporally labeled face tracks in video, where each face instance is labeled as speaking or not, and whether the speech is audible.

The Second DIHARD Diarization Challenge: Dataset, task, and baselines

iiscleap/DIHARD_2019_baseline_alltracks • 18 Jun 2019

This paper introduces the second DIHARD challenge, the second in a series of speaker diarization challenges intended to improve the robustness of diarization systems to variation in recording equipment, noise conditions, and conversational domain.

Ultrasound tongue imaging for diarization and alignment of child speech therapy sessions

UltraSuite/ultrasuite-kaldi • 1 Jul 2019

We investigate the automatic processing of child speech therapy sessions using ultrasound visual biofeedback, with a specific focus on complementing acoustic features with ultrasound images of the tongue for the tasks of speaker diarization and time-alignment of target words.

LSTM based Similarity Measurement with Spectral Clustering for Speaker Diarization

cvqluu/nn-similarity-diarization • 23 Jul 2019

A growing number of neural network approaches have achieved considerable improvements on submodules of speaker diarization systems, including speaker change detection and segment-wise speaker embedding extraction.

End-to-End Neural Speaker Diarization with Permutation-Free Objectives

hitachi-speech/EEND • 12 Sep 2019

To realize such a model, we formulate the speaker diarization problem as a multi-label classification problem and introduce a permutation-free objective function to directly minimize diarization errors without suffering from the speaker-label permutation problem.
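
The idea of a permutation-free objective can be illustrated with a minimal sketch in plain Python. This is not the EEND authors' implementation: the function names, the brute-force search over permutations, and the toy numbers are assumptions made for illustration. Each frame carries one activity probability per output channel (multi-label, so overlapping speech is allowed), and the loss is the binary cross-entropy under whichever assignment of output channels to reference speakers gives the lowest value.

```python
import math
from itertools import permutations


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one probability p and one 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def permutation_free_loss(probs, labels):
    """Minimum mean BCE over all speaker permutations (illustrative sketch).

    probs:  T x S list of predicted activity probabilities.
    labels: T x S list of 0/1 reference activities.
    Returns (loss, perm), where perm[s] is the reference speaker
    matched to output channel s.
    """
    n_frames, n_spk = len(probs), len(probs[0])
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n_spk)):
        total = sum(
            bce(probs[t][s], labels[t][perm[s]])
            for t in range(n_frames)
            for s in range(n_spk)
        )
        loss = total / (n_frames * n_spk)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy case: the model is right, but its output channels are swapped
# relative to the reference. Both speakers are active in the middle frame.
toy_probs = [[0.9, 0.1], [0.8, 0.7], [0.1, 0.9]]
toy_labels = [[0, 1], [1, 1], [1, 0]]
loss, perm = permutation_free_loss(toy_probs, toy_labels)
print(loss, perm)
```

Enumerating all permutations costs S! evaluations, so this brute-force form is only practical for a small number of speakers.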

Robust speaker recognition using unsupervised adversarial invariance

rperi/speaker-embeddings-UAI-inference • 3 Nov 2019

In this paper, we address the problem of speaker recognition in challenging acoustic conditions using a novel method to extract robust speaker-discriminative speech representations.

Supervised online diarization with sample mean loss for multi-domain data

DonkeyShot21/uis-rnn-sml • 4 Nov 2019

Recently, a fully supervised speaker diarization approach was proposed (UIS-RNN) which models speakers using multiple instances of a parameter-sharing recurrent neural network.
