Speaker Diarization

42 papers with code • 1 benchmark • 7 datasets

Speaker Diarization is the task of segmenting and co-indexing audio recordings by speaker. The way the task is commonly defined, the goal is not to identify known speakers, but to co-index segments that are attributed to the same speaker; in other words, diarization implies finding speaker boundaries and grouping segments that belong to the same speaker, and, as a by-product, determining the number of distinct speakers. In combination with speech recognition, diarization enables speaker-attributed speech-to-text transcription.

Source: Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm
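
The grouping step described above can be illustrated with a minimal sketch. It assumes that some upstream system (not shown) has already assigned an anonymous cluster label to each short segment; the names `Segment`, `merge_turns`, and `count_speakers`, as well as the toy data, are hypothetical and do not come from any of the papers listed here. The sketch merges time-adjacent segments that share a label into speaker turns and derives the number of distinct speakers as a by-product.

```python
from typing import List, Tuple

# (start_sec, end_sec, label); labels are anonymous cluster ids, not identities.
Segment = Tuple[float, float, str]


def merge_turns(segments: List[Segment], max_gap: float = 0.0) -> List[Segment]:
    """Merge time-adjacent segments that share a label into speaker turns."""
    turns: List[Segment] = []
    for start, end, label in sorted(segments):
        same_speaker = bool(turns) and turns[-1][2] == label
        if same_speaker and start - turns[-1][1] <= max_gap:
            # Extend the previous turn instead of opening a new one.
            prev_start, prev_end, _ = turns[-1]
            turns[-1] = (prev_start, max(prev_end, end), label)
        else:
            turns.append((start, end, label))
    return turns


def count_speakers(segments: List[Segment]) -> int:
    """Number of distinct labels, i.e. the by-product speaker count."""
    return len({label for _, _, label in segments})


# Toy input: six labeled segments from a hypothetical two-speaker recording.
segments = [
    (0.0, 1.5, "A"), (1.5, 3.0, "A"), (3.0, 4.2, "B"),
    (4.2, 5.0, "A"), (5.0, 6.5, "B"), (6.5, 7.0, "B"),
]
turns = merge_turns(segments)
for start, end, label in turns:
    print(f"{start:5.1f} {end:5.1f} speaker_{label}")
print("distinct speakers:", count_speakers(segments))
```

The labels "A" and "B" carry no identity: swapping them yields an equally valid diarization, which is the sense in which the task co-indexes segments rather than identifying known speakers.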

Most implemented papers

Speaker Diarization with LSTM

wq2012/SpectralCluster 28 Oct 2017

For many years, i-vector based audio embedding techniques were the dominant approach for speaker verification and speaker diarization applications.

pyannote.audio: neural building blocks for speaker diarization

pyannote/pyannote-audio 4 Nov 2019

We introduce pyannote.audio, an open-source toolkit written in Python for speaker diarization.

AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection

TaoRuijie/TalkNet_ASD 5 Jan 2019

The dataset contains temporally labeled face tracks in video, where each face instance is labeled as speaking or not, and whether the speech is audible.

End-to-End Neural Speaker Diarization with Self-attention

hitachi-speech/EEND 13 Sep 2019

Our method was even better than that of the state-of-the-art x-vector clustering-based method.

Auto-Tuning Spectral Clustering for Speaker Diarization Using Normalized Maximum Eigengap

tango4j/Auto-Tuning-Spectral-Clustering 5 Mar 2020

In this study, we propose a new spectral clustering framework that can auto-tune the parameters of the clustering algorithm in the context of speaker diarization.

Speech Recognition and Multi-Speaker Diarization of Long Conversations

calclavia/tal-asrd 16 May 2020

Speech recognition (ASR) and speaker diarization (SD) models have traditionally been trained separately to produce rich conversation transcripts with speaker labels.

End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors

hitachi-speech/EEND 20 May 2020

End-to-end speaker diarization for an unknown number of speakers is addressed in this paper.

VoxLingua107: a Dataset for Spoken Language Recognition

alumae/torch-xvectors-wav 25 Nov 2020

Speech activity detection and speaker diarization are used to extract segments from the videos that contain speech.

A Comprehensive Evaluation of Incremental Speech Recognition and Diarization for Conversational AI

wallscope-research/incremental-asr-evaluation COLING 2020

Automatic Speech Recognition (ASR) systems are increasingly powerful and more accurate, but also more numerous with several options existing currently as a service (e.g. Google, IBM, and Microsoft).

The Third DIHARD Diarization Challenge

dihardchallenge/dihard3_baseline 2 Dec 2020

DIHARD III was the third in a series of speaker diarization challenges intended to improve the robustness of diarization systems to variability in recording equipment, noise conditions, and conversational domain.