speaker-diarization

Speech recognition (ASR) and speaker diarization (SD) models have traditionally been trained separately to produce rich conversation transcripts with speaker labels.

End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors

hitachi-speech/EEND • 20 May 2020

End-to-end speaker diarization for an unknown number of speakers is addressed in this paper.

The Third DIHARD Diarization Challenge

dihardchallenge/dihard3_baseline • 2 Dec 2020

DIHARD III was the third in a series of speaker diarization challenges intended to improve the robustness of diarization systems to variability in recording equipment, noise conditions, and conversational domain.

Speech Emotion Diarization: Which Emotion Appears When?

speechbrain/speechbrain • 22 Jun 2023

Speech Emotion Recognition (SER) typically relies on utterance-level solutions.

End-to-End Neural Speaker Diarization with Self-attention

hitachi-speech/EEND • 13 Sep 2019

Our method even outperformed the state-of-the-art x-vector clustering-based method.

VoxLingua107: a Dataset for Spoken Language Recognition

alumae/torch-xvectors-wav • 25 Nov 2020

Speech activity detection and speaker diarization are used to extract segments from the videos that contain speech.

A Comprehensive Evaluation of Incremental Speech Recognition and Diarization for Conversational AI

wallscope-research/incremental-asr-processing • COLING 2020

Automatic Speech Recognition (ASR) systems are increasingly powerful and more accurate, but also more numerous with several options existing currently as a service (e.g. Google, IBM, and Microsoft).
