pyannote.audio: neural building blocks for speaker diarization

We introduce pyannote.audio, an open-source toolkit written in Python for speaker diarization. Based on the PyTorch machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines. pyannote.audio also comes with pre-trained models covering a wide range of domains for voice activity detection, speaker change detection, overlapped speech detection, and speaker embedding, reaching state-of-the-art performance for most of them.
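
As a rough illustration of how these building blocks end up being used, the minimal sketch below applies a pretrained speaker diarization pipeline to an audio file. It assumes a recent pyannote.audio release; the pipeline identifier, any authentication requirements, and the file name "audio.wav" are placeholders and may differ from the models released with the paper.

```python
# Minimal sketch: running a pretrained pyannote.audio diarization pipeline.
# Assumes a recent pyannote.audio release; the pipeline name and "audio.wav"
# are placeholders.
from pyannote.audio import Pipeline

# Load a pretrained speaker diarization pipeline (internally it chains
# building blocks such as voice activity detection, speaker embedding,
# and clustering).
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization")

# Apply the pipeline to an audio file; the result is a pyannote.core.Annotation.
diarization = pipeline("audio.wav")

# Iterate over speech turns: each track is a (segment, track, speaker) triple.
for segment, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{segment.start:.1f}s -- {segment.end:.1f}s: {speaker}")
```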

Results from the Paper


| Task | Dataset | Model | DER (%) | FA (%) | Miss (%) |
|---|---|---|---|---|---|
| Speaker Diarization | AMI | pyannote (MFCC) | 6.3 (#2) | 3.5 (#1) | 2.7 (#2) |
| Speaker Diarization | AMI | pyannote (waveform) | 6.0 (#1) | 3.6 (#2) | 2.4 (#1) |
| Speaker Diarization | DIHARD | pyannote (waveform) | 9.9 (#1) | 5.7 (#1) | 4.2 (#2) |
| Speaker Diarization | DIHARD | Baseline (best result in the literature as of Oct. 2019) | 11.2 (#3) | 6.5 (#2) | 4.7 (#3) |
| Speaker Diarization | DIHARD | pyannote (MFCC) | 10.5 (#2) | 6.8 (#3) | 3.7 (#1) |
| Speaker Diarization | ETAPE | Baseline | 7.7 (#3) | 7.5 (#3) | 0.2 (#1) |
| Speaker Diarization | ETAPE | pyannote (MFCC) | 5.6 (#2) | 5.2 (#2) | 0.4 (#2) |
| Speaker Diarization | ETAPE | pyannote (waveform) | 4.9 (#1) | 4.2 (#1) | 0.7 (#3) |

Ranks in parentheses give each model's position on the corresponding benchmark for that metric.
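
Diarization error rate (DER) aggregates false alarm (FA), missed detection (Miss), and speaker confusion, expressed relative to the total duration of reference speech. The sketch below shows how such a breakdown can be computed with pyannote.metrics; the reference and hypothesis annotations are toy placeholders unrelated to the results above, and the component key names assume a recent pyannote.metrics release.

```python
# Sketch: computing DER and its components with pyannote.metrics.
# The reference and hypothesis annotations are toy placeholders.
from pyannote.core import Annotation, Segment
from pyannote.metrics.diarization import DiarizationErrorRate

# Ground-truth speech turns.
reference = Annotation()
reference[Segment(0.0, 5.0)] = "spk_A"
reference[Segment(5.0, 9.0)] = "spk_B"

# System output to be evaluated.
hypothesis = Annotation()
hypothesis[Segment(0.0, 4.5)] = "spk_1"
hypothesis[Segment(4.5, 9.0)] = "spk_2"

metric = DiarizationErrorRate()

# Overall DER, as a fraction of the reference speech duration.
der = metric(reference, hypothesis)

# Component breakdown (assumed keys: 'false alarm', 'missed detection',
# 'confusion', 'total').
components = metric(reference, hypothesis, detailed=True)
print(f"DER  = {100 * der:.1f}%")
print(f"FA   = {100 * components['false alarm'] / components['total']:.1f}%")
print(f"Miss = {100 * components['missed detection'] / components['total']:.1f}%")
```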
