End-to-End Neural Diarization (EEND) is a speaker diarization method in which a single neural network directly outputs diarization results for a multi-speaker recording. To realize such an end-to-end model, speaker diarization is formulated as a multi-label classification problem, and a permutation-free objective function is introduced to directly minimize diarization errors. Because several speaker labels can be active at the same time, EEND handles speaker overlap explicitly during both training and inference. The model can be adapted to real conversations simply by training it on multi-speaker recordings with their corresponding speaker segment labels.
Source: End-to-End Neural Diarization: Reformulating Speaker Diarization as Simple Multi-label Classification
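To make the permutation-free objective in the description concrete, here is a minimal sketch in plain Python. It assumes the loss is a frame-wise binary cross-entropy averaged over frames and speakers, minimized over every assignment of output channels to reference speakers. The names `bce`, `permutation_free_loss`, `probs`, and `labels` are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import math
from itertools import permutations


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one predicted probability p and a 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def permutation_free_loss(probs, labels):
    """Return (loss, perm) for the best speaker assignment.

    probs, labels: lists of T frames, each a list of S per-speaker values.
    perm[s] is the reference speaker matched to output channel s.
    Assumption: exhaustive search over S! permutations, fine for small S.
    """
    num_frames = len(probs)
    num_speakers = len(probs[0])
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(num_speakers)):
        total = 0.0
        for t in range(num_frames):
            for s in range(num_speakers):
                total += bce(probs[t][s], labels[t][perm[s]])
        loss = total / (num_frames * num_speakers)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy example: 4 frames, 2 speakers. Frame 1 has both speakers active (overlap).
labels = [[1, 0], [1, 1], [0, 1], [0, 0]]
# Predictions are accurate, but the output channels are in the opposite order.
probs = [[0.1, 0.9], [0.8, 0.9], [0.9, 0.2], [0.1, 0.1]]

loss, perm = permutation_free_loss(probs, labels)
print(perm, round(loss, 4))
```

In this toy case the selected permutation is `(1, 0)`, so the swapped channel order is not penalized. The second frame carries the label `[1, 1]`, which is how the multi-label formulation represents overlapping speech.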
Task | Papers | Share |
---|---|---|
Speaker Diarization | 17 | 41.46% |
Clustering | 8 | 19.51% |
Multi-Label Classification | 3 | 7.32% |
Speech Recognition | 3 | 7.32% |
Speech Separation | 2 | 4.88% |
Action Detection | 1 | 2.44% |
Activity Detection | 1 | 2.44% |
Automatic Speech Recognition (ASR) | 1 | 2.44% |
Change Detection | 1 | 2.44% |