Many-Speakers Single Channel Speech Separation with Optimal Permutation Training

18 Apr 2021 · Shaked Dovrat, Eliya Nachmani, Lior Wolf

Single channel speech separation has experienced great progress in the last few years. However, training neural speech separation for a large number of speakers (e.g., more than 10 speakers) is out of reach for the current methods, which rely on the Permutation Invariant Training (PIT) loss. In this work, we present a permutation invariant training that employs the Hungarian algorithm in order to train with an $O(C^3)$ time complexity, where $C$ is the number of speakers, in comparison to the $O(C!)$ of PIT-based methods. Furthermore, we present a modified architecture that can handle the increased number of speakers. Our approach separates up to $20$ speakers and improves the previous results for large $C$ by a wide margin.
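The complexity gap described in the abstract can be illustrated with a small sketch. It assumes the training loss decomposes into a $C \times C$ matrix of pairwise costs, where entry `cost[i][j]` is a hypothetical loss (for example, negative SI-SDR) between target speaker `i` and estimated channel `j`. Under that assumption, picking the best speaker-to-channel permutation is a linear assignment problem. The function names and the toy cost matrix below are illustrative and are not taken from the paper or its code; the Hungarian routine is a generic textbook formulation, not the authors' implementation.

```python
import itertools


def brute_force_assignment(cost):
    """Exhaustive O(C!) search over all permutations, as in classic PIT."""
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in itertools.permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = list(perm), total
    return best_perm, best_total


def hungarian_assignment(cost):
    """O(C^3) Hungarian algorithm (row/column potentials formulation).

    Returns (perm, total), where perm[i] is the column assigned to row i.
    """
    n = len(cost)
    inf = float("inf")
    u = [0] * (n + 1)    # row potentials (1-based)
    v = [0] * (n + 1)    # column potentials (1-based, index 0 is a dummy)
    p = [0] * (n + 1)    # p[j] = row currently matched to column j (0 = free)
    way = [0] * (n + 1)  # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so at least one new zero-reduced-cost edge appears.
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        # Flip the matching along the augmenting path.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    perm = [0] * n
    for j in range(1, n + 1):
        perm[p[j] - 1] = j - 1
    return perm, sum(cost[i][perm[i]] for i in range(n))


# Toy 3-speaker cost matrix (made-up numbers, for illustration only).
toy_cost = [[4, 1, 3],
            [2, 0, 5],
            [3, 2, 2]]
print(brute_force_assignment(toy_cost))  # -> ([1, 0, 2], 5)
print(hungarian_assignment(toy_cost))    # -> ([1, 0, 2], 5)
```

Both routines return the same minimum-cost assignment, but the exhaustive search already visits $20! \approx 2.4 \times 10^{18}$ permutations at $C = 20$, while the Hungarian routine stays cubic in $C$. In practice a library routine such as SciPy's `scipy.optimize.linear_sum_assignment` would likely be used on the cost matrix instead of a hand-written solver.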


Results from the Paper

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Speech Separation | Libri10Mix | Hungarian PIT | SI-SDRi | 7.78 | #3 |
| Speech Separation | Libri15Mix | Hungarian PIT | SI-SDRi | 5.66 | #1 |
| Speech Separation | Libri20Mix | Hungarian PIT | SI-SDRi | 4.26 | #2 |
| Speech Separation | WSJ0-5mix | Hungarian PIT | SI-SDRi | 13.22 | #1 |

Results from Other Papers

| Task | Dataset | Model | Metric Name | Metric Value | Rank |
|---|---|---|---|---|---|
| Speech Separation | Libri5Mix | Hungarian PIT | SI-SDRi | 12.72 | #4 |

