no code implementations • 3 Sep 2019 • Morten Kolbæk, Zheng-Hua Tan, Søren Holdt Jensen, Jesper Jensen
Finally, we show that a loss function based on scale-invariant signal-to-distortion ratio (SI-SDR) achieves good general performance across a range of popular speech enhancement evaluation metrics, which suggests that SI-SDR is a good candidate as a general-purpose loss function for speech enhancement systems.
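The SI-SDR loss mentioned above can be sketched compactly. This is a minimal NumPy illustration of the metric (projecting the estimate onto the target and measuring the residual distortion), not the authors' training code; the function name and the `eps` stabilizer are illustrative choices.

```python
import numpy as np

def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (illustrative sketch).

    The estimate is projected onto the target; everything orthogonal to the
    target is treated as distortion. Rescaling the estimate leaves the ratio
    (essentially) unchanged, which is what makes the measure scale-invariant.
    """
    # Optimal scaling of the target toward the estimate
    alpha = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = alpha * target          # target-aligned component
    distortion = estimate - projection   # residual (noise + artifacts)
    return 10.0 * np.log10(
        (np.sum(projection ** 2) + eps) / (np.sum(distortion ** 2) + eps)
    )
```

When used as a training loss, the negative of this quantity is minimized, so the network maximizes SI-SDR of its output against the clean reference.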
no code implementations • 2 Feb 2018 • Morten Kolbæk, Zheng-Hua Tan, Jesper Jensen
Finally, we show that the proposed SE system performs on par with a traditional DNN-based Short-Time Spectral Amplitude (STSA) SE system in terms of estimated speech intelligibility.
Sound · Audio and Speech Processing
no code implementations • 31 Aug 2017 • Morten Kolbæk, Dong Yu, Zheng-Hua Tan, Jesper Jensen
We show that deep bi-directional LSTM RNNs trained using uPIT in noisy environments can improve the Signal-to-Distortion Ratio (SDR) as well as the Extended Short-Time Objective Intelligibility (ESTOI) measure on the speaker-independent multi-talker speech separation and denoising task, for various noise types and Signal-to-Noise Ratios (SNRs).
Sound
3 code implementations • 18 Mar 2017 • Morten Kolbæk, Dong Yu, Zheng-Hua Tan, Jesper Jensen
We evaluated uPIT on the WSJ0 and Danish two- and three-talker mixed-speech separation tasks and found that uPIT outperforms techniques based on Non-negative Matrix Factorization (NMF) and Computational Auditory Scene Analysis (CASA), and compares favorably with Deep Clustering (DPCL) and the Deep Attractor Network (DANet).
1 code implementation • 1 Jul 2016 • Dong Yu, Morten Kolbæk, Zheng-Hua Tan, Jesper Jensen
We propose a novel deep learning model, which supports permutation invariant training (PIT), for speaker-independent multi-talker speech separation, commonly known as the cocktail-party problem.
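The core idea of PIT is that, since the ordering of the separated outputs is arbitrary, the loss is evaluated under every assignment of outputs to reference speakers and the minimum is used for training. A minimal sketch of that loss, assuming an MSE criterion and NumPy arrays (one per speaker); the function name is illustrative, not from the paper's code:

```python
import numpy as np
from itertools import permutations

def pit_mse(estimates, targets):
    """Permutation-invariant MSE (illustrative sketch).

    Tries every assignment of estimated sources to reference sources and
    returns the minimum mean-squared error together with the best permutation,
    so the network may emit speakers in any order without being penalized.
    """
    n = len(estimates)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        # Average MSE over speakers under this output-to-target assignment
        loss = np.mean([np.mean((estimates[i] - targets[p]) ** 2)
                        for i, p in enumerate(perm)])
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

Because the number of permutations grows factorially with the number of speakers, this exhaustive search is practical only for the small speaker counts (two or three) considered in these separation tasks.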