Monaural Speech Enhancement with Complex Convolutional Block Attention Module and Joint Time Frequency Losses

3 Feb 2021 · Shengkui Zhao, Trung Hieu Nguyen, Bin Ma

Deep complex U-Net and convolutional recurrent network (CRN) structures achieve state-of-the-art performance for monaural speech enhancement. Both are encoder-decoder structures with skip connections that rely heavily on the representation power of their complex-valued convolutional layers. In this paper, we propose a complex convolutional block attention module (CCBAM) that boosts the representation power of complex-valued convolutional layers by constructing more informative features. CCBAM is a lightweight, general module that can easily be integrated into any complex-valued convolutional layer. We integrate CCBAM with the deep complex U-Net and CRN to improve their speech enhancement performance. We further propose a mixed loss function that jointly optimizes the complex models in both the time-frequency (TF) domain and the time domain. By combining CCBAM and the mixed loss, we form a new end-to-end (E2E) complex speech enhancement framework. Ablation experiments and objective evaluations show the superior performance of the proposed approaches (https://github.com/modelscope/ClearerVoice-Studio).
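The abstract does not describe CCBAM's internals. As a rough illustration only, the NumPy sketch below adapts the standard CBAM recipe (channel attention followed by spatial attention) to a complex feature map of shape (channels, frequency, time). Several choices here are hypothetical and are not taken from the paper: pooling the real and imaginary parts separately, using real-valued gates shared by both parts, and the names `ccbam_sketch`, `channel_gate`, `spatial_gate`, `w1`, `w2`, and `kernel`.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_gate(x, w1, w2):
    """Real-valued gate per channel for a complex map x of shape (C, F, T).

    Assumption (not from the paper): avg- and max-pooled descriptors of the
    real and imaginary parts (2C values each) pass through a shared ReLU MLP.
    """
    avg = np.concatenate([x.real.mean(axis=(1, 2)), x.imag.mean(axis=(1, 2))])
    mx = np.concatenate([x.real.max(axis=(1, 2)), x.imag.max(axis=(1, 2))])

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # (2C,) -> (hidden,) -> (C,)

    return sigmoid(mlp(avg) + mlp(mx))  # shape (C,)


def spatial_gate(x, kernel):
    """Real-valued gate per time-frequency bin; kernel has shape (4, k, k), k odd."""
    maps = np.stack([x.real.mean(axis=0), x.real.max(axis=0),
                     x.imag.mean(axis=0), x.imag.max(axis=0)])  # (4, F, T)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(maps, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    n_f, n_t = maps.shape[1:]
    out = np.zeros((n_f, n_t))
    for i in range(k):
        for j in range(k):
            # 'same' cross-correlation, as computed by deep-learning conv layers
            out += np.tensordot(kernel[:, i, j], padded[:, i:i + n_f, j:j + n_t], axes=1)
    return sigmoid(out)  # shape (F, T)


def ccbam_sketch(x, w1, w2, kernel):
    """Hypothetical CCBAM-style block: channel attention, then spatial attention."""
    x = x * channel_gate(x, w1, w2)[:, None, None]
    return x * spatial_gate(x, kernel)[None, :, :]


rng = np.random.default_rng(0)
C, F, T, hidden = 8, 16, 10, 4
x = rng.standard_normal((C, F, T)) + 1j * rng.standard_normal((C, F, T))
w1 = 0.1 * rng.standard_normal((hidden, 2 * C))
w2 = 0.1 * rng.standard_normal((C, hidden))
kernel = 0.1 * rng.standard_normal((4, 7, 7))
y = ccbam_sketch(x, w1, w2, kernel)
print(y.shape)  # (8, 16, 10)
```

Both gates are positive real numbers, so each complex bin is only rescaled and its phase is unchanged. A variant that learns complex-valued attention weights could also rotate the phase. This sketch does not settle which approach the authors take.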
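The abstract does not give the exact form of the mixed loss. The minimal sketch below assumes it is a weighted sum of two terms. The TF-domain term is a mean squared error between complex STFTs, and the time-domain term is the negative scale-invariant SNR (SI-SNR). The weight `alpha`, the STFT settings, and the names `stft`, `si_snr`, and `mixed_loss` are illustrative assumptions, not the paper's definitions.

```python
import numpy as np


def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT (no padding); returns shape (frames, n_fft // 2 + 1)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)


def si_snr(est, ref, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB (higher is better)."""
    est = est - est.mean()
    ref = ref - ref.mean()
    target = (est @ ref) / (ref @ ref + eps) * ref  # projection of est onto ref
    noise = est - target
    return 10 * np.log10((target @ target + eps) / (noise @ noise + eps))


def mixed_loss(est, ref, alpha=0.5):
    """Assumed joint loss: alpha * complex-STFT MSE + (1 - alpha) * (-SI-SNR)."""
    e_spec, r_spec = stft(est), stft(ref)
    # |E - R|^2 = squared real error + squared imaginary error, averaged over bins
    tf_mse = np.mean(np.abs(e_spec - r_spec) ** 2)
    return alpha * tf_mse + (1 - alpha) * (-si_snr(est, ref))


rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noisy = clean + 0.3 * rng.standard_normal(16000)
print(mixed_loss(clean, clean) < mixed_loss(noisy, clean))  # True
```

The two terms live on very different scales, because SI-SNR is measured in dB. As a result, `alpha` (or a per-term normalization) would need tuning. Actual training would also compute this loss inside a differentiable framework rather than in NumPy.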

Task                Dataset                  Model      Metric      Value  Global rank
Speech Enhancement  DNS Challenge            FRCRN      PESQ-WB     3.23   #4
Speech Enhancement  DNS Challenge            DCUNet-MC  PESQ-NB     3.3    #1
Speech Enhancement  DNS Challenge            DCCRN-MC   PESQ-NB     3.21   #2
Speech Enhancement  DNS Challenge            DCCRN-M    PESQ-NB     3.15   #3
Speech Enhancement  DNS Challenge            DCCRN      PESQ-NB     3.04   #4
Speech Enhancement  VoiceBank + DEMAND       D2Former   PESQ        3.43   #9
Speech Enhancement  VoiceBank + DEMAND       D2Former   PESQ-WB     3.43   #3
Speech Enhancement  VoiceBank + DEMAND       D2Former   Params (M)  0.86   #4
Speech Enhancement  WSJ0 + DEMAND + RNNoise  DCUNet-MC  PESQ-NB     3.44   #1
Speech Enhancement  WSJ0 + DEMAND + RNNoise  DCUNet     PESQ-NB     3.25   #3
Speech Enhancement  WSJ0 + DEMAND + RNNoise  DCCRN-M    PESQ-NB     3.28   #2

DNS: Deep Noise Suppression.

Methods