Automatic Mean Opinion Score (MOS) prediction is crucial for evaluating the perceptual quality of synthetic speech.
In recent years, deep neural network (DNN) based approaches have achieved state-of-the-art performance in music source separation (MSS).
The L3DAS22 Challenge is aimed at encouraging the development of machine learning strategies for 3D speech enhancement and 3D sound localization and detection in office-like environments.
Speech enhancement methods based on deep learning have surpassed traditional methods.
In this work, a new causal U-Net-based multiple-in-multiple-out structure is proposed for real-time multi-channel speech enhancement.