Preliminary experiments conducted on a subset of TIMIT corpus reveal that the proposed method can make the resulting IRM achieve higher speech quality and intelligibility for the babble noise-corrupted signals compared with the original IRM, indicating that the lowpass filtered temporal feature sequence can learn a superior IRM network for speech enhancement.
Nowadays, time-domain features have been widely used in speech enhancement (SE) networks like frequency-domain features to achieve excellent performance in eliminating noise from input utterances.
In this study, we present revising the STFT loss in DEMUCS by employing the compressed magnitude spectrogram.
In recent decades, many studies have suggested that phase information is crucial for speech enhancement (SE), and time-domain single-channel speech enhancement techniques have shown promise in noise suppression and robust automatic speech recognition (ASR).
Due to the unprecedented breakthroughs brought about by deep learning, speech enhancement (SE) techniques have been developed rapidly and play an important role prior to acoustic modeling to mitigate noise effects on speech.
Previous studies have confirmed that by augmenting acoustic features with the place/manner of articulatory features, the speech enhancement (SE) process can be guided to consider the articulatory properties of the input speech when performing enhancement to attain performance improvements.
In noisy conditions, knowing speech contents facilitates listeners to more effectively suppress background noise components and to retrieve pure speech signals.
Previous studies have proven that integrating video signals, as a complementary modality, can facilitate improved performance for speech enhancement (SE).
Speech-related applications deliver inferior performance in complex noise environments.
The presented DWT-based SE method with various scaling factors for the detail part is evaluated with a subset of Aurora-2 database, and the PESQ metric is used to indicate the quality of processed speech signals.