18 papers with code • 2 benchmarks • 2 datasets
Obtain the clean speech of the target speaker by suppressing the background noise.
Self-supervised learning (SSL) has achieved great success in speech recognition, while other speech processing tasks have seen only limited exploration.
In this paper, we explore joint optimization of masking functions and deep recurrent neural networks for monaural source separation tasks, including monaural speech separation, monaural singing voice separation, and speech denoising.
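The masking approach mentioned above can be sketched as follows: the network estimates a magnitude spectrogram per source, soft (ratio) masks are derived so that the masks sum to one at each time-frequency bin, and each mask is applied to the mixture spectrogram. This is a minimal illustration, not the paper's implementation; `soft_masks` and `apply_mask` are hypothetical names, and spectrograms are represented as nested lists of magnitudes.

```python
def soft_masks(est_mags):
    """Soft (ratio) masks from per-source magnitude estimates.

    est_mags: one magnitude spectrogram per source, each a list of
    time frames, each frame a list of frequency-bin magnitudes.
    The returned masks sum to 1 at every time-frequency bin.
    """
    n_src = len(est_mags)
    eps = 1e-8  # avoids division by zero in silent bins
    masks = []
    for s in range(n_src):
        mask = []
        for t in range(len(est_mags[s])):
            frame = []
            for f in range(len(est_mags[s][t])):
                total = sum(est_mags[k][t][f] for k in range(n_src)) + eps
                frame.append(est_mags[s][t][f] / total)
            mask.append(frame)
        masks.append(mask)
    return masks

def apply_mask(mix_mag, mask):
    """Element-wise product of the mixture magnitude and one mask."""
    return [[m * w for m, w in zip(mf, kf)]
            for mf, kf in zip(mix_mag, mask)]
```

Joint optimization means the separation loss is computed on the masked outputs, so gradients flow through the masking operation back into the recurrent network that produced the estimates.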
We present an end-to-end deep learning approach to denoising speech signals by processing the raw waveform directly.
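A raw-waveform model replaces a spectrogram front end with learned 1-D convolutions applied directly to samples. The sketch below shows only the waveform-in, waveform-out convolution step with a fixed averaging kernel standing in for learned filters; it is an assumption-laden illustration, not the paper's architecture.

```python
def conv1d(signal, kernel):
    """'Same'-padded 1-D convolution over a raw waveform.

    signal: list of float samples; kernel: list of float taps.
    In an end-to-end denoiser the taps would be learned, and many
    such filters would be stacked with nonlinearities in between.
    """
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(kernel[j] * padded[i + j] for j in range(k))
            for i in range(len(signal))]
```

For example, a short averaging kernel such as `[1/3, 1/3, 1/3]` attenuates high-frequency noise while keeping the output the same length as the input.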
We visualize the outputs of such connections, projected back to the spectral domain, in models trained for speech denoising. We show that while skip connections do not necessarily improve performance relative to the number of parameters, they make speech enhancement models more interpretable.
Speech enhancement tasks have seen significant improvements with the advance of deep learning technology, but at the cost of increased computational complexity.
In WaveCRN, local speech features are captured by a convolutional neural network (CNN), while the temporal dynamics of those features are modeled by stacked simple recurrent units (SRUs).
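The recurrent half of that design can be sketched with a single scalar SRU cell. The recurrence follows the standard SRU formulation (a light recurrence on the cell state plus a highway connection), written here with scalar weights in a plain dict; the parameter names and the `sru_step`/`sru` helpers are illustrative, not WaveCRN's code.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sru_step(x, c_prev, p):
    """One scalar SRU step.

    f = sigmoid(w_f*x + v_f*c_prev + b_f)   # forget gate
    c = f*c_prev + (1 - f)*(w*x)            # light recurrence
    r = sigmoid(w_r*x + v_r*c_prev + b_r)   # highway gate
    h = r*c + (1 - r)*x                     # highway output
    """
    f = sigmoid(p["w_f"] * x + p["v_f"] * c_prev + p["b_f"])
    c = f * c_prev + (1.0 - f) * (p["w"] * x)
    r = sigmoid(p["w_r"] * x + p["v_r"] * c_prev + p["b_r"])
    h = r * c + (1.0 - r) * x
    return h, c

def sru(xs, p):
    """Run the SRU over a sequence of per-frame CNN features."""
    c, hs = 0.0, []
    for x in xs:
        h, c = sru_step(x, c, p)
        hs.append(h)
    return hs
```

Because the matrix multiplications inside `sru_step` depend only on the current input, they can be batched across all time steps, which is what makes SRUs faster to train than LSTMs.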
In this paper, we investigate a deep learning approach for speech denoising through an efficient ensemble of specialist neural networks.
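One common way to combine specialist networks is a gating model that scores each specialist per input and blends their outputs with softmax weights. The sketch below assumes that structure; `ensemble_denoise`, the `specialists` callables, and the `gate` function are hypothetical stand-ins, not the paper's API.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def ensemble_denoise(frame, specialists, gate):
    """Blend specialist outputs with gate weights.

    frame: list of float features for one input frame
    specialists: list of functions, each frame -> enhanced frame
    gate: function frame -> one relevance score per specialist
    """
    outs = [s(frame) for s in specialists]
    weights = softmax(gate(frame))
    return [sum(w * o[i] for w, o in zip(weights, outs))
            for i in range(len(frame))]
```

A hard-selection variant would instead pick only the highest-scoring specialist, trading a small accuracy loss for running just one network at inference time.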