Large language models (LLMs) such as GPT-3, OPT, and LLaMA have demonstrated remarkable accuracy in a wide range of tasks.
To enable this, a variety of metrics to measure quality and intelligibility under different assumptions have been developed.
By comparing complex- and real-valued versions of fundamental building blocks in the recently developed gated convolutional recurrent network (GCRN), we show how different mechanisms for basic blocks affect the performance.
During training, our approach augments a model learning complex spectral mapping with a temporary submodel to predict the covariance of the enhancement error at each time-frequency bin.
In addition, our approach effectively preserves the interaural cues, which improves the accuracy of sound localization.
Audio and Speech Processing Sound
Background noise, interfering speech and room reverberation frequently distort target speech in real listening environments.
Audio and Speech Processing Sound Signal Processing
In this study, we analyze the distortion problem, compare different acoustic models, and investigate a distortion-independent training scheme for monaural speech recognition.
This study investigates phase reconstruction for deep learning based monaural talker-independent speaker separation in the short-time Fourier transform (STFT) domain.