Robust Speech Recognition
22 papers with code • 0 benchmarks • 3 datasets
Latest papers
Large Language Models are Efficient Learners of Noise-Robust Speech Recognition
We propose to extract a language-space noise embedding from the N-best list to represent the noise conditions of the source speech, which can promote the denoising process in generative error correction (GER).
Single Channel Speech Enhancement Using U-Net Spiking Neural Networks
Speech enhancement (SE) is crucial for reliable communication devices or robust speech recognition systems.
LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT
We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method achieving state-of-the-art performance on various lyrics transcription datasets, even in challenging genres such as rock and metal.
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
We introduce MuAViC, a multilingual audio-visual corpus for robust speech recognition and robust speech-to-text translation providing 1200 hours of audio-visual speech in 9 languages.
Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition
In this paper, we propose a simple yet effective approach called gradient remedy (GR) to resolve interference between task gradients in noise-robust speech recognition, from the perspectives of both angle and magnitude.
Audio-Visual Efficient Conformer for Robust Speech Recognition
We improve previous lip reading methods using an Efficient Conformer back-end on top of a ResNet-18 visual front-end and by adding intermediate CTC losses between blocks.
Robust Speech Recognition via Large-Scale Weak Supervision
We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet.
CCC-wav2vec 2.0: Clustering aided Cross Contrastive Self-supervised learning of speech representations
While self-supervised learning has helped exploit the scale of the available unlabeled data, the learning paradigms are continually being improved.
DENT-DDSP: Data-efficient noisy speech generator using differentiable digital signal processors for explicit distortion modelling and noise-robust speech recognition
To validate whether the data simulated by DENT-DDSP can replace scarce in-domain noisy data in noise-robust ASR tasks, several downstream ASR models with the same architecture are trained on the simulated data and on the real data.
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding
To showcase the integration of speech enhancement with downstream tasks, we performed experiments on carefully designed synthetic datasets for noisy-reverberant multi-channel speech translation (ST) and spoken language understanding (SLU) tasks, which can be used as benchmark corpora for future research.