Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition

We employ a combination of recent developments in semi-supervised learning for automatic speech recognition to obtain state-of-the-art results on LibriSpeech utilizing the unlabeled audio of the Libri-Light dataset. More precisely, we carry out noisy student training with SpecAugment using giant Conformer models pre-trained using wav2vec 2.0 pre-training...


Results from the Paper


 Ranked #1 on Speech Recognition on LibriSpeech test-clean (using extra training data)

| Task | Dataset | Model | Metric | Value | Global rank |
|---|---|---|---|---|---|
| Speech Recognition | LibriSpeech test-clean | Conformer + Wav2vec 2.0 + SpecAugment-based Noisy Student Training with Libri-Light | Word Error Rate (WER, %) | 1.4 | #1 |
| Speech Recognition | LibriSpeech test-other | Conformer + Wav2vec 2.0 + SpecAugment-based Noisy Student Training with Libri-Light | Word Error Rate (WER, %) | 2.6 | #1 |
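
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a word error rate can be computed for a single utterance: the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; this is not the paper's evaluation code, and benchmark scoring typically also applies text normalization and pools errors over the whole test set rather than averaging per-utterance rates.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the WER in percent for one reference/hypothesis pair.

    Hypothetical helper for illustration: words are split on whitespace
    and compared exactly, with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Under this definition, a WER of 1.4 corresponds to roughly 1.4 word errors (substitutions, deletions, or insertions) per 100 reference words.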

Methods used in the Paper