BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio. We find that the combination of pre-training, self-training and scaling up model size greatly increases data efficiency, even for extremely large tasks with tens of thousands of hours of labeled data. In particular, on an ASR task with 34k hours of labeled data, by fine-tuning an 8 billion parameter pre-trained Conformer model we can match state-of-the-art (SoTA) performance with only 3% of the training data and significantly improve SoTA with the full training set. We also report on the universal benefits gained from using big pre-trained and self-trained models for a large set of downstream tasks that cover a wide range of speech domains and span multiple orders of magnitude in dataset size, including obtaining SoTA performance on many public benchmarks. In addition, we utilize the learned representation of pre-trained networks to achieve SoTA results on non-ASR tasks.


Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Speech Recognition | AMI IMH | ConformerXXL-P + Downstream NST | Word Error Rate (WER) | 7.8 | #1 |
| Speech Recognition | AMI SDM1 | ConformerXXL-P | Word Error Rate (WER) | 17.7 | #1 |
| Speech Recognition | CHiME-6 dev_gss12 | ConformerXXL-PS | Word Error Rate (WER) | 26.2 | #2 |
| Speech Recognition | CHiME-6 eval | ConformerXXL-PS | Word Error Rate (WER) | 31 | #2 |
| Speech Recognition | Common Voice | ConformerXXL-P + Downstream NST | Test WER | 7.7% | #1 |
| Speech Emotion Recognition | CREMA-D | ConformerXL-P | Accuracy | 88.2 | #1 |
| Speech Recognition | TED-LIUM | ConformerXXL-PS | Word Error Rate (WER) | 5 | #1 |
| Language Identification | VoxForge | ConformerG-P | Accuracy | 99.8 | #1 |
| Speech Recognition | WSJ eval92 | ConformerXXL-P | Word Error Rate (WER) | 1.3 | #1 |
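
Most of the results above are reported as word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration. The scoring tools and text normalization used for each benchmark are not described in this section and may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization is plain whitespace splitting, with no
    case folding or punctuation handling (an assumption, not the paper's setup).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (all but the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {100 * wer:.1f}%")
```

The function returns a fraction, while the table lists WER as a percentage, so multiply by 100 to compare. Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.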
