SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

We present SpecAugment, a simple data augmentation method for speech recognition. SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients)...
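The abstract above is cut off before it describes the augmentation itself. As a rough illustration of what augmenting the feature inputs directly can look like, the sketch below zeroes out one random block of frequency channels and one random block of time frames in a filter bank feature matrix. The choice of masking, the function name `mask_features`, and its parameters are assumptions made for this example and are not taken from the text above.

```python
import random


def mask_features(features, max_freq_width=2, max_time_width=3, rng=None):
    """Return a masked copy of `features`.

    `features` is a list of time frames, each a list of filter bank values.
    All names and default widths here are hypothetical, chosen for illustration.
    """
    if rng is None:
        rng = random.Random()
    num_frames = len(features)
    num_channels = len(features[0])
    out = [list(frame) for frame in features]  # copy; the input is left untouched

    # Zero a block of consecutive frequency channels in every frame.
    f_width = rng.randint(0, min(max_freq_width, num_channels))
    f_start = rng.randint(0, num_channels - f_width)
    for frame in out:
        for c in range(f_start, f_start + f_width):
            frame[c] = 0.0

    # Zero a block of consecutive time frames across all channels.
    t_width = rng.randint(0, min(max_time_width, num_frames))
    t_start = rng.randint(0, num_frames - t_width)
    for t in range(t_start, t_start + t_width):
        out[t] = [0.0] * num_channels

    return out


# Toy input: 10 frames of 8 filter bank coefficients each.
features = [[1.0] * 8 for _ in range(10)]
augmented = mask_features(features, rng=random.Random(0))
```

Because the masks are drawn at random on each call, the same utterance can yield a different augmented input every time it is seen during training, without touching the underlying audio.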

TASK               | DATASET                | MODEL                                                  | METRIC NAME           | METRIC VALUE | GLOBAL RANK
Speech Recognition | Hub5'00 SwitchBoard    | LAS + SpecAugment (with LM, Switchboard mild policy)   | CallHome              | 14.6         | # 2
Speech Recognition | Hub5'00 SwitchBoard    | LAS + SpecAugment (with LM, Switchboard mild policy)   | SwitchBoard           | 6.8          | # 1
Speech Recognition | Hub5'00 SwitchBoard    | LAS + SpecAugment (with LM, Switchboard strong policy) | CallHome              | 14           | # 1
Speech Recognition | Hub5'00 SwitchBoard    | LAS + SpecAugment (with LM, Switchboard strong policy) | SwitchBoard           | 7.1          | # 2
Speech Recognition | LibriSpeech test-clean | LAS + SpecAugment                                      | Word Error Rate (WER) | 2.5          | # 23
Speech Recognition | LibriSpeech test-clean | LAS (no LM)                                            | Word Error Rate (WER) | 2.7          | # 25
Speech Recognition | LibriSpeech test-other | LAS + SpecAugment                                      | Word Error Rate (WER) | 5.8          | # 24
Speech Recognition | LibriSpeech test-other | LAS (no LM)                                            | Word Error Rate (WER) | 6.5          | # 26
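
The LibriSpeech rows above report Word Error Rate (WER). For readers unfamiliar with the metric, here is a minimal sketch of the usual definition: the word-level edit distance (substitutions, deletions, insertions) between a hypothesis and its reference transcript, divided by the number of reference words. The function name `word_error_rate` and the example sentences are made up for illustration; published evaluations typically apply text normalization that this sketch omits.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{wer:.1%}")  # prints 33.3%
```

Benchmark tables like the one above conventionally state WER as a percentage, so a reported 2.5 corresponds to a ratio of 0.025 in this function's output.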
