In this work, we aim to enhance the system robustness of end-to-end automatic speech recognition (ASR) against adversarially-noisy speech examples.
Specifically, training a bidirectional model like BERT on a discriminative objective such as minimum WER (MWER) has not been explored.
no code implementations • 15 Feb 2021 • Aditya Gourav, Linda Liu, Ankur Gandhe, Yile Gu, Guitang Lan, Xiangyang Huang, Shashank Kalmane, Gautam Tiwari, Denis Filimonov, Ariya Rastrow, Andreas Stolcke, Ivan Bulyko
We also describe a novel second-pass de-biasing approach: used in conjunction with a first-pass shallow fusion that optimizes on oracle WER, we can achieve an additional 14% improvement on personalized content recognition, and even improve accuracy for the general use case by up to 2. 5%.
As voice assistants become more ubiquitous, they are increasingly expected to support and perform well on a wide variety of use-cases across different domains.
We show that this simple method can improve performance on rare words by 3. 7% WER relative without degradation on general test set, and the improvement from USF is additive to any additional language model based rescoring.
We show that our rescoring model trained with these additional tasks outperforms the baseline rescoring model, trained with only the language modeling task, by 1. 4% on a general test and by 2. 6% on a rare word test set in terms of word-error-rate relative (WERR).