Traditional AI approaches in customized (personalized) contextual pricing applications assume that the data distribution at the time of online pricing is similar to that observed during training.
However, we observe that training of MoChA models seems to be more sensitive to various factors such as the characteristics of training sets and the incorporation of additional augmentations techniques.
To improve the accuracy of a low-resource Italian ASR, we leverage a well-trained English model, unlabeled text corpus, and unlabeled audio corpus using transfer learning, TTS augmentation, and SSL respectively.
In this paper, we present a streaming end-to-end speech recognition model based on Monotonic Chunkwise Attention (MoCha) jointly trained with enhancement layers.
Conventional speech recognition systems comprise a large number of discrete components such as an acoustic model, a language model, a pronunciation model, a text-normalizer, an inverse-text normalizer, a decoder based on a Weighted Finite State Transducer (WFST), and so on.
no code implementations • 28 Mar 2020 • AJ Venkatakrishnan, Arjun Puranik, Akash Anand, David Zemmour, Xiang Yao, Xiaoying Wu, Ramakrishna Chilaka, Dariusz K. Murakowski, Kristopher Standish, Bharathwaj Raghunathan, Tyler Wagner, Enrique Garcia-Rivera, Hugo Solomon, Abhinav Garg, Rakesh Barve, Anuli Anyanwu-Ofili, Najat Khan, Venky Soundararajan
The COVID-19 pandemic demands assimilation of all available biomedical knowledge to decode its mechanisms of pathogenicity and transmission.
In this paper, we propose a refined multi-stage multi-task training strategy to improve the performance of online attention-based encoder-decoder (AED) models.
no code implementations • 22 Dec 2019 • Chanwoo Kim, Sungsoo Kim, Kwangyoun Kim, Mehul Kumar, Jiyeon Kim, Kyungmin Lee, Changwoo Han, Abhinav Garg, Eunhyang Kim, Minkyoo Shin, Shatrughan Singh, Larry Heck, Dhananjaya Gowda
Our end-to-end speech recognition system built using this training infrastructure showed a 2. 44 % WER on test-clean of the LibriSpeech test set after applying shallow fusion with a Transformer language model (LM).