no code implementations • 10 Jan 2023 • Piyush Behre, Sharman Tan, Padma Varadharajan, Shuangyu Chang
While speech recognition Word Error Rate (WER) has reached human parity for English, continuous speech recognition scenarios such as voice typing and meeting transcriptions still suffer from segmentation and punctuation problems, resulting from irregular pausing patterns or slow speakers.
Automatic Speech Recognition (ASR) +3
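Word Error Rate, cited above, is the word-level edit distance between hypothesis and reference transcripts, normalized by reference length. A minimal illustrative sketch (not from the paper; the example sentences are made up):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein distance over words, normalized by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edits to turn the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))  # ~0.167
```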
no code implementations • 27 Oct 2022 • Piyush Behre, Sharman Tan, Amy Shah, Harini Kesavamoorthy, Shuangyu Chang, Fei Zuo, Chris Basoglu, Sayan Pathak
Punctuation and segmentation are key to readability of Automatic Speech Recognition (ASR) output; they are often evaluated with F1 scores, which require high-quality human transcripts and do not reflect readability well.
Automatic Speech Recognition (ASR) +3
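The F1 evaluation referred to above typically treats each punctuation mark as a token-level prediction matched against a human reference. A minimal sketch of a per-mark F1 computation (the labels and sequences are illustrative assumptions, not the paper's data):

```python
def punctuation_f1(reference_labels, predicted_labels, mark=","):
    """Per-mark precision/recall/F1 for token-level punctuation labels."""
    pairs = list(zip(reference_labels, predicted_labels))
    tp = sum(1 for r, p in pairs if r == p == mark)
    fp = sum(1 for r, p in pairs if p == mark and r != mark)
    fn = sum(1 for r, p in pairs if r == mark and p != mark)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# One label per token: "O" means no punctuation follows the token.
ref = ["O", ",", "O", ".", "O"]
hyp = ["O", ",", ",", "O", "O"]
print(punctuation_f1(ref, hyp, mark=","))  # (0.5, 1.0, 0.667)
```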
no code implementations • 26 Oct 2022 • Piyush Behre, Naveen Parihar, Sharman Tan, Amy Shah, Eva Sharma, Geoffrey Liu, Shuangyu Chang, Hosam Khalil, Chris Basoglu, Sayan Pathak
For the downstream task of machine translation, it improves the translation BLEU score by an average of 1.05 points.
Automatic Speech Recognition (ASR) +5
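BLEU, the metric improved above, is a corpus-level n-gram overlap score between system output and reference translations. A minimal sketch of how such a comparison can be scored with NLTK (the sentences are illustrative, not from the paper):

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Each hypothesis is scored against one or more tokenized references.
references = [[["the", "meeting", "starts", "at", "noon"]]]
hypotheses = [["the", "meeting", "begins", "at", "noon"]]

# Smoothing avoids zero scores when a higher-order n-gram has no match.
score = corpus_bleu(references, hypotheses,
                    smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {100 * score:.2f}")
```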
no code implementations • 26 Oct 2022 • Sharman Tan, Piyush Behre, Nick Kibre, Issac Alphonso, Shuangyu Chang
Features such as punctuation, capitalization, and formatting of entities are important for readability, understanding, and natural language processing tasks.
Automatic Speech Recognition (ASR) +1
no code implementations • 11 Oct 2022 • Piyush Behre, Sharman Tan, Padma Varadharajan, Shuangyu Chang
While speech recognition Word Error Rate (WER) has reached human parity for English, long-form dictation scenarios still suffer from segmentation and punctuation problems resulting from irregular pausing patterns or slow speakers.
Automatic Speech Recognition (ASR) +3
no code implementations • 8 Sep 2022 • Li Miao, Jian Wu, Piyush Behre, Shuangyu Chang, Sarangarajan Parthasarathy
It is challenging to train and deploy Transformer language models (LMs) for second-pass re-ranking in hybrid speech recognition for low-resource languages, due to (1) data scarcity in low-resource languages, (2) the expensive compute cost of training and refreshing 100+ monolingual models, and (3) hosting inefficiency given sparse traffic.
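Second-pass re-ranking, as mentioned above, commonly rescores first-pass n-best hypotheses by interpolating the first-pass score with an external LM score. A generic sketch of that idea (the scoring function and interpolation weight are illustrative assumptions, not the paper's recipe):

```python
from typing import Callable, List, Tuple

def rerank_nbest(nbest: List[Tuple[str, float]],
                 lm_score: Callable[[str], float],
                 lm_weight: float = 0.3) -> List[Tuple[str, float]]:
    """Re-rank (hypothesis, first_pass_score) pairs with an external LM.

    Scores are assumed to be log-probabilities, so they combine additively.
    """
    rescored = [(hyp, first_pass + lm_weight * lm_score(hyp))
                for hyp, first_pass in nbest]
    return sorted(rescored, key=lambda x: x[1], reverse=True)

# Toy usage with a stand-in LM score (e.g., a Transformer LM log-likelihood).
fake_lm = lambda hyp: -0.5 * len(hyp.split())
nbest = [("recognize speech", -4.2), ("wreck a nice beach", -4.0)]
print(rerank_nbest(nbest, fake_lm))
```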