Inverse text normalization (ITN) is used to convert the spoken-form output of an automatic speech recognition (ASR) system into written form.
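To illustrate the spoken-form to written-form mapping, here is a minimal, hypothetical rule-based sketch that only rewrites spelled-out English cardinal numbers (up to 999,999) into digits. It is not the method of the paper above. The names `itn` and `_parse_number` are illustrative assumptions. It also has known gaps: it does not handle "and" inside numbers, and it sums digit-by-digit readings such as "one two three" instead of concatenating them.

```python
# Hypothetical toy ITN: rewrite spelled-out cardinal numbers in an ASR
# token stream as digits. Illustrative only; real ITN systems cover many
# more classes (dates, currency, ordinals, ...).
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * (i + 2) for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split())}
SCALES = {"hundred", "thousand"}


def _parse_number(words: list[str]) -> int:
    """Accumulate a run of number words into an integer value."""
    total, current = 0, 0
    for w in words:
        if w in UNITS:
            current += UNITS[w]
        elif w in TENS:
            current += TENS[w]
        elif w == "hundred":
            current *= 100
        elif w == "thousand":
            total += current * 1000
            current = 0
    return total + current


def itn(spoken: str) -> str:
    """Replace each maximal run of number words with its digit form."""
    out: list[str] = []
    span: list[str] = []
    for tok in spoken.split():
        if tok in UNITS or tok in TENS or tok in SCALES:
            span.append(tok)
            continue
        if span:  # a number run just ended
            out.append(str(_parse_number(span)))
            span = []
        out.append(tok)
    if span:  # number run at the end of the utterance
        out.append(str(_parse_number(span)))
    return " ".join(out)


print(itn("i have twenty three apples"))  # i have 23 apples
```

Treating each maximal run of number words as one value keeps the sketch short. Production ITN is typically built from weighted finite-state grammars or neural models that resolve such ambiguities using context.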
This paper presents initial speech recognition results on "Casual Conversations", a publicly released 846-hour corpus designed to help researchers evaluate their computer vision and audio models for accuracy across a diverse set of metadata, including age, gender, and skin tone.
Domain-adversarial training (DAT) and multi-task learning (MTL) are two common approaches for building accent-robust ASR models.
Entity tags in human-machine dialog are integral to natural language understanding (NLU) tasks in conversational assistants.
Deepak Muralidharan, Joel Ruben Antony Moniz, Sida Gao, Xiao Yang, Justine Kao, Stephen Pulman, Atish Kothari, Ray Shen, Yinying Pan, Vivek Kaul, Mubarak Seyed Ibrahim, Gang Xiang, Nan Dun, Yidan Zhou, Andy O, Yuan Zhang, Pooja Chitkara, Xuan Wang, Alkesh Patel, Kushal Tayal, Roger Zheng, Peter Grasch, Jason D. Williams, Lin Li
Named Entity Recognition (NER) and Entity Linking (EL) play an essential role in voice assistant interaction, but are challenging due to the special difficulties associated with spoken user queries.
In addition to offline dialog processing, we analyze our model's performance in a more realistic online setting, i.e., one in which the topic is identified in real time as the dialog progresses.