no code implementations • 14 Sep 2024 • Rao Ma, Mengjie Qian, Mark Gales, Kate Knill
Finally, most EC models are trained for a specific ASR system, requiring retraining whenever the underlying ASR system is changed.
no code implementations • 9 Jul 2024 • Mengjie Qian, Siyuan Tang, Rao Ma, Kate M. Knill, Mark J. F. Gales
If only adaptation parameters are used, the language capabilities are maintained but at the cost of performance in the new language.
no code implementations • 1 Jul 2024 • Rao Ma, Mengjie Qian, Yassir Fathullah, Siyuan Tang, Mark Gales, Kate Knill
By fine-tuning the Whisper decoder with only English-to-Chinese speech translation data, improved performance for translation to Chinese can be obtained for multiple languages, in addition to English.
1 code implementation • 9 May 2024 • Vyas Raina, Rao Ma, Charles McGhee, Kate Knill, Mark Gales
Our experiments demonstrate that the same, universal 0.64-second adversarial audio segment can successfully mute a target Whisper ASR model for over 97% of speech samples.
1 code implementation • 15 Nov 2023 • Rao Ma, Adian Liusie, Mark J. F. Gales, Kate M. Knill
Text and vision foundation models can perform many tasks in a zero-shot setting, a desirable property that enables these systems to be applied in general and low-resource settings.
no code implementations • 9 Nov 2023 • Stefano Bannò, Rao Ma, Mengjie Qian, Kate M. Knill, Mark J. F. Gales
This foundation model can be used to replace the whole framework or part of it, e.g., ASR and disfluency removal.
no code implementations • 14 Sep 2023 • Mengjie Qian, Rao Ma, Adian Liusie, Erfan Loweimi, Kate M. Knill, Mark J. F. Gales
To gain a deeper understanding and further insights into the performance differences and limitations of these text sources, we employ a fact-checking approach to analyse the information consistency among them.
no code implementations • 13 Jul 2023 • Rao Ma, Mengjie Qian, Mark J. F. Gales, Kate M. Knill
Additionally, these models have a tendency to skip disfluencies and hesitations in the output.
no code implementations • 9 Jul 2023 • Rao Ma, Mengjie Qian, Potsawee Manakul, Mark Gales, Kate Knill
In this paper we investigate using ChatGPT, a generative LLM, for ASR error correction.
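The generative error-correction setup described in this entry can be sketched as prompt construction over an ASR N-best list. The prompt wording and the example hypotheses below are illustrative assumptions, not the paper's actual prompt or data:

```python
def build_correction_prompt(nbest):
    """Build a zero-shot ASR error-correction prompt from N-best
    hypotheses (hypothetical prompt format, for illustration only)."""
    numbered = "\n".join(f"{i + 1}. {hyp}" for i, hyp in enumerate(nbest))
    return (
        "The following are N-best hypotheses from a speech recogniser.\n"
        + numbered
        + "\nOutput the single most likely correct transcription."
    )

# Example N-best list with an acoustically confusable error.
prompt = build_correction_prompt(
    ["i scream of strawberries", "ice cream of strawberries"]
)
print(prompt)
```

The LLM is then asked to pick or rewrite the best hypothesis; the key idea is that multiple ASR hypotheses give the model evidence about where the recogniser is uncertain.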
no code implementations • 1 Jun 2023 • Rao Ma, Mengjie Qian, Mark J. F. Gales, Kate M. Knill
As speech recognition model sizes and training data requirements grow, it is increasingly common for systems to only be available via APIs from online service providers rather than having direct access to models themselves.
no code implementations • 1 Mar 2023 • Rao Ma, Mark J. F. Gales, Kate M. Knill, Mengjie Qian
Error correction models form an important part of Automatic Speech Recognition (ASR) post-processing to improve the readability and quality of transcriptions.
no code implementations • 2 Nov 2022 • Rao Ma, Xiaobo Wu, Jin Qiu, Yanan Qin, HaiHua Xu, Peihao Wu, Zejun Ma
The proposed method achieves significantly better performance on the target test sets with minimal degradation on the general test set, compared with both shallow and ILME-based LM fusion methods.
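The shallow and internal-language-model-estimation (ILME) fusion baselines compared in this entry can be sketched at a single decoding step. Shallow fusion adds a weighted external-LM score to the ASR score; ILME-based fusion additionally subtracts an estimate of the ASR model's internal LM so the external LM replaces rather than compounds it. The probability values and fusion weights below are illustrative assumptions, not the paper's models:

```python
import math

def fuse(asr_p, ext_p, ilm_p, lam=0.3, mu=0.3, use_ilme=False):
    """Per-token fused log-scores for one decoding step.
    asr_p: ASR posterior P(y | x, history)
    ext_p: external LM P(y | history)
    ilm_p: estimated internal LM P(y | history)"""
    scores = []
    for pa, pe, pi in zip(asr_p, ext_p, ilm_p):
        s = math.log(pa) + lam * math.log(pe)   # shallow fusion term
        if use_ilme:
            s -= mu * math.log(pi)              # ILME: subtract internal LM
        scores.append(s)
    return scores

# Illustrative dummy distributions over a 3-token vocabulary.
asr_p = [0.6, 0.3, 0.1]
ext_p = [0.2, 0.5, 0.3]
ilm_p = [0.5, 0.3, 0.2]

shallow = fuse(asr_p, ext_p, ilm_p)
ilme = fuse(asr_p, ext_p, ilm_p, use_ilme=True)
best = max(range(len(ilme)), key=lambda i: ilme[i])
```

In a real decoder these scores would drive beam search; the fusion weights `lam` and `mu` are tuned on held-out data.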
no code implementations • 26 Jan 2022 • Yufei Liu, Rao Ma, HaiHua Xu, Yi He, Zejun Ma, Weibin Zhang
In this paper we propose two novel approaches to estimate the ILM based on the Listen-Attend-Spell (LAS) framework.
no code implementations • 14 Oct 2020 • Zihan Zhao, Yuncong Liu, Lu Chen, Qi Liu, Rao Ma, Kai Yu
Recently, pre-trained language models like BERT have shown promising performance on multiple natural language processing tasks.
1 code implementation • ACL 2020 • Ruisheng Cao, Su Zhu, Chenyu Yang, Chen Liu, Rao Ma, Yanbin Zhao, Lu Chen, Kai Yu
One daunting problem for semantic parsing is the scarcity of annotation.
no code implementations • 22 Mar 2020 • Su Zhu, Zijian Zhao, Rao Ma, Kai Yu
The proposed approaches are evaluated on three datasets.