no code implementations • 14 Sep 2023 • Lei Zhang, Zhengkun Tian, Xiang Chen, Jiaming Sun, Hongyu Xiang, Ke Ding, Guanglu Wan
To address this issue, we draw inspiration from the multifaceted capabilities of LLMs and Whisper, and focus on integrating multiple speech-recognition-related text processing tasks into the ASR model.
no code implementations • 7 Nov 2022 • Zhengkun Tian, Hongyu Xiang, Min Li, Feifei Lin, Ke Ding, Guanglu Wan
To reduce the peak latency, we propose a simple and novel method named peak-first regularization, which utilizes a frame-wise knowledge distillation function to force the probability distribution of the CTC model to shift left along the time axis instead of directly modifying the calculation process of CTC loss and gradients.
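As a rough illustration of the frame-wise distillation idea described above, the sketch below computes a penalty that is small when each frame's output distribution already resembles the distribution of the following frame. The snippet does not specify the distillation function, so two choices here are assumptions: the teacher for frame t is taken to be the distribution at frame t+1, and the distance is KL divergence. The function names are hypothetical.

```python
import math


def softmax(logits):
    """Convert one frame's logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(teacher, student, eps=1e-12):
    """KL(teacher || student) for two discrete distributions."""
    return sum(
        p * (math.log(p + eps) - math.log(q + eps))
        for p, q in zip(teacher, student)
    )


def peak_first_penalty(frame_logits):
    """Average KL between each frame (student) and the next frame (teacher).

    Assumption: the teacher for frame t is frame t+1. Minimizing this
    term pulls each frame toward what the model emits one step later,
    which nudges emission peaks earlier in time.
    """
    probs = [softmax(frame) for frame in frame_logits]
    if len(probs) < 2:
        return 0.0
    total = sum(
        kl_divergence(probs[t + 1], probs[t])
        for t in range(len(probs) - 1)
    )
    return total / (len(probs) - 1)


if __name__ == "__main__":
    # Toy example: 4 frames, 3 output symbols (index 0 plays the blank).
    logits = [[5.0, 0.0, 0.0], [5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [5.0, 0.0, 0.0]]
    print(peak_first_penalty(logits))
```

In an actual training setup, this term would presumably be added to the ordinary CTC loss with a weighting factor, and the teacher distribution would be treated as a constant so that gradients flow only through the student frame; both details are assumptions rather than something stated in the abstract.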
1 code implementation • 31 Mar 2022 • Keyu An, Huahuan Zheng, Zhijian Ou, Hongyu Xiang, Ke Ding, Guanglu Wan
The simulation module is jointly trained with the ASR model using a self-supervised loss; the ASR model is optimized with the usual ASR loss, e.g., CTC-CRF as used in our experiments.
1 code implementation • 27 May 2020 • Keyu An, Hongyu Xiang, Zhijian Ou
In this paper, we present a new open source toolkit for speech recognition, named CAT (CTC-CRF based ASR Toolkit).
Ranked #1 on Speech Recognition on Hub5'00 FISHER-SWBD
2 code implementations • 20 Nov 2019 • Keyu An, Hongyu Xiang, Zhijian Ou
In this paper, we present a new open source toolkit for automatic speech recognition (ASR), named CAT (CRF-based ASR Toolkit).
1 code implementation • 16 Apr 2019 • Hongyu Xiang, Zhijian Ou
CTC-CRF is conceptually simple: it essentially implements a CRF layer, using the special state topology, on top of features generated by the bottom neural network.
Ranked #2 on Speech Recognition on WSJ eval93