no code implementations • 11 Sep 2024 • Zhihong Lei, Xingyu Na, MingBin Xu, Ernest Pusateri, Christophe Van Gysel, Yuanyuan Zhang, Shiyi Han, Zhen Huang
Large language models (LLMs) have shown a superb capability for modeling multimodal signals, including audio and text, allowing the model to generate a spoken or textual response given a speech input.
no code implementations • 23 Aug 2024 • Adnan Haider, Xingyu Na, Erik McDermott, Tim Ng, Zhen Huang, Xiaodan Zhuang
This paper introduces a novel training framework called Focused Discriminative Training (FDT) to further improve streaming word-piece end-to-end (E2E) automatic speech recognition (ASR) models trained using either CTC or an interpolation of CTC and attention-based encoder-decoder (AED) loss.
Automatic Speech Recognition (ASR)
no code implementations • 5 Jun 2024 • Shiyi Han, Zhihong Lei, MingBin Xu, Xingyu Na, Zhen Huang
In recent years, the evolution of end-to-end (E2E) automatic speech recognition (ASR) models has been remarkable, largely due to advances in deep learning architectures such as the transformer.
Automatic Speech Recognition (ASR)
no code implementations • 17 Oct 2022 • Adnan Haider, Tim Ng, Zhen Huang, Xingyu Na, Antti Veikko Rosti
Maximum mutual information (MMI) has become one of the two de facto methods for sequence-level training of speech recognition acoustic models.
no code implementations • 31 Aug 2018 • Jiayu Du, Xingyu Na, Xuechen Liu, Hui Bu
For the research community, we hope that the AISHELL-2 corpus can be a solid resource for topics like transfer learning and robust ASR.
2 code implementations • 16 Sep 2017 • Hui Bu, Jiayu Du, Xingyu Na, Bengu Wu, Hao Zheng
An open-source Mandarin speech corpus called AISHELL-1 is released.
no code implementations • INTERSPEECH 2016 • Daniel Povey, Vijayaditya Peddinti, Daniel Galvez, Pegah Ghahremani, Vimal Manohar, Xingyu Na, Yiming Wang, Sanjeev Khudanpur
Models trained with LFMMI provide a relative word error rate reduction of ∼11.5% over those trained with a cross-entropy objective function, and ∼8% over those trained with cross-entropy and sMBR objective functions.
Ranked #5 on Speech Recognition on WSJ eval92