Search Results for author: Xingyu Na

Found 7 papers, 1 paper with code

Contextualization of ASR with LLM using phonetic retrieval-based augmentation

no code implementations • 11 Sep 2024 • Zhihong Lei, Xingyu Na, MingBin Xu, Ernest Pusateri, Christophe Van Gysel, Yuanyuan Zhang, Shiyi Han, Zhen Huang

Large language models (LLMs) have shown superb capability of modeling multimodal signals including audio and text, allowing the model to generate spoken or textual response given a speech input.

Tasks: Retrieval, Speech Recognition (+1 more)

Focused Discriminative Training For Streaming CTC-Trained Automatic Speech Recognition Models

no code implementations • 23 Aug 2024 • Adnan Haider, Xingyu Na, Erik McDermott, Tim Ng, Zhen Huang, Xiaodan Zhuang

This paper introduces a novel training framework called Focused Discriminative Training (FDT) to further improve streaming word-piece end-to-end (E2E) automatic speech recognition (ASR) models trained using either CTC or an interpolation of CTC and attention-based encoder-decoder (AED) loss.

Tasks: Automatic Speech Recognition (ASR) (+4 more)

Enhancing CTC-based speech recognition with diverse modeling units

no code implementations • 5 Jun 2024 • Shiyi Han, Zhihong Lei, MingBin Xu, Xingyu Na, Zhen Huang

In recent years, the evolution of end-to-end (E2E) automatic speech recognition (ASR) models has been remarkable, largely due to advances in deep learning architectures like transformer.

Tasks: Automatic Speech Recognition (ASR) (+1 more)

A Treatise On FST Lattice Based MMI Training

no code implementations • 17 Oct 2022 • Adnan Haider, Tim Ng, Zhen Huang, Xingyu Na, Antti Veikko Rosti

Maximum mutual information (MMI) has become one of the two de facto methods for sequence-level training of speech recognition acoustic models.

Tasks: Speech Recognition

AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale

no code implementations • 31 Aug 2018 • Jiayu Du, Xingyu Na, Xuechen Liu, Hui Bu

For research community, we hope that AISHELL-2 corpus can be a solid resource for topics like transfer learning and robust ASR.

Tasks: Chinese Word Segmentation, Speech Recognition (+2 more)

Purely sequence-trained neural networks for ASR based on lattice-free MMI

no code implementations • INTERSPEECH 2016 • Daniel Povey, Vijayaditya Peddinti, Daniel Galvez, Pegah Ghahremani, Vimal Manohar, Xingyu Na, Yiming Wang, Sanjeev Khudanpur

Models trained with LFMMI provide a relative word error rate reduction of ∼11.5%, over those trained with cross-entropy objective function, and ∼8%, over those trained with cross-entropy and sMBR objective functions.

Tasks: Language Modelling, Speech Recognition
