no code implementations • 23 Jan 2025 • Qijie Shao, Linhao Dong, Kun Wei, Sining Sun, Lei Xie
Data2vec is a self-supervised learning (SSL) approach that employs a teacher-student architecture for contextual representation learning via masked prediction, demonstrating remarkable performance in monolingual ASR.
no code implementations • 13 Sep 2024 • Minglun Han, Ye Bai, Chen Shen, Youjia Huang, Mingkun Huang, Zehua Lin, Linhao Dong, Lu Lu, Yuxuan Wang
NEST-RQ employs causal encoders with only left context and uses next token prediction (NTP) as the training task.
Automatic Speech Recognition (ASR)
no code implementations • 5 Jul 2024 • Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li, Xiaoyang Li, Zeyang Li, Zehua Lin, Rui Liu, Shouda Liu, Lu Lu, Yizhou Lu, Jingting Ma, Shengtao Ma, Yulin Pei, Chen Shen, Tian Tan, Xiaogang Tian, Ming Tu, Bo Wang, Hao Wang, Yuping Wang, Yuxuan Wang, Hanzhang Xia, Rui Xia, Shuangyi Xie, Hongmin Xu, Meng Yang, Bihong Zhang, Jun Zhang, Wanyi Zhang, Yang Zhang, Yawei Zhang, Yijie Zheng, Ming Zou
Modern automatic speech recognition (ASR) models are required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc.) given the specific contextual information in various application scenarios.
Ranked #2 on Speech Recognition on AISHELL-1 (using extra training data)
Automatic Speech Recognition (ASR)
no code implementations • 27 May 2023 • Linhao Dong, Zhecheng An, Peihao Wu, Jun Zhang, Lu Lu, Zejun Ma
We also observe that the cross-modal representation extracted by CIF-PT performs better on SLU tasks than other neural interfaces, including the dominant speech representation learned from self-supervised pre-training.
1 code implementation • 30 Jan 2022 • Minglun Han, Linhao Dong, Zhenlin Liang, Meng Cai, Shiyu Zhou, Zejun Ma, Bo Xu
Nowadays, most methods in end-to-end contextual speech recognition bias the recognition process towards contextual knowledge.
no code implementations • 17 Dec 2020 • Minglun Han, Linhao Dong, Shiyu Zhou, Bo Xu
End-to-end (E2E) models have achieved promising results on multiple speech recognition benchmarks, and shown the potential to become the mainstream.
no code implementations • 20 May 2020 • Linhao Dong, Cheng Yi, Jianzong Wang, Shiyu Zhou, Shuang Xu, Xueli Jia, Bo Xu
End-to-end models are gaining wider attention in the field of automatic speech recognition (ASR).
Automatic Speech Recognition (ASR)
2 code implementations • 27 May 2019 • Linhao Dong, Bo Xu
In this paper, we propose a novel soft and monotonic alignment mechanism used for sequence transduction.
no code implementations • 18 Feb 2019 • Linhao Dong, Feng Wang, Bo Xu
Experiments on two Mandarin ASR datasets show that replacing RNNs with self-attention networks yields an 8.4%-10.2% relative character error rate (CER) reduction.
Automatic Speech Recognition (ASR)
no code implementations • 17 Jun 2018 • Linhao Dong, Shiyu Zhou, Wei Chen, Bo Xu
End-to-end models have been showing superiority in Automatic Speech Recognition (ASR).
Automatic Speech Recognition (ASR)
no code implementations • 16 May 2018 • Shiyu Zhou, Linhao Dong, Shuang Xu, Bo Xu
Experiments on HKUST datasets demonstrate that lexicon-free modeling units can outperform lexicon-related modeling units in terms of character error rate (CER).
Automatic Speech Recognition (ASR)
1 code implementation • 28 Apr 2018 • Shiyu Zhou, Linhao Dong, Shuang Xu, Bo Xu
Furthermore, we compare a syllable-based model with a context-independent phoneme (CI-phoneme) based model, both using the Transformer, in Mandarin Chinese.
Automatic Speech Recognition (ASR)