no code implementations • 2 Apr 2025 • Zhiyuan Tang, Dong Wang, Zhikai Zhou, Yong liu, Shen Huang, Shidong Shang
Full-text error correction with Large Language Models (LLMs) for Automatic Speech Recognition (ASR) has gained increased attention due to its potential to correct errors across long contexts and address a broader spectrum of error types, including punctuation restoration and inverse text normalization.
Automatic Speech Recognition (ASR) +2
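As a rough illustration of the full-text correction setup described above, the sketch below chunks a long transcript and asks a generic LLM backend to fix recognition errors, restore punctuation, and apply inverse text normalization. The prompt wording, chunk size, and the `llm_generate` callable are assumptions for illustration, not the paper's actual pipeline.

```python
# Minimal sketch of full-text ASR error correction with an LLM.
# `llm_generate` stands in for any chat/completion backend; the prompt text and
# chunk size are illustrative assumptions, not the settings used in the paper.
from typing import Callable, List

PROMPT_TEMPLATE = (
    "You are an ASR post-editor. Correct recognition errors in the transcript "
    "below, restore punctuation, and apply inverse text normalization. "
    "Return only the corrected text.\n\nTranscript:\n{chunk}"
)

def chunk_text(text: str, max_chars: int = 2000) -> List[str]:
    """Naively split a long transcript into chunks that fit the LLM context."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def correct_full_text(transcript: str, llm_generate: Callable[[str], str]) -> str:
    """Correct each chunk independently and re-join the corrected pieces."""
    corrected = [llm_generate(PROMPT_TEMPLATE.format(chunk=c))
                 for c in chunk_text(transcript)]
    return " ".join(corrected)
```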
no code implementations • 12 Sep 2024 • Zhiyuan Tang, Dong Wang, Shen Huang, Shidong Shang
This paper investigates the effectiveness of LLMs for error correction in full-text generated by ASR systems from longer speech recordings, such as transcripts from podcasts, news broadcasts, and meetings.
Automatic Speech Recognition (ASR) +8
1 code implementation • 2 Jul 2024 • Zhiyuan Tang, Dong Wang, Shen Huang, Shidong Shang
Firstly, we construct a specialized benchmark dataset aimed at error correction for Chinese ASR with 724K hypothesis-transcription pairs, named the Chinese Hypotheses Paradise dataset (ChineseHP), which covers a wide range of scenarios and presents significant challenges.
Automatic Speech Recognition (ASR) +1
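Hypothesis-transcription benchmarks for Chinese ASR are typically scored with character error rate (CER); the plain-Python sketch below shows one way corrected hypotheses could be scored against references. The exact evaluation protocol of ChineseHP may differ.

```python
# Character error rate (CER) via Levenshtein distance -- a plain-Python sketch of
# how corrected hypotheses could be compared with reference transcriptions.
def cer(reference: str, hypothesis: str) -> float:
    """Edit distance between character sequences, normalized by reference length."""
    ref, hyp = list(reference), list(hypothesis)
    prev = list(range(len(hyp) + 1))          # distances for the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution
        prev = curr
    return prev[-1] / max(len(ref), 1)

print(cer("今天天气很好", "今天天汽很好"))  # 1 substitution over 6 chars ≈ 0.167
```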
no code implementations • 29 Sep 2021 • Canyu Le, Zhiyuan Tang, Ke Li, Jiandong Yang
On top of this dataset, we propose a two-stage framework to perform chapter localization and chapter title generation.
no code implementations • 26 Apr 2021 • Jianwei Sun, Zhiyuan Tang, Hengxin Yin, Wei Wang, Xi Zhao, Shuaijiang Zhao, Xiaoning Lei, Wei Zou, Xiangang Li
Augmentation-related issues, such as the comparison of different strategies and ratios for data combination, are also investigated.
Automatic Speech Recognition (ASR) +2
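The sketch below shows one simple way to combine original and augmented corpora at fixed ratios, in the spirit of the data-combination ratios mentioned above; the corpus names and ratios are made up for illustration and are not the configurations studied in the paper.

```python
# Illustrative ratio-based mixing of clean and augmented corpora.
import random

def mix_corpora(corpora: dict, ratios: dict, total: int, seed: int = 0) -> list:
    """Draw `total` utterance IDs, sampling each corpus in proportion to its ratio."""
    rng = random.Random(seed)
    norm = sum(ratios.values())
    mixed = []
    for name, utts in corpora.items():
        n = round(total * ratios[name] / norm)
        mixed.extend(rng.choices(utts, k=n))  # sample with replacement
    rng.shuffle(mixed)
    return mixed

corpora = {"clean": [f"clean_{i}" for i in range(1000)],
           "noise_aug": [f"noise_{i}" for i in range(1000)],
           "speed_aug": [f"speed_{i}" for i in range(1000)]}
training_list = mix_corpora(corpora, {"clean": 1.0, "noise_aug": 1.0, "speed_aug": 1.0}, total=3000)
```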
no code implementations • 4 Nov 2020 • Ying Shi, Haolin Chen, Zhiyuan Tang, Lantian Li, Dong Wang, Jiqing Han
Recently, speech enhancement (SE) based on deep speech prior has attracted much attention, such as the variational auto-encoder with non-negative matrix factorization (VAE-NMF) architecture.
no code implementations • 4 Jun 2020 • Zheng Li, Miao Zhao, Qingyang Hong, Lin Li, Zhiyuan Tang, Dong Wang, Li-Ming Song, Cheng Yang
Based on Kaldi and PyTorch, recipes for i-vector and x-vector systems are also provided as baselines for the three tasks.
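For reference, an x-vector baseline is essentially a TDNN over frame-level features followed by statistics pooling. The PyTorch sketch below is a minimal version with illustrative layer sizes, smaller than the standard Kaldi recipe.

```python
# Minimal x-vector-style TDNN: dilated 1-D convolutions over frames, mean+std
# statistics pooling, then a segment-level embedding and a training classifier.
import torch
import torch.nn as nn

class XVector(nn.Module):
    def __init__(self, feat_dim=40, emb_dim=512, num_spk=1000):
        super().__init__()
        self.frame_layers = nn.Sequential(
            nn.Conv1d(feat_dim, 512, kernel_size=5, dilation=1), nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=3, dilation=2), nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=3, dilation=3), nn.ReLU(),
            nn.Conv1d(512, 1500, kernel_size=1), nn.ReLU(),
        )
        self.segment = nn.Linear(2 * 1500, emb_dim)     # after mean+std pooling
        self.classifier = nn.Linear(emb_dim, num_spk)   # speaker classifier for training

    def forward(self, feats):                           # feats: (batch, frames, feat_dim)
        x = self.frame_layers(feats.transpose(1, 2))    # (batch, 1500, frames')
        stats = torch.cat([x.mean(dim=2), x.std(dim=2)], dim=1)
        emb = self.segment(stats)                       # the "x-vector" embedding
        return emb, self.classifier(emb)

emb, logits = XVector()(torch.randn(4, 200, 40))
```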
no code implementations • 16 Jul 2019 • Zhiyuan Tang, Dong Wang, Li-Ming Song
Participants can refer to these online-published recipes to conveniently deploy LID systems.
no code implementations • 8 Nov 2018 • Lantian Li, Zhiyuan Tang, Ying Shi, Dong Wang
This paper proposes a Gaussian-constrained training approach that (1) discards the parametric classifier, and (2) enforces the distribution of the derived speaker vectors to be Gaussian.
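One simplified reading of such a Gaussian constraint is a regularizer that pushes the batch of speaker vectors toward a standard normal distribution, as sketched below; the exact formulation in the paper may differ.

```python
# A simplified Gaussian constraint on speaker vectors: push the batch mean toward
# zero and the batch covariance toward the identity. Illustrative, not the exact
# regularizer proposed in the paper.
import torch

def gaussian_constraint_loss(vectors: torch.Tensor) -> torch.Tensor:
    """vectors: (batch, dim) speaker embeddings from the encoder."""
    mean = vectors.mean(dim=0)
    centered = vectors - mean
    cov = centered.t() @ centered / (vectors.shape[0] - 1)
    eye = torch.eye(vectors.shape[1], device=vectors.device)
    return mean.pow(2).sum() + (cov - eye).pow(2).sum()

# Typically added to the main training objective with a small weight, e.g.
# loss = speaker_loss + 0.01 * gaussian_constraint_loss(embeddings)
```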
no code implementations • 8 Nov 2018 • Lantian Li, Zhiyuan Tang, Ying Shi, Dong Wang
This score reflects the similarity of the two frames in phonetic content, and is used to weigh the contribution of this frame pair in the utterance-based scoring.
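A sketch of this idea: score every enrollment/test frame pair on speaker similarity, and weight each pair by how phonetically similar the two frames are. The softmax weighting below is an illustrative assumption, not necessarily the weighting used in the paper.

```python
# Phonetically weighted frame-pair scoring for speaker verification (sketch).
import torch
import torch.nn.functional as F

def weighted_utterance_score(spk_enroll, spk_test, phn_enroll, phn_test):
    """spk_*: (T, D) frame-level speaker features; phn_*: (T, P) phonetic features."""
    spk_sim = F.normalize(spk_enroll, dim=1) @ F.normalize(spk_test, dim=1).t()  # (T1, T2)
    phn_sim = F.normalize(phn_enroll, dim=1) @ F.normalize(phn_test, dim=1).t()  # (T1, T2)
    weights = torch.softmax(phn_sim.flatten(), dim=0).view_as(phn_sim)
    return (weights * spk_sim).sum()  # weighted average of frame-pair scores
```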
1 code implementation • 2 Jun 2018 • Zhiyuan Tang, Dong Wang, Qing Chen
The third oriental language recognition (OLR) challenge AP18-OLR is introduced in this paper, including the data profile, the tasks and the evaluation principles.
no code implementations • 27 Feb 2018 • Lantian Li, Dong Wang, Yixiang Chen, Ying Shi, Zhiyuan Tang, Thomas Fang Zheng
Various informative factors are mixed in speech signals, leading to great difficulty when decoding any of these factors.
no code implementations • 15 Nov 2017 • Miao Zhang, Xiaofei Kang, Yanqing Wang, Lantian Li, Zhiyuan Tang, Haisheng Dai, Dong Wang
Trivial events are ubiquitous in human-to-human conversations, e.g., coughs, laughs and sniffs.
no code implementations • 31 Oct 2017 • Lantian Li, Zhiyuan Tang, Dong Wang, Thomas Fang Zheng
Recent studies have shown that speaker patterns can be learned from very short speech segments (e.g., 0.3 seconds) by a carefully designed convolutional & time-delay deep neural network (CT-DNN) model.
1 code implementation • 28 Jun 2017 • Zhiyuan Tang, Dong Wang, Yixiang Chen, Qing Chen
We present the data profile and the evaluation plan of the second oriental language recognition (OLR) challenge AP17-OLR.
no code implementations • 22 Jun 2017 • Dong Wang, Lantian Li, Zhiyuan Tang, Thomas Fang Zheng
This principle has recently been applied to several prototype studies on speaker verification (SV), where the features and the classifier are learned jointly with an objective function that is consistent with the evaluation metric.
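One common instantiation of such an end-to-end objective is to score trial pairs directly and train with a binary same/different-speaker loss, as in the sketch below. This mirrors the general idea of matching the training objective to the verification task; it is not the exact objective of the cited work.

```python
# End-to-end verification objective (sketch): cosine score with a learned scale
# and bias, trained with binary cross-entropy on same/different-speaker trials.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PairScorer(nn.Module):
    def __init__(self):
        super().__init__()
        self.scale = nn.Parameter(torch.tensor(10.0))
        self.bias = nn.Parameter(torch.tensor(-5.0))

    def forward(self, emb_a, emb_b):                    # (batch, dim) embeddings
        cos = F.cosine_similarity(emb_a, emb_b, dim=1)
        return self.scale * cos + self.bias             # trial score (logit)

def verification_loss(scores, same_speaker):            # same_speaker: 0/1 labels
    return F.binary_cross_entropy_with_logits(scores, same_speaker.float())
```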
no code implementations • 5 Jun 2017 • Dong Wang, Lantian Li, Ying Shi, Yixiang Chen, Zhiyuan Tang
In this paper, we demonstrate that the speaker factor is also a short-time spectral pattern and can be largely identified with just a few frames using a simple deep neural network (DNN).
no code implementations • 10 May 2017 • Lantian Li, Yixiang Chen, Ying Shi, Zhiyuan Tang, Dong Wang
Recently deep neural networks (DNNs) have been used to learn speaker features.
no code implementations • 9 May 2017 • Zhiyuan Tang, Dong Wang, Yixiang Chen, Ying Shi, Lantian Li
Pure acoustic neural models, particularly the LSTM-RNN model, have shown great potential in language identification (LID).
no code implementations • 9 May 2017 • Zhiyuan Tang, Dong Wang, Yixiang Chen, Lantian Li, Andrew Abel
Deep neural models, particularly the LSTM-RNN model, have shown great potential for language identification (LID).
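A minimal LSTM-based LID model of the kind referred to in these two entries is sketched below: a frame-level LSTM over acoustic features, mean pooling over time, and a softmax over languages. Layer sizes are illustrative, not the configurations used in the papers.

```python
# Minimal LSTM-RNN language identification model (sketch).
import torch
import torch.nn as nn

class LSTMLID(nn.Module):
    def __init__(self, feat_dim=40, hidden=256, num_langs=10):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.output = nn.Linear(hidden, num_langs)

    def forward(self, feats):            # feats: (batch, frames, feat_dim)
        states, _ = self.lstm(feats)     # frame-level hidden states
        utt = states.mean(dim=1)         # average over time -> utterance vector
        return self.output(utt)          # language logits

logits = LSTMLID()(torch.randn(8, 300, 40))
```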
no code implementations • 28 Sep 2016 • Zhiyuan Tang, Ying Shi, Dong Wang, Yang Feng, Shiyue Zhang
Recurrent neural networks (RNNs) have shown clear superiority in sequence modeling, particularly the ones with gated units, such as long short-term memory (LSTM) and gated recurrent unit (GRU).
Automatic Speech Recognition (ASR) +1
no code implementations • 27 Sep 2016 • Lantian Li, Zhiyuan Tang, Dong Wang, Andrew Abel, Yang Feng, Shiyue Zhang
This paper presents a unified model that performs language and speaker recognition simultaneously.
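A shared-encoder, two-head layout is one plausible reading of such a unified model, sketched below; it is an illustration of the multi-task idea, not the exact architecture used in the paper.

```python
# Sketch of a unified model: one shared recurrent encoder feeding a language head
# and a speaker head.
import torch
import torch.nn as nn

class UnifiedLidSid(nn.Module):
    def __init__(self, feat_dim=40, hidden=256, num_langs=10, num_spk=1000):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.lang_head = nn.Linear(hidden, num_langs)
        self.spk_head = nn.Linear(hidden, num_spk)

    def forward(self, feats):                   # (batch, frames, feat_dim)
        states, _ = self.encoder(feats)
        pooled = states.mean(dim=1)             # shared utterance representation
        return self.lang_head(pooled), self.spk_head(pooled)

# Joint training would sum the two cross-entropy losses, e.g.
# loss = ce(lang_logits, lang_labels) + ce(spk_logits, spk_labels)
```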
no code implementations • 27 Sep 2016 • Zhiyuan Tang, Lantian Li, Dong Wang
Research on multilingual speech recognition remains attractive yet challenging.
no code implementations • 27 Sep 2016 • Dong Wang, Zhiyuan Tang, Difei Tang, Qing Chen
We present the OC16-CE80 Chinese-English mixlingual speech database which was released as a main resource for training, development and test for the Chinese-English mixlingual speech recognition (MixASR-CHEN) challenge on O-COCOSDA 2016.
no code implementations • 31 Mar 2016 • Zhiyuan Tang, Lantian Li, Dong Wang
Although highly correlated, speech and speaker recognition have been regarded as two independent tasks and studied by two communities.
no code implementations • 7 Jun 2015 • Zhiyuan Tang, Dong Wang, Yiqiao Pan, Zhiyong Zhang
Compared to conventional layer-wise methods, this new method is agnostic to the model structure, so it can be used to pre-train very complex models.
no code implementations • 18 May 2015 • Zhiyuan Tang, Dong Wang, Zhiyong Zhang
Recent research found that a well-trained model can be used as a teacher to train other child models, by using the predictions generated by the teacher model as supervision.
Automatic Speech Recognition (ASR) +2
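The teacher-child training described above follows the knowledge-distillation pattern; a standard distillation loss is sketched below, with the temperature and mixing weight as illustrative values (the exact formulation in the paper may differ).

```python
# Standard teacher-student (knowledge distillation) loss: the student matches the
# teacher's softened output distribution via KL divergence, mixed with the usual
# cross-entropy on hard labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```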