2 code implementations • 17 Jun 2024 • Yifan Yang, Zheshu Song, Jianheng Zhuo, Mingyu Cui, Jinpeng Li, Bo Yang, Yexing Du, Ziyang Ma, Xunying Liu, Ziyuan Wang, Ke Li, Shuai Fan, Kai Yu, Wei-Qiang Zhang, Guoguo Chen, Xie Chen
Notably, ASR models trained on GigaSpeech 2 can reduce the word error rate for Thai, Indonesian, and Vietnamese on our challenging and realistic YouTube test set by 25% to 40% compared to the Whisper large-v3 model, with merely 10% of the model parameters.
1 code implementation • 13 Mar 2024 • Jiayu Du, Jinpeng Li, Guoguo Chen, Wei-Qiang Zhang
In this paper we introduce the SpeechColab Leaderboard, a general-purpose, open-source platform designed for ASR evaluation.
Automatic Speech Recognition (ASR)
no code implementations • 29 Jun 2022 • Jing Zhao, Haoyu Wang, Jinpeng Li, Shuzhou Chai, Guan-Bo Wang, Guoguo Chen, Wei-Qiang Zhang
For the Constrained training condition, we construct our basic ASR system based on the standard hybrid architecture.
Automatic Speech Recognition (ASR)
3 code implementations • 13 Jun 2021 • Guoguo Chen, Shuzhou Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Yujun Wang, Zhao You, Zhiyong Yan
This paper introduces GigaSpeech, an evolving, multi-domain English speech recognition corpus with 10,000 hours of high-quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training.
Ranked #1 on Speech Recognition on GigaSpeech
no code implementations • 30 Oct 2015 • Yu Zhang, Guoguo Chen, Dong Yu, Kaisheng Yao, Sanjeev Khudanpur, James Glass
In this paper, we extend the deep long short-term memory (DLSTM) recurrent neural networks by introducing gated direct connections between memory cells in adjacent layers.
2 code implementations • 28 Oct 2015 • David Snyder, Guoguo Chen, Daniel Povey
This report introduces a new corpus of music, speech, and noise.
Sound