1 code implementation • 7 Oct 2024 • Zengwei Yao, Wei Kang, Xiaoyu Yang, Fangjun Kuang, Liyong Guo, Han Zhu, Zengrui Jin, Zhaoqing Li, Long Lin, Daniel Povey
Connectionist Temporal Classification (CTC) is a widely used method for automatic speech recognition (ASR), renowned for its simplicity and computational efficiency.
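The CTC objective sums over all frame-level alignments that collapse to the target sequence. A minimal sketch of the collapsing rule itself (merge repeats, then drop blanks) — token ids and the blank id `0` here are illustrative, not from the paper:

```python
# Hypothetical sketch of the CTC collapsing rule: merge repeated labels,
# then remove blanks. Blank id 0 is an assumption for illustration.
def ctc_collapse(frame_labels, blank=0):
    """Map a per-frame alignment to its CTC output sequence."""
    out = []
    prev = None
    for t in frame_labels:
        if t != prev and t != blank:
            out.append(t)  # new non-blank symbol
        prev = t
    return out

# Alignment [1, 1, blank, 1, 2, blank] collapses to [1, 1, 2]:
# the blank between the two 1s keeps them distinct.
print(ctc_collapse([1, 1, 0, 1, 2, 0]))  # [1, 1, 2]
```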
Ranked #1 on Speech Recognition on GigaSpeech DEV
Automatic Speech Recognition (ASR) +3
no code implementations • 1 Sep 2024 • Zengrui Jin, Yifan Yang, Mohan Shi, Wei Kang, Xiaoyu Yang, Zengwei Yao, Fangjun Kuang, Liyong Guo, Lingwei Meng, Long Lin, Yong Xu, Shi-Xiong Zhang, Daniel Povey
This paper presents a large-scale far-field overlapping speech dataset, crafted to advance research in speech separation, recognition, and speaker diarization.
1 code implementation • 17 Oct 2023 • Zengwei Yao, Liyong Guo, Xiaoyu Yang, Wei Kang, Fangjun Kuang, Yifan Yang, Zengrui Jin, Long Lin, Daniel Povey
The Conformer has become the most popular encoder model for automatic speech recognition (ASR).
Ranked #3 on Speech Recognition on WenetSpeech
Automatic Speech Recognition (ASR) +1
2 code implementations • 15 Sep 2023 • Wei Kang, Xiaoyu Yang, Zengwei Yao, Fangjun Kuang, Yifan Yang, Liyong Guo, Long Lin, Daniel Povey
In this paper, we introduce Libriheavy, a large-scale ASR corpus consisting of 50,000 hours of read English speech derived from LibriVox.
2 code implementations • 14 Sep 2023 • Xiaoyu Yang, Wei Kang, Zengwei Yao, Yifan Yang, Liyong Guo, Fangjun Kuang, Long Lin, Daniel Povey
An additional style prompt can be given to the text encoder and guide the ASR system to output different styles of transcriptions.
1 code implementation • 19 May 2023 • Yifan Yang, Xiaoyu Yang, Liyong Guo, Zengwei Yao, Wei Kang, Fangjun Kuang, Long Lin, Xie Chen, Daniel Povey
Neural Transducer and connectionist temporal classification (CTC) are popular end-to-end automatic speech recognition systems.
1 code implementation • 19 May 2023 • Zengwei Yao, Wei Kang, Fangjun Kuang, Liyong Guo, Xiaoyu Yang, Yifan Yang, Long Lin, Daniel Povey
Our work is open-sourced and publicly available at https://github.com/k2-fsa/k2.
no code implementations • 20 Mar 2023 • Fan Cui, Liyong Guo, Quandong Wang, Peng Gao, Yujun Wang
To address those challenges, we explore representation learning for KWS via self-supervised contrastive learning and self-training with a pretrained model.
no code implementations • 20 Mar 2023 • Fan Cui, Liyong Guo, Lang He, Jiyao Liu, Ercheng Pei, Yujun Wang, Dongmei Jiang
Electroencephalography (EEG) plays a vital role in detecting how the brain responds to different stimuli.
1 code implementation • 31 Oct 2022 • Liyong Guo, Xiaoyu Yang, Quandong Wang, Yuxiang Kong, Zengwei Yao, Fan Cui, Fangjun Kuang, Wei Kang, Long Lin, Mingshuang Luo, Piotr Zelasko, Daniel Povey
Although on-the-fly teacher label generation tackles this issue, the training speed is significantly slower as the teacher model has to be evaluated every batch.
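The alternative mentioned above — computing teacher labels once ahead of time rather than re-running the teacher every batch — can be sketched as a simple precomputation pass. `teacher` and the dataset layout here are hypothetical stand-ins, not the paper's actual pipeline:

```python
# Minimal sketch, assuming teacher labels can be cached per utterance.
# `teacher` is a hypothetical stand-in for the teacher model's label
# extraction; real systems would store the cache on disk.
def precompute_labels(dataset, teacher):
    """One pass over the data; training then reuses the cached labels
    instead of evaluating the teacher model every batch."""
    return {utt_id: teacher(feats) for utt_id, feats in dataset.items()}

toy_teacher = lambda feats: [f * 2 for f in feats]  # toy "teacher model"
cache = precompute_labels({"utt1": [1, 2], "utt2": [3]}, toy_teacher)
```

This trades disk space for training speed, which is the trade-off the abstract alludes to.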
Automatic Speech Recognition (ASR) +3
1 code implementation • 31 Oct 2022 • Wei Kang, Liyong Guo, Fangjun Kuang, Long Lin, Mingshuang Luo, Zengwei Yao, Xiaoyu Yang, Piotr Żelasko, Daniel Povey
In this work, we introduce a constrained version of transducer loss to learn strictly monotonic alignments between the sequences; we also improve the standard greedy search and beam search algorithms by limiting the number of symbols that can be emitted per time step in transducer decoding, making it more efficient to decode in parallel with batches.
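The decoding change described above — capping the number of symbols emitted per time step — can be sketched as a greedy search loop. The `joiner` callable below is a hypothetical stand-in for the transducer's joint network, not the paper's implementation:

```python
# Hedged sketch of transducer greedy search with a per-frame symbol cap.
# `joiner(frame, hyp)` stands in for the joint network: it returns a
# label id, or `blank` to advance to the next frame.
def greedy_search(frames, joiner, blank=0, max_sym_per_frame=1):
    hyp = []
    for frame in frames:
        emitted = 0
        while emitted < max_sym_per_frame:  # the emission limit
            label = joiner(frame, hyp)
            if label == blank:
                break  # blank: move on to the next frame
            hyp.append(label)
            emitted += 1
    return hyp

# Toy joiner: emits the frame's value once, then blank.
def toy_joiner(frame, hyp):
    return frame if (not hyp or hyp[-1] != frame) else 0

print(greedy_search([3, 3, 5], toy_joiner))  # [3, 5]
```

Bounding emissions per frame is what makes batched decoding efficient: every hypothesis in a batch advances through frames in lockstep.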
1 code implementation • 31 Oct 2022 • Wei Kang, Zengwei Yao, Fangjun Kuang, Liyong Guo, Xiaoyu Yang, Long Lin, Piotr Żelasko, Daniel Povey
In streaming automatic speech recognition (ASR), it is desirable to reduce latency as much as possible while having minimum impact on recognition accuracy.
Automatic Speech Recognition (ASR) +1
no code implementations • 23 Jun 2022 • Fangjun Kuang, Liyong Guo, Wei Kang, Long Lin, Mingshuang Luo, Zengwei Yao, Daniel Povey
The RNN-Transducer (RNN-T) framework for speech recognition has been growing in popularity, particularly for deployed real-time ASR systems, because it combines high accuracy with naturally streaming recognition.
5 code implementations • 10 Dec 2020 • BinBin Zhang, Di Wu, Zhuoyuan Yao, Xiong Wang, Fan Yu, Chao Yang, Liyong Guo, Yaguang Hu, Lei Xie, Xin Lei
In this paper, we present a novel two-pass approach to unify streaming and non-streaming end-to-end (E2E) speech recognition in a single model.
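One common way to unify streaming and non-streaming attention in a single model is a chunk-based attention mask, where each frame attends up to the end of its chunk and full-sequence attention is the special case of one big chunk. This sketch is illustrative of that general idea, not a reproduction of the paper's architecture:

```python
# Illustrative chunk-based attention mask: frame i may attend to frame j
# iff mask[i][j] is True. chunk_size == num_frames recovers full
# (non-streaming) attention; small chunks give low-latency streaming.
def chunk_mask(num_frames, chunk_size):
    mask = []
    for i in range(num_frames):
        # Frames are visible up to the end of frame i's chunk.
        limit = ((i // chunk_size) + 1) * chunk_size
        mask.append([j < limit for j in range(num_frames)])
    return mask

m = chunk_mask(4, 2)
# Frame 0 sees frames 0-1 only; frame 2 sees frames 0-3.
```

Training with randomly sampled chunk sizes is one way a single set of weights can serve both streaming and non-streaming inference.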
Ranked #9 on Speech Recognition on AISHELL-1