no code implementations • ECCV 2020 • Shanyu Xiao, Liangrui Peng, Ruijie Yan, Keyu An, Gang Yao, Jaesik Min
Scene text detection has advanced significantly in recent years, especially since the emergence of deep neural networks.
no code implementations • 26 Sep 2023 • Keyu An, Shiliang Zhang
Recently, self-attention-based transformers and conformers have been introduced as alternatives to RNNs for ASR acoustic modeling.
1 code implementation • 19 May 2023 • Keyu An, Xian Shi, Shiliang Zhang
Recently, the recurrent neural network transducer (RNN-T) has gained increasing popularity due to its natural streaming capability and superior performance.
Ranked #9 on Speech Recognition on AISHELL-1
Automatic Speech Recognition (ASR)
1 code implementation • 31 Mar 2022 • Keyu An, Huahuan Zheng, Zhijian Ou, Hongyu Xiang, Ke Ding, Guanglu Wan
The simulation module is jointly trained with the ASR model using a self-supervised loss; the ASR model is optimized with the usual ASR loss, e.g., CTC-CRF as used in our experiments.
no code implementations • 31 Mar 2022 • Keyu An, Ji Xiao, Zhijian Ou
In this paper, we systematically compare the performance of three schemes to exploit external single-channel data for multi-channel end-to-end ASR, namely back-end pre-training, data scheduling, and data simulation, under different settings such as the sizes of the single-channel data and the choices of the front-end.
no code implementations • 31 Mar 2022 • Huahuan Zheng, Keyu An, Zhijian Ou, Chen Huang, Ke Ding, Guanglu Wan
Based on the density ratio (DR) method, we propose a low-order density ratio method (LODR) by replacing the estimation with a low-order weak language model.
1 code implementation • 11 Jul 2021 • Chengrui Zhu, Keyu An, Huahuan Zheng, Zhijian Ou
The use of phonological features (PFs) potentially allows language-specific phones to remain linked in training, which is highly desirable for information sharing in multilingual and crosslingual speech recognition methods for low-resource languages.
no code implementations • 6 Jul 2021 • Keyu An, Zhijian Ou
Recently, the end-to-end training approach for neural beamformer-supported multi-channel ASR has shown its effectiveness in multi-channel speech recognition.
no code implementations • 30 Apr 2021 • Keyu An, Yi Zhang, Zhijian Ou
Time Delay Neural Networks (TDNNs) are widely used in both DNN-HMM based hybrid speech recognition systems and recent end-to-end systems.
no code implementations • 13 Nov 2020 • Fan Yu, Zhuoyuan Yao, Xiong Wang, Keyu An, Lei Xie, Zhijian Ou, Bo Liu, Xiulin Li, Guanqiong Miao
Automatic speech recognition (ASR) has been significantly advanced with the use of deep learning and big data.
Sound • Audio and Speech Processing
1 code implementation • 11 Nov 2020 • Huahuan Zheng, Keyu An, Zhijian Ou
Using straight-through (ST) gradients to support sub-graph sampling is a core element in achieving efficient NAS beyond DARTS and SNAS.
Ranked #1 on Speech Recognition on WSJ dev93
Automatic Speech Recognition (ASR)
1 code implementation • 27 May 2020 • Keyu An, Hongyu Xiang, Zhijian Ou
In this paper, we present a new open source toolkit for speech recognition, named CAT (CTC-CRF based ASR Toolkit).
Ranked #1 on Speech Recognition on Hub5'00 FISHER-SWBD
2 code implementations • 20 Nov 2019 • Keyu An, Hongyu Xiang, Zhijian Ou
In this paper, we present a new open source toolkit for automatic speech recognition (ASR), named CAT (CRF-based ASR Toolkit).
Automatic Speech Recognition (ASR)