Scene text detection has been significantly advanced over recent years, especially after the emergence of deep neural network.
Recently, recurrent neural network transducer (RNN-T) gains increasing popularity due to its natural streaming capability as well as superior performance.
Ranked #9 on Speech Recognition on AISHELL-1
The simulation module is jointly trained with the ASR model using a self-supervised loss; the ASR model is optimized with the usual ASR loss, e. g., CTC-CRF as used in our experiments.
In this paper, we systematically compare the performance of three schemes to exploit external single-channel data for multi-channel end-to-end ASR, namely back-end pre-training, data scheduling, and data simulation, under different settings such as the sizes of the single-channel data and the choices of the front-end.
Based on the DR method, we propose a low-order density ratio method (LODR) by replacing the estimation with a low-order weak language model.
The use of phonological features (PFs) potentially allows language-specific phones to remain linked in training, which is highly desirable for information sharing for multilingual and crosslingual speech recognition methods for low-resourced languages.
Recently, the end-to-end training approach for neural beamformer-supported multi-channel ASR has shown its effectiveness in multi-channel speech recognition.
Time Delay Neural Networks (TDNNs) are widely used in both DNN-HMM based hybrid speech recognition systems and recent end-to-end systems.
Automatic speech recognition (ASR) has been significantly advanced with the use of deep learning and big data.
Sound Audio and Speech Processing
Using ST gradients to support sub-graph sampling is a core element to achieve efficient NAS beyond DARTS and SNAS.
Ranked #1 on Speech Recognition on WSJ dev93
In this paper, we present a new open source toolkit for speech recognition, named CAT (CTC-CRF based ASR Toolkit).
Ranked #1 on Speech Recognition on Hub5'00 FISHER-SWBD
In this paper, we present a new open source toolkit for automatic speech recognition (ASR), named CAT (CRF-based ASR Toolkit).