1 code implementation • 29 Jun 2022 • Yongjun Jiang, Jian Yu, Wenwen Yang, Bihong Zhang, Yanfeng Wang
To the best of our knowledge, the proposed Nextformer model achieves SOTA results on AISHELL-1 (CER 4.06%) and WenetSpeech (CER 7.56%/11.29%).
no code implementations • 8 Apr 2021 • Zhichao Wang, Wenwen Yang, Pan Zhou, Wei Chen
Recently, attention-based encoder-decoder (AED) end-to-end (E2E) models have drawn more and more attention in the field of automatic speech recognition (ASR).
Automatic Speech Recognition (ASR)
no code implementations • 13 Nov 2018 • Pan Zhou, Wenwen Yang, Wei Chen, Yan-Feng Wang, Jia Jia
In this paper, we propose a novel multimodal attention-based method for audio-visual speech recognition that automatically learns the fused representation from both modalities based on their importance.
Audio-Visual Speech Recognition • Robust Speech Recognition