no code implementations • 26 Oct 2021 • Tong Shen, Jiawei Zuo, Fan Shi, Jin Zhang, Liqin Jiang, Meng Chen, Zhengchen Zhang, Wei Zhang, Xiaodong He, Tao Mei
We demonstrate ViDA-MAN, a digital-human agent for multi-modal interaction, which offers real-time audio-visual responses to instant speech inquiries.
no code implementations • 8 Oct 2021 • Li Fu, Xiaoxiao Li, Runyu Wang, Lu Fan, Zhengchen Zhang, Meng Chen, Youzheng Wu, Xiaodong He
End-to-end Automatic Speech Recognition (ASR) models are usually trained to optimize the loss of the whole token sequence, while neglecting explicit phonemic-granularity supervision.
Automatic Speech Recognition (ASR)
no code implementations • 6 Nov 2020 • Guanghui Xu, Wei Song, Zhengchen Zhang, Chao Zhang, Xiaodong He, BoWen Zhou
Although prosody is related to linguistic information up to the level of discourse structure, most text-to-speech (TTS) systems only take into account the information within each sentence, which makes it challenging to convert a paragraph of text into natural and expressive speech.
no code implementations • 11 May 2020 • Li Fu, Xiaoxiao Li, Libo Zi, Zhengchen Zhang, Youzheng Wu, Xiaodong He, BoWen Zhou
In this paper, we propose an incremental learning method for end-to-end Automatic Speech Recognition (ASR) which enables an ASR system to perform well on new tasks while maintaining its performance on the tasks it originally learned.
Automatic Speech Recognition (ASR)
no code implementations • 15 Dec 2016 • Fugen Zhou, Fuxiang Wu, Zhengchen Zhang, Minghui Dong
This paper presents a novel reranking model, future reward reranking, which re-scores the actions of a transition-based parser using a global scorer.