1 code implementation • 27 Oct 2020 • Dongwei Jiang, Wubo Li, Miao Cao, Wei Zou, Xiangang Li
Self-supervised visual pretraining has shown significant progress recently.
no code implementations • 21 Oct 2020 • Wubo Li, Dongwei Jiang, Wei Zou, Xiangang Li
Audio Visual Scene-aware Dialog (AVSD) is a task to generate responses when discussing about a given video.
no code implementations • 19 Oct 2020 • Tingwei Guo, Cheng Wen, Dongwei Jiang, Ne Luo, Ruixiong Zhang, Shuaijiang Zhao, Wubo Li, Cheng Gong, Wei Zou, Kun Han, Xiangang Li
This paper introduces a new open-sourced Mandarin speech corpus, called DiDiSpeech.
Audio and Speech Processing
1 code implementation • 20 May 2020 • Dongwei Jiang, Wubo Li, Ruixiong Zhang, Miao Cao, Ne Luo, Yang Han, Wei Zou, Xiangang Li
In this paper, we conduct a further study on MPC and focus on three important aspects: the effect of pre-training data speaking style, its extension on streaming model, and how to better transfer learned knowledge from pre-training stage to downstream tasks.
no code implementations • 23 Oct 2019 • Wubo Li, Wei Zou, Xiangang Li
Multimodalities provide promising performance than unimodality in most tasks.
1 code implementation • 22 Oct 2019 • Dongwei Jiang, Xiaoning Lei, Wubo Li, Ne Luo, Yuxuan Hu, Wei Zou, Xiangang Li
Speech recognition technologies are gaining enormous popularity in various industrial applications.
1 code implementation • 26 Jun 2018 • Dayiheng Liu, Quan Guo, Wubo Li, Jiancheng Lv
Given a picture, the first line, the title and the other lines of the poem are successively generated in three stages.