no code implementations • EMNLP 2020 • Xiuyi Chen, Fandong Meng, Peng Li, Feilong Chen, Shuang Xu, Bo Xu, Jie zhou
Here, we deal with these issues on two aspects: (1) We enhance the prior selection module with the necessary posterior information obtained from the specially designed Posterior Information Prediction Module (PIPM); (2) We propose a Knowledge Distillation Based Training Strategy (KDBTS) to train the decoder with the knowledge selected from the prior distribution, removing the exposure bias of knowledge selection.
no code implementations • 3 Nov 2023 • Tao Liu, Chenpeng Du, Shuai Fan, Feilong Chen, Kai Yu
Our rigorous experiments comprehensively highlight that our ground-breaking approach outpaces existing methods with considerable margins and delivers seamless, intelligible videos in person-generic and multilingual scenarios.
no code implementations • 31 May 2023 • Ziyi Ni, Minglun Han, Feilong Chen, Linghui Meng, Jing Shi, Pin Lv, Bo Xu
In this paper, we first propose ViLaS (Vision and Language into Automatic Speech Recognition), a novel multimodal ASR model based on the continuous integrate-and-fire (CIF) mechanism, which can integrate visual and textual context simultaneously or separately, to facilitate speech recognition.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
2 code implementations • 7 May 2023 • Feilong Chen, Minglun Han, Haozhi Zhao, Qingyang Zhang, Jing Shi, Shuang Xu, Bo Xu
(3) Integrating multiple modalities: all single-modal encoders are aligned with the LLM through X2L interfaces to integrate multimodal capabilities into the LLM.
2 code implementations • 30 Jan 2023 • Minglun Han, Feilong Chen, Jing Shi, Shuang Xu, Bo Xu
Large-scale pre-trained language models (PLMs) have shown great potential in natural language processing tasks.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +4
no code implementations • 2 Aug 2022 • Feilong Chen, Di wu, Jie Yang, Yi He
In many real applications such as intelligent healthcare platform, streaming feature always has some missing data, which raises a crucial challenge in conducting OSFS, i. e., how to establish the uncertain relationship between sparse streaming features and labels.
no code implementations • 24 May 2022 • Feilong Chen, Xiuyi Chen, Jiaxin Shi, Duzhen Zhang, Jianlong Chang, Qi Tian
It also achieves about +4. 9 AR on COCO and +3. 8 AR on Flickr30K than LightingDot and achieves comparable performance with the state-of-the-art (SOTA) fusion-based model METER.
no code implementations • 15 Apr 2022 • Feilong Chen, Xiuyi Chen, Shuang Xu, Bo Xu
Visual Dialog is a challenging vision-language task since the visual dialog agent needs to answer a series of questions after reasoning over both the image content and dialog history.
1 code implementation • 18 Feb 2022 • Feilong Chen, Duzhen Zhang, Minglun Han, Xiuyi Chen, Jing Shi, Shuang Xu, Bo Xu
Finally, we discuss the new frontiers in VLP.
1 code implementation • Findings (ACL) 2021 • Feilong Chen, Fandong Meng, Xiuyi Chen, Peng Li, Jie zhou
Visual dialogue is a challenging task since it needs to answer a series of coherent questions on the basis of understanding the visual environment.
no code implementations • Findings (ACL) 2021 • Feilong Chen, Xiuyi Chen, Fandong Meng, Peng Li, Jie zhou
Specifically, GoG consists of three sequential graphs: 1) H-Graph, which aims to capture coreference relations among dialog history; 2) History-aware Q-Graph, which aims to fully understand the question through capturing dependency relations between words based on coreference resolution on the dialog history; and 3) Question-aware I-Graph, which aims to capture the relations between objects in an image based on fully question representation.
no code implementations • Findings (EMNLP) 2021 • Feilong Chen, Xiuyi Chen, Can Xu, Daxin Jiang
Specifically, a posterior distribution over visual objects is inferred from both context (history and questions) and answers, and it ensures the appropriate grounding of visual objects during the training process.
1 code implementation • 18 Dec 2019 • Feilong Chen, Fandong Meng, Jiaming Xu, Peng Li, Bo Xu, Jie zhou
Visual Dialog is a vision-language task that requires an AI agent to engage in a conversation with humans grounded in an image.