Search Results for author: Feilong Chen

Found 14 papers, 6 papers with code

Bridging the Gap between Prior and Posterior Knowledge Selection for Knowledge-Grounded Dialogue Generation

no code implementations EMNLP 2020 Xiuyi Chen, Fandong Meng, Peng Li, Feilong Chen, Shuang Xu, Bo Xu, Jie Zhou

Here, we address these issues in two respects: (1) We enhance the prior selection module with the necessary posterior information obtained from the specially designed Posterior Information Prediction Module (PIPM); (2) We propose a Knowledge Distillation Based Training Strategy (KDBTS) to train the decoder with knowledge selected from the prior distribution, removing the exposure bias of knowledge selection.

Dialogue Generation · Knowledge Distillation
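
The KDBTS strategy described above lends itself to a brief illustration: the posterior knowledge-selection distribution (which sees the response) serves as a teacher for the prior distribution (which does not), while the decoder is trained on knowledge drawn from the prior. Below is a minimal PyTorch sketch under those assumptions; all names are illustrative, not the authors' code.

    import torch
    import torch.nn.functional as F

    def kd_selection_loss(prior_logits, posterior_logits):
        # Teacher: posterior selection distribution (conditioned on the response).
        # Student: prior selection distribution (all that exists at inference time).
        teacher = F.softmax(posterior_logits.detach(), dim=-1)
        student_log = F.log_softmax(prior_logits, dim=-1)
        # KL term pulls the prior toward the posterior teacher.
        return F.kl_div(student_log, teacher, reduction="batchmean")

    # Training the decoder on knowledge sampled from the PRIOR (not the posterior)
    # removes the train/inference mismatch, i.e., the exposure bias of selection.
    prior_logits = torch.randn(8, 20)      # 8 dialogues, 20 candidate knowledge sentences
    posterior_logits = torch.randn(8, 20)
    loss = kd_selection_loss(prior_logits, posterior_logits)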

DiffDub: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder

no code implementations 3 Nov 2023 Tao Liu, Chenpeng Du, Shuai Fan, Feilong Chen, Kai Yu

Rigorous experiments show that our approach outpaces existing methods by considerable margins and delivers seamless, intelligible videos in person-generic and multilingual scenarios.

Data Augmentation

VILAS: Exploring the Effects of Vision and Language Context in Automatic Speech Recognition

no code implementations 31 May 2023 Ziyi Ni, Minglun Han, Feilong Chen, Linghui Meng, Jing Shi, Pin Lv, Bo Xu

In this paper, we first propose ViLaS (Vision and Language into Automatic Speech Recognition), a novel multimodal ASR model based on the continuous integrate-and-fire (CIF) mechanism, which can integrate visual and textual context simultaneously or separately, to facilitate speech recognition.

Automatic Speech Recognition (ASR) +1
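
For context on the continuous integrate-and-fire (CIF) mechanism that ViLaS builds on: frame-level weights are accumulated until they reach a threshold of 1.0, at which point a token-level embedding is "fired". Below is a generic, textbook-style CIF loop with assumed shapes, not the authors' implementation.

    import torch

    def cif(encoder_states, alphas, threshold=1.0):
        # encoder_states: [T, D] frame-level features; alphas: [T] weights in (0, 1).
        fired = []
        acc_w = 0.0
        acc_h = torch.zeros(encoder_states.size(1))
        for h_t, a_t in zip(encoder_states, alphas):
            a = a_t.item()
            if acc_w + a < threshold:        # keep integrating this frame
                acc_w += a
                acc_h = acc_h + a * h_t
            else:                            # threshold crossed: fire a token
                used = threshold - acc_w     # portion of a_t that completes the token
                fired.append(acc_h + used * h_t)
                acc_w = a - used             # leftover weight starts the next token
                acc_h = acc_w * h_t
        return torch.stack(fired) if fired else torch.empty(0, encoder_states.size(1))

    tokens = cif(torch.randn(50, 256), torch.rand(50) * 0.5)  # -> roughly [12, 256]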

X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages

2 code implementations 7 May 2023 Feilong Chen, Minglun Han, Haozhi Zhao, Qingyang Zhang, Jing Shi, Shuang Xu, Bo Xu

(3) Integrating multiple modalities: all single-modal encoders are aligned with the LLM through X2L interfaces to integrate multimodal capabilities into the LLM.

Attribute · Instruction Following +4
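
A hedged sketch of the X2L idea: a small trainable interface maps each frozen single-modal encoder's output into the LLM's embedding space as a short sequence of pseudo-tokens, so the LLM consumes other modalities like a foreign language. The query-attention adapter below is an illustrative stand-in, not the paper's exact interface.

    import torch
    import torch.nn as nn

    class X2LInterface(nn.Module):
        # Illustrative adapter: cross-attends learned queries over encoder
        # features, then projects them into the LLM token-embedding space.
        def __init__(self, enc_dim, llm_dim, num_tokens=32):
            super().__init__()
            self.queries = nn.Parameter(torch.randn(num_tokens, enc_dim))
            self.attn = nn.MultiheadAttention(enc_dim, num_heads=8, batch_first=True)
            self.proj = nn.Linear(enc_dim, llm_dim)

        def forward(self, feats):                      # feats: [B, T, enc_dim]
            q = self.queries.unsqueeze(0).expand(feats.size(0), -1, -1)
            pooled, _ = self.attn(q, feats, feats)     # [B, num_tokens, enc_dim]
            return self.proj(pooled)                   # [B, num_tokens, llm_dim]

    # Pseudo-tokens from image/speech/video encoders would be concatenated
    # with text embeddings and fed to the (frozen) LLM.
    iface = X2LInterface(enc_dim=1024, llm_dim=4096)
    pseudo_tokens = iface(torch.randn(2, 197, 1024))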

An Online Sparse Streaming Feature Selection Algorithm

no code implementations 2 Aug 2022 Feilong Chen, Di Wu, Jie Yang, Yi He

In many real applications such as intelligent healthcare platforms, streaming features often contain missing data, which raises a crucial challenge in conducting OSFS, i.e., how to establish the uncertain relationship between sparse streaming features and labels.

Feature Selection

HiVLP: Hierarchical Vision-Language Pre-Training for Fast Image-Text Retrieval

no code implementations 24 May 2022 Feilong Chen, Xiuyi Chen, Jiaxin Shi, Duzhen Zhang, Jianlong Chang, Qi Tian

It also achieves about +4.9 AR on COCO and +3.8 AR on Flickr30K over LightningDot, and achieves performance comparable to the state-of-the-art (SOTA) fusion-based model METER.

Cross-Modal Retrieval · Retrieval +1

Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning

no code implementations 15 Apr 2022 Feilong Chen, Xiuyi Chen, Shuang Xu, Bo Xu

Visual Dialog is a challenging vision-language task since the visual dialog agent needs to answer a series of questions after reasoning over both the image content and dialog history.

Contrastive Learning · Question Answering +2

Multimodal Incremental Transformer with Visual Grounding for Visual Dialogue Generation

1 code implementation Findings (ACL) 2021 Feilong Chen, Fandong Meng, Xiuyi Chen, Peng Li, Jie Zhou

Visual dialogue is a challenging task since it needs to answer a series of coherent questions on the basis of understanding the visual environment.

Dialogue Generation · Visual Grounding

GoG: Relation-aware Graph-over-Graph Network for Visual Dialog

no code implementations Findings (ACL) 2021 Feilong Chen, Xiuyi Chen, Fandong Meng, Peng Li, Jie Zhou

Specifically, GoG consists of three sequential graphs: 1) H-Graph, which aims to capture coreference relations among dialog history; 2) History-aware Q-Graph, which aims to fully understand the question by capturing dependency relations between words based on coreference resolution over the dialog history; and 3) Question-aware I-Graph, which aims to capture the relations between objects in an image based on the full question representation.

Coreference Resolution · Implicit Relations +2
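
The sequential graph-over-graph flow can be illustrated generically: each graph runs one message-passing step, and its pooled output conditions the next graph's node features. The adjacency matrices and the additive conditioning below are placeholders; the paper's relation-aware operators are richer than this one-step sketch.

    import torch
    import torch.nn as nn

    class GraphStep(nn.Module):
        # One generic message-passing step: mean-aggregate neighbors, transform.
        def __init__(self, dim):
            super().__init__()
            self.lin = nn.Linear(2 * dim, dim)

        def forward(self, x, adj):               # x: [N, D], adj: [N, N]
            deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
            msg = (adj @ x) / deg                # neighbor average
            return torch.relu(self.lin(torch.cat([x, msg], dim=-1)))

    D = 128
    h_graph, q_graph, i_graph = GraphStep(D), GraphStep(D), GraphStep(D)
    rand_adj = lambda n: (torch.rand(n, n) > 0.5).float()    # placeholder edges

    hist = h_graph(torch.randn(6, D), rand_adj(6))                   # coreference links in history
    ques = q_graph(torch.randn(9, D) + hist.mean(0), rand_adj(9))    # history-aware question
    objs = i_graph(torch.randn(36, D) + ques.mean(0), rand_adj(36))  # question-aware image objects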

Learning to Ground Visual Objects for Visual Dialog

no code implementations Findings (EMNLP) 2021 Feilong Chen, Xiuyi Chen, Can Xu, Daxin Jiang

Specifically, a posterior distribution over visual objects is inferred from both context (history and questions) and answers, and it ensures the appropriate grounding of visual objects during the training process.

Visual Dialog
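
That prior/posterior grounding can be sketched as two attention distributions over detected objects: the prior attends with dialog context alone, the posterior additionally conditions on the answer, and a KL term pulls the prior toward the posterior during training. Shapes and names below are assumptions for illustration only.

    import torch
    import torch.nn.functional as F

    def grounding_distributions(obj_feats, ctx_vec, ans_vec):
        # obj_feats: [N, D] detected objects; ctx_vec / ans_vec: [D].
        prior_log = F.log_softmax(obj_feats @ ctx_vec, dim=-1)          # context only
        posterior = F.softmax(obj_feats @ (ctx_vec + ans_vec), dim=-1)  # context + answer
        # KL(posterior || prior): align the answer-free prior with the
        # answer-informed posterior available during training.
        kl = F.kl_div(prior_log, posterior, reduction="sum")
        return posterior, kl

    post, kl = grounding_distributions(torch.randn(36, 512),
                                       torch.randn(512), torch.randn(512))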

DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog

1 code implementation 18 Dec 2019 Feilong Chen, Fandong Meng, Jiaming Xu, Peng Li, Bo Xu, Jie Zhou

Visual Dialog is a vision-language task that requires an AI agent to engage in a conversation with humans grounded in an image.

Multimodal Reasoning · Visual Dialog
