Search Results for author: Shen Ge

Found 21 papers, 3 papers with code

Aligning Source Visual and Target Language Domains for Unpaired Video Captioning

no code implementations 22 Nov 2022 Fenglin Liu, Xian Wu, Chenyu You, Shen Ge, Yuexian Zou, Xu Sun

To this end, we introduce the unpaired video captioning task, which aims to train models without paired video-caption data in the target language.

Translation Video Captioning

Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

4 code implementations 21 Nov 2022 Peng Jin, Jinfa Huang, Fenglin Liu, Xian Wu, Shen Ge, Guoli Song, David A. Clifton, Jie Chen

Most video-and-language representation learning approaches employ contrastive learning, e.g., CLIP, to project the video and text features into a common latent space according to the semantic similarities of text-video pairs.
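The CLIP-style contrastive objective mentioned above can be sketched as a symmetric InfoNCE loss over a batch of paired embeddings. This is a minimal illustration of the general technique, not the paper's EM contrastive method; the function names and the temperature value are illustrative assumptions.

```python
import numpy as np

def l2_normalize(x):
    # Map each feature vector to the unit sphere so that dot products
    # become cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def clip_style_contrastive_loss(video_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of (video, text) pairs.

    Matched pairs (row i, column i of the similarity matrix) are pulled
    together; all other pairings in the batch serve as negatives.
    """
    v = l2_normalize(np.asarray(video_feats, dtype=float))
    t = l2_normalize(np.asarray(text_feats, dtype=float))
    logits = v @ t.T / temperature            # (B, B) similarity matrix
    labels = np.arange(logits.shape[0])       # positives sit on the diagonal

    def cross_entropy(lg):
        shifted = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the video-to-text and text-to-video directions, as in CLIP.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Perfectly aligned pairs (identical video and text embeddings) yield a loss near zero, while randomly paired embeddings give a loss near log of the batch size.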

Ranked #2 on Video Retrieval on LSMDC (text-to-video Mean Rank metric)

Contrastive Learning Representation Learning +5

DiMBERT: Learning Vision-Language Grounded Representations with Disentangled Multimodal-Attention

no code implementations 28 Oct 2022 Fenglin Liu, Xian Wu, Shen Ge, Xuancheng Ren, Wei Fan, Xu Sun, Yuexian Zou

To enhance the correlation between vision and language in the disentangled spaces, we introduce visual concepts to DiMBERT, which represent visual information in textual form.

Image Captioning Language Modelling +3

Generating Accurate and Faithful Discharge Instructions: Task, Dataset, and Model

2 code implementations 23 Oct 2022 Fenglin Liu, Bang Yang, Chenyu You, Xian Wu, Shen Ge, Zhangdaihong Liu, Xu Sun, Yang Yang, David A. Clifton

We build a benchmark clinical dataset and propose the Re3Writer, which imitates the working patterns of physicians to first retrieve related working experience from historical PIs written by physicians, and then reason over related medical knowledge.
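The retrieve step described above can be sketched as a nearest-neighbour lookup over historical patient instructions. This is a hypothetical bag-of-words illustration of the retrieval idea, not the authors' Re3Writer implementation; all names and the example records are invented for demonstration.

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two sparse bag-of-words vectors.
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_similar_instructions(query_record, historical, k=2):
    """Return the k historical patient instructions whose source records
    are most similar to the query record (bag-of-words cosine)."""
    q = Counter(query_record.lower().split())
    scored = [
        (cosine(q, Counter(record.lower().split())), instruction)
        for record, instruction in historical
    ]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [inst for _, inst in scored[:k]]
```

In practice such systems use learned dense embeddings rather than raw word counts, but the retrieve-then-generate structure is the same.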

Competence-based Multimodal Curriculum Learning for Medical Report Generation

no code implementations ACL 2021 Fenglin Liu, Shen Ge, Yuexian Zou, Xian Wu

The medical report generation task, which aims to produce long and coherent descriptions of medical images, has recently attracted growing research interest.

Image Captioning Medical Report Generation

Graph-in-Graph Network for Automatic Gene Ontology Description Generation

no code implementations 10 Jun 2022 Fenglin Liu, Bang Yang, Chenyu You, Xian Wu, Shen Ge, Adelaide Woicik, Sheng Wang

This task aims to automatically generate a sentence that describes the function of a GO term belonging to one of the three categories, i.e., molecular function, biological process, and cellular component.

Sentence

End-to-end Spoken Conversational Question Answering: Task, Dataset and Model

no code implementations Findings (NAACL) 2022 Chenyu You, Nuo Chen, Fenglin Liu, Shen Ge, Xian Wu, Yuexian Zou

To evaluate the capacity of SCQA systems in a dialogue-style interaction, we assemble a Spoken Conversational Question Answering (Spoken-CoQA) dataset with more than 40k question-answer pairs from 4k conversations.

4k Conversational Question Answering +2

Hazard Detection And Avoidance For The Nova-C Lander

no code implementations 1 Apr 2022 Joel Getchius, Devin Renshaw, Daniel Posada, Troy Henderson, Lillian Hong, Shen Ge, Giovanni Molina

In early 2022, Intuitive Machines' NOVA-C Lander will touch down on the lunar surface, becoming the first commercial endeavor to visit a celestial body.

AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation

no code implementations 18 Mar 2022 Di You, Fenglin Liu, Shen Ge, Xiaoxia Xie, Jing Zhang, Xian Wu

The acquired disease-grounded visual features can better represent the abnormal regions of the input image, which could alleviate the data bias problem; 2) the MGT module effectively uses the multi-grained features and the Transformer framework to generate the long medical report.

Descriptive Image Captioning +1

Knowledge Matters: Radiology Report Generation with General and Specific Knowledge

no code implementations 30 Dec 2021 Shuxin Yang, Xian Wu, Shen Ge, Shaohua Kevin Zhou, Li Xiao

In this paper, we propose a knowledge-enhanced radiology report generation approach that introduces two types of medical knowledge: 1) General knowledge, which is input-independent and provides broad knowledge for report generation; 2) Specific knowledge, which is input-dependent and provides fine-grained knowledge for report generation.

General Knowledge Image Captioning

Audio-Oriented Multimodal Machine Comprehension: Task, Dataset and Model

no code implementations 4 Jul 2021 Zhiqi Huang, Fenglin Liu, Xian Wu, Shen Ge, Helin Wang, Wei Fan, Yuexian Zou

As a result, the proposed approach can handle various tasks including: Audio-Oriented Multimodal Machine Comprehension, Machine Reading Comprehension and Machine Listening Comprehension, in a single model, making fair comparisons possible between our model and the existing unimodal MC models.

Knowledge Distillation Machine Reading Comprehension

Contrastive Attention for Automatic Chest X-ray Report Generation

no code implementations Findings (ACL) 2021 Fenglin Liu, Changchang Yin, Xian Wu, Shen Ge, Ping Zhang, Yuexian Zou, Xu Sun

In addition, according to the analysis, the CA model can help existing models better attend to the abnormal regions and provide more accurate descriptions which are crucial for an interpretable diagnosis.

Exploring and Distilling Posterior and Prior Knowledge for Radiology Report Generation

no code implementations CVPR 2021 Fenglin Liu, Xian Wu, Shen Ge, Wei Fan, Yuexian Zou

In detail, PoKE explores the posterior knowledge, which provides explicit abnormal visual regions to alleviate visual data bias; PrKE explores the prior knowledge from the prior medical knowledge graph (medical knowledge) and prior radiology reports (working experience) to alleviate textual data bias.

Prophet Attention: Predicting Attention with Future Attention

no code implementations NeurIPS 2020 Fenglin Liu, Xuancheng Ren, Xian Wu, Shen Ge, Wei Fan, Yuexian Zou, Xu Sun

Especially for image captioning, the attention based models are expected to ground correct image regions with proper generated words.

Image Captioning
