Search Results for author: Changwen Chen

Found 8 papers, 3 papers with code

Unpaired Image Captioning by Image-level Weakly-Supervised Visual Concept Recognition

no code implementations • 7 Mar 2022 • Peipei Zhu, Xiao Wang, Yong Luo, Zhenglong Sun, Wei-Shi Zheng, YaoWei Wang, Changwen Chen

The image-level labels are used to train a weakly-supervised object recognition model that extracts object information (e.g., instances) from an image; the extracted instances are then used to infer relationships among the objects with an enhanced graph neural network (GNN).
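The relation-inference step described above can be sketched as one round of message passing over detected object nodes, followed by scoring every ordered object pair. Everything below (the function name, feature sizes, residual-tanh update, and random weights) is a hypothetical illustration under assumed shapes, not the paper's actual architecture:

```python
import numpy as np

def gnn_relation_scores(node_feats, adj, w_msg, w_rel):
    """One message-passing round over object nodes, then pairwise
    relation scoring (illustrative sketch only)."""
    msgs = adj @ node_feats @ w_msg               # aggregate neighbor messages
    updated = np.tanh(node_feats + msgs)          # residual node update
    n, d = updated.shape
    # Concatenate features of every ordered (subject, object) pair
    pairs = np.concatenate(
        [np.repeat(updated, n, axis=0), np.tile(updated, (n, 1))], axis=1
    )
    return (pairs @ w_rel).reshape(n, n)          # n x n relation score matrix

rng = np.random.default_rng(0)
n, d = 4, 8                                       # 4 detected objects, dim 8
x = rng.normal(size=(n, d))
a = np.ones((n, n)) - np.eye(n)                   # fully connected object graph
scores = gnn_relation_scores(x, a, rng.normal(size=(d, d)),
                             rng.normal(size=(2 * d,)))
print(scores.shape)                               # (4, 4)
```

In the sketch, entry (i, j) of the score matrix would feed a relation classifier for the pair (object i, object j).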

Tasks: Image Captioning, Object, +1

Prompt-based Learning for Unpaired Image Captioning

no code implementations • 26 May 2022 • Peipei Zhu, Xiao Wang, Lin Zhu, Zhenglong Sun, Weishi Zheng, YaoWei Wang, Changwen Chen

Inspired by the success of Vision-Language Pre-Trained Models (VL-PTMs), we attempt to infer cross-domain cue information about a given image from large VL-PTMs for the unpaired image captioning (UIC) task.

Tasks: Image Captioning, Question Answering, +2

CLIP-Count: Towards Text-Guided Zero-Shot Object Counting

1 code implementation • 12 May 2023 • Ruixiang Jiang, Lingbo Liu, Changwen Chen

Specifically, we propose CLIP-Count, the first end-to-end pipeline that estimates density maps for open-vocabulary objects with text guidance in a zero-shot manner.
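A density-map counter produces its final count by integrating (summing) the predicted map, so each object contributes unit mass. The toy map below, with two normalized Gaussian blobs standing in for a model's prediction, is a hypothetical illustration of that read-out step, not CLIP-Count's pipeline:

```python
import numpy as np

def count_from_density(density_map):
    """Object count is the integral (sum) of the predicted density map."""
    return float(density_map.sum())

# Toy "prediction": two Gaussian blobs, each normalized to mass 1
h, w = 32, 32
yy, xx = np.mgrid[0:h, 0:w]

def blob(cy, cx, s=2.0):
    g = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * s * s))
    return g / g.sum()                            # unit mass per object

density = blob(8, 8) + blob(20, 24)
print(round(count_from_density(density)))         # 2
```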

Tasks: Cross-Part Crowd Counting, Cross-Part Evaluation, +6

Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention

no code implementations • 18 Nov 2023 • Zuyao Chen, Jinlin Wu, Zhen Lei, Zhaoxiang Zhang, Changwen Chen

For the more challenging settings of relation-involved open-vocabulary SGG, the proposed approach integrates relation-aware pre-training utilizing image-caption data and retains visual-concept alignment through knowledge distillation.

Tasks: Concept Alignment, Graph Generation, +6

Conditional Prompt Tuning for Multimodal Fusion

1 code implementation • 28 Nov 2023 • Ruixiang Jiang, Lingbo Liu, Changwen Chen

We show that the representation of one modality can effectively guide the prompting of another modality for parameter-efficient multimodal fusion.
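One way to realize "one modality guiding the prompting of another" is to map a pooled feature from the guiding modality into a small set of prompt vectors prepended to the other modality's frozen token sequence. The names, dimensions, and single linear mapping below are all assumptions for illustration, not the paper's method:

```python
import numpy as np

def conditional_prompts(guide_feat, w_map, num_prompts, prompt_dim):
    """Map a guiding modality's feature to prompt vectors for the
    other modality (hypothetical sketch)."""
    return (guide_feat @ w_map).reshape(num_prompts, prompt_dim)

rng = np.random.default_rng(1)
d_img, num_p, d_txt = 16, 4, 8
image_feat = rng.normal(size=(d_img,))            # pooled image feature
w = rng.normal(size=(d_img, num_p * d_txt))       # the only trained weights
prompts = conditional_prompts(image_feat, w, num_p, d_txt)
text_tokens = rng.normal(size=(10, d_txt))        # frozen text embeddings
fused_input = np.concatenate([prompts, text_tokens], axis=0)
print(fused_input.shape)                          # (14, 8)
```

Only the mapping matrix would be trained in such a scheme, which is what makes prompt-based fusion parameter-efficient.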

GPT4SGG: Synthesizing Scene Graphs from Holistic and Region-specific Narratives

no code implementations • 7 Dec 2023 • Zuyao Chen, Jinlin Wu, Zhen Lei, Zhaoxiang Zhang, Changwen Chen

Learning scene graphs from natural language descriptions has proven to be a cheap and promising scheme for Scene Graph Generation (SGG).

Tasks: Graph Generation, Scene Graph Generation, +1

MoPE: Parameter-Efficient and Scalable Multimodal Fusion via Mixture of Prompt Experts

1 code implementation • 14 Mar 2024 • Ruixiang Jiang, Lingbo Liu, Changwen Chen

Building upon this disentanglement, we introduce the mixture of prompt experts (MoPE) technique to enhance expressiveness.
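A mixture of prompt experts can be sketched as a pool of learned prompt sets plus a router that produces per-instance gating weights, with the effective prompt being the gate-weighted sum of experts. The shapes, softmax router, and variable names below are assumed for illustration and are not MoPE's actual design:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mope_prompt(route_feat, w_router, experts):
    """Route an instance to a soft mixture of prompt experts
    (hypothetical sketch)."""
    gate = softmax(route_feat @ w_router)          # (num_experts,), sums to 1
    # Gate-weighted sum over the expert axis: (E, P, D) -> (P, D)
    return np.tensordot(gate, experts, axes=1), gate

rng = np.random.default_rng(2)
num_experts, num_p, d = 3, 4, 8
experts = rng.normal(size=(num_experts, num_p, d)) # learned prompt pool
feat = rng.normal(size=(16,))                      # per-instance routing feature
w_r = rng.normal(size=(16, num_experts))
prompt, gate = mope_prompt(feat, w_r, experts)
print(prompt.shape)                                # (4, 8)
```

Scaling the expert pool grows capacity while the per-instance prompt stays the same size, which is the usual appeal of a mixture design.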

Tasks: Disentanglement, Multimodal Deep Learning, +1
