1 code implementation • 14 Mar 2024 • Ruixiang Jiang, Lingbo Liu, Changwen Chen
Building upon this disentanglement, we introduce the mixture of prompt experts (MoPE) technique to enhance expressiveness.
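The core idea of a mixture of prompt experts can be sketched as follows: a router, conditioned on an instance feature, produces weights over K learnable prompt experts, which are blended into one instance-specific prompt. This is a minimal illustrative sketch, not the paper's actual MoPE implementation; all names, shapes, and the numpy formulation are assumptions.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def mope_prompt(cond_feat, experts, router_w):
    """Illustrative mixture-of-prompt-experts: route on a conditioning
    feature, then blend K learnable prompt experts into one
    instance-wise prompt (shapes and names are assumptions)."""
    # cond_feat: (d,), experts: (K, L, d), router_w: (d, K)
    gate = softmax(cond_feat @ router_w)          # (K,) expert weights
    return np.einsum("k,kld->ld", gate, experts)  # (L, d) blended prompt

rng = np.random.default_rng(0)
d, K, L = 8, 4, 3
prompt = mope_prompt(rng.normal(size=d),
                     rng.normal(size=(K, L, d)),
                     rng.normal(size=(d, K)))
print(prompt.shape)  # (3, 8)
```

Because the gate is a convex combination, each instance receives a prompt interpolated among the experts rather than a single shared prompt, which is what the expressiveness claim rests on.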
no code implementations • 7 Dec 2023 • Zuyao Chen, Jinlin Wu, Zhen Lei, Zhaoxiang Zhang, Changwen Chen
Learning scene graphs from natural language descriptions has proven to be a cheap and promising scheme for Scene Graph Generation (SGG).
1 code implementation • 28 Nov 2023 • Ruixiang Jiang, Lingbo Liu, Changwen Chen
We show that the representation of one modality can effectively guide the prompting of another modality for parameter-efficient multimodal fusion.
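One minimal way to realize "one modality guides the prompting of another" is to project a source-modality feature into a few prompt tokens and prepend them to the target modality's token sequence. The sketch below is an assumption-laden illustration (projection matrix, token counts, and function names are all hypothetical), not the paper's method.

```python
import numpy as np

def prompt_from_modality(src_feat, proj, num_prompts, tgt_tokens):
    """Project a source-modality feature into prompt tokens and
    prepend them to the target modality's sequence (illustrative)."""
    # src_feat: (d_src,), proj: (d_src, num_prompts * d_tgt),
    # tgt_tokens: (N, d_tgt)
    d_tgt = tgt_tokens.shape[1]
    prompts = (src_feat @ proj).reshape(num_prompts, d_tgt)
    return np.concatenate([prompts, tgt_tokens], axis=0)

rng = np.random.default_rng(1)
seq = prompt_from_modality(rng.normal(size=16),
                           rng.normal(size=(16, 2 * 8)),
                           2,
                           rng.normal(size=(5, 8)))
print(seq.shape)  # (7, 8): 2 cross-modal prompts + 5 original tokens
```

Only the small projection is trained, which is what makes this style of fusion parameter-efficient relative to fine-tuning either encoder.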
no code implementations • 18 Nov 2023 • Zuyao Chen, Jinlin Wu, Zhen Lei, Zhaoxiang Zhang, Changwen Chen
For the more challenging settings of relation-involved open vocabulary SGG, the proposed approach integrates relation-aware pre-training utilizing image-caption data and retains visual-concept alignment through knowledge distillation.
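Retaining visual-concept alignment through knowledge distillation typically amounts to a temperature-scaled KL term that pulls the student's concept predictions toward a frozen teacher. The snippet below is a generic distillation-loss sketch under that assumption; it is not the paper's exact objective.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    """Temperature-scaled KL divergence pulling student concept
    predictions toward a frozen teacher (generic KD, illustrative)."""
    p = softmax(teacher_logits / T)                 # teacher distribution
    log_q = np.log(softmax(student_logits / T))     # student log-probs
    kl = (p * (np.log(p) - log_q)).sum(axis=-1).mean()
    return float(kl * T * T)                        # standard T^2 scaling

logits = np.array([[2.0, 0.5, -1.0]])
print(distill_loss(logits, logits))  # 0.0 when student matches teacher
```

The KL term is zero when the student reproduces the teacher exactly and grows as its concept distribution drifts, which is how alignment is preserved while the model is trained on new relation supervision.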
1 code implementation • 12 May 2023 • Ruixiang Jiang, Lingbo Liu, Changwen Chen
Specifically, we propose CLIP-Count, the first end-to-end pipeline that estimates density maps for open-vocabulary objects with text guidance in a zero-shot manner.
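The zero-shot, text-guided part of such a pipeline can be illustrated with a patch-text similarity map: cosine similarity between each image-patch embedding and the text embedding of the target class gives a coarse spatial map that a decoder can refine into a density map. This is a conceptual sketch only; the feature names and the direct use of similarity as a density proxy are assumptions, not CLIP-Count's architecture.

```python
import numpy as np

def text_guided_map(patch_feats, text_feat):
    """Cosine similarity between each patch embedding and a class
    text embedding: a coarse localization map (illustrative)."""
    # patch_feats: (num_patches, d), text_feat: (d,)
    p = patch_feats / np.linalg.norm(patch_feats, axis=-1, keepdims=True)
    t = text_feat / np.linalg.norm(text_feat)
    return p @ t  # (num_patches,) values in [-1, 1]

rng = np.random.default_rng(2)
sim = text_guided_map(rng.normal(size=(49, 32)), rng.normal(size=32))
print(sim.shape)  # (49,), e.g. a 7x7 patch grid
```

Because the class is specified purely by text, the same pipeline counts arbitrary, unseen object categories without per-class training.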
Ranked #3 on Cross-Part Crowd Counting on ShanghaiTech A
no code implementations • 8 Oct 2022 • Tao Yang, Haokui Zhang, Wenze Hu, Changwen Chen, Xiaoyu Wang
Transformer models have made tremendous progress in various fields in recent years.
no code implementations • 26 May 2022 • Peipei Zhu, Xiao Wang, Lin Zhu, Zhenglong Sun, Weishi Zheng, YaoWei Wang, Changwen Chen
Inspired by the success of Vision-Language Pre-Trained Models (VL-PTMs), we attempt to infer cross-domain cue information about a given image from large VL-PTMs for the unpaired image captioning (UIC) task.
no code implementations • 7 Mar 2022 • Peipei Zhu, Xiao Wang, Yong Luo, Zhenglong Sun, Wei-Shi Zheng, YaoWei Wang, Changwen Chen
The image-level labels are used to train a weakly-supervised object recognition model that extracts object information (e.g., instances) from an image, and the extracted instances are then used to infer the relationships among different objects via an enhanced graph neural network (GNN).
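Inferring relationships among detected objects with a GNN boils down to message passing: each object node aggregates its neighbours' features before relation classification. The layer below is a minimal generic sketch, assuming a mean-aggregation scheme; it is not the paper's "enhanced" GNN, whose specifics are not given here.

```python
import numpy as np

def gnn_layer(node_feats, adj, w_self, w_nbr):
    """One round of message passing over object nodes: mean-aggregate
    neighbour features, combine with self features, ReLU (illustrative)."""
    # node_feats: (N, d), adj: (N, N) binary, w_self/w_nbr: (d, h)
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)  # avoid divide-by-zero
    msg = (adj @ node_feats) / deg                    # mean neighbour message
    return np.maximum(node_feats @ w_self + msg @ w_nbr, 0.0)

rng = np.random.default_rng(3)
N, d, h = 4, 6, 5
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 0],
                [0, 1, 0, 0]], dtype=float)
out = gnn_layer(rng.normal(size=(N, d)), adj,
                rng.normal(size=(d, h)), rng.normal(size=(d, h)))
print(out.shape)  # (4, 5)
```

Stacking such layers lets relation evidence propagate beyond immediate object pairs, after which pairwise node features can be scored against relation labels.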