no code implementations • 29 Dec 2023 • Qishen Chen, Xinyu Lyu, Haonan Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song
Thus, we introduce a plug-and-play method named CITrans, which iteratively trains SGG models with progressively enhanced data.
1 code implementation • 19 Dec 2023 • Kaipeng Fang, Jingkuan Song, Lianli Gao, Pengpeng Zeng, Zhi-Qi Cheng, Xiyao Li, Heng Tao Shen
Then, in the Context-aware Simulator Learning stage, we train a Content-aware Prompt Simulator under simulated test scenarios to produce the corresponding CaDP.
no code implementations • 9 Aug 2023 • Xinyu Lyu, Lianli Gao, Junlin Xie, Pengpeng Zeng, Yulu Tian, Jie Shao, Heng Tao Shen
To this end, we propose the Multi-Concept Learning (MCL) framework, which ensures a balanced learning process across rare, uncommon, and common concepts.
2 code implementations • NeurIPS 2022 • Hao Li, Jingkuan Song, Lianli Gao, Pengpeng Zeng, Haonan Zhang, Gongfu Li
To verify the effectiveness of our approach, extensive experiments are conducted on MS-COCO, CUB Captions, and Flickr30K, which are commonly used in cross-modal retrieval.
1 code implementation • 17 Nov 2022 • Pengpeng Zeng, Haonan Zhang, Lianli Gao, Xiangpeng Li, Jin Qian, Heng Tao Shen
Generating consecutive descriptions for videos, i.e., Video Captioning, requires taking full advantage of visual representation along with the generation process.
1 code implementation • 17 Nov 2022 • Pengpeng Zeng, Jinkuan Zhu, Jingkuan Song, Lianli Gao
Specifically, we design a novel embedding method called tree-structured prototype, producing a set of hierarchical representative embeddings which capture the hierarchical semantic structure in textual space.
1 code implementation • 16 Jul 2022 • Chaofan Zheng, Lianli Gao, Xinyu Lyu, Pengpeng Zeng, Abdulmotaleb El Saddik, Heng Tao Shen
Experiments show that our approach achieves new state-of-the-art performance on the VG and GQA datasets and makes a trade-off between the performance of tail predicates and head ones.
no code implementations • 11 Jul 2022 • Xinyu Lyu, Lianli Gao, Pengpeng Zeng, Heng Tao Shen, Jingkuan Song
The performance of current Scene Graph Generation (SGG) models is severely hampered by hard-to-distinguish predicates, e.g., woman-on/standing on/walking on-beach.
no code implementations • 23 Jun 2022 • Chaofan Zheng, Xinyu Lyu, Yuyu Guo, Pengpeng Zeng, Jingkuan Song, Lianli Gao
SCM is proposed to relieve semantic deviation by ensuring the semantic consistency between the generated scene graph and the ground truth in global and local representations.
no code implementations • 4 Jun 2022 • Jingkuan Song, Pengpeng Zeng, Lianli Gao, Heng Tao Shen
Existing visual attention models are generally planar, i.e., different channels of the last conv-layer feature map of an image share the same weight.
no code implementations • 2 Jun 2022 • Lianli Gao, Pengpeng Zeng, Jingkuan Song, Yuan-Fang Li, Wu Liu, Tao Mei, Heng Tao Shen
To date, visual question answering (VQA), i.e., image QA and video QA, is still a holy grail in vision and language understanding, especially for video QA.
1 code implementation • 19 May 2022 • Xiaoya Chen, Jingkuan Song, Pengpeng Zeng, Lianli Gao, Heng Tao Shen
Video captioning is a challenging task that necessitates a thorough comprehension of visual scenes.