1 code implementation • 1 Mar 2025 • Xin Lin, Chong Shi, Zuopeng Yang, Haojin Tang, Zhili Zhou
Despite their effectiveness, these methods face two challenges: (1) feature granularity deficiency, due to reliance on last-layer visual features for text alignment, leading to the neglect of crucial object-level details from intermediate layers; (2) semantic similarity confusion, resulting from CLIP's inherent biases toward certain classes, while LLM-generated descriptions based solely on labels fail to adequately capture inter-class similarities.
no code implementations • 15 Feb 2025 • Zuopeng Yang, Jiluan Fan, Anli Yan, Erdun Gao, Xin Lin, Tao Li, Kanghua Mo, Changyu Dong
Multimodal Large Language Models (MLLMs) bridge the gap between visual and textual data, enabling a range of advanced applications.
no code implementations • 16 Sep 2024 • Kun Fang, Qinghua Tao, Zuopeng Yang, Xiaolin Huang, Jie Yang
Out-of-Distribution (OoD) detection aims to determine whether a given sample is from the training distribution of the classifier-under-protection, i.e., In-Distribution (InD), or from OoD.
Out-of-Distribution Detection
Out of Distribution (OOD) Detection
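To make the InD-versus-OoD decision described above concrete, here is a minimal sketch of a generic confidence-threshold detector based on the maximum softmax probability. This is a common baseline, not the method proposed in the listed paper; the function names and the 0.5 threshold are assumptions chosen for illustration.

```python
import math


def softmax(logits):
    """Convert raw classifier logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(logits):
    """Maximum softmax probability: higher means the classifier is more confident."""
    return max(softmax(logits))


def is_in_distribution(logits, threshold=0.5):
    """Flag a sample as InD when its confidence reaches the threshold.

    The threshold is a hypothetical value; in practice it would be tuned
    on held-out data.
    """
    return msp_score(logits) >= threshold


# A peaked prediction is treated as InD, a flat one as OoD.
print(is_in_distribution([8.0, 0.5, 0.1]))
print(is_in_distribution([1.0, 1.0, 1.0]))
```

A score-plus-threshold design keeps the detector independent of how the classifier was trained, which is why it is often used as a reference point.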
1 code implementation • 10 Mar 2024 • Yuncheng Yang, Chuyan Zhang, Zuopeng Yang, Yuting Gao, Yulei Qin, Ke Li, Xing Sun, Jie Yang, Yun Gu
Prompt learning is effective for fine-tuning foundation models to improve their generalization across a variety of downstream tasks.
no code implementations • 23 Jan 2024 • Xin Lin, Chong Shi, Yibing Zhan, Zuopeng Yang, Yaqi Wu, DaCheng Tao
To address the above problems, in this paper, we introduce a network named TD$^2$-Net that aims at denoising and debiasing for dynamic SGG.
no code implementations • 10 May 2023 • Jianbin Zheng, Daqing Liu, Chaoyue Wang, Minghui Hu, Zuopeng Yang, Changxing Ding, DaCheng Tao
To this end, we propose to generate images conditioned on the compositions of multimodal control signals, where modalities are imperfectly complementary, i.e., composed multimodal conditional image synthesis (CMCIS).
1 code implementation • 5 Feb 2023 • Zuopeng Yang, Tianshu Chu, Xin Lin, Erdun Gao, Daqing Liu, Jie Yang, Chaoyue Wang
The proposed model incorporates a Bias Elimination Cycle that consists of both a forward path and an inverted path, each featuring a Structural Consistency Cycle to ensure the preservation of image content during the editing process.
1 code implementation • 27 Nov 2022 • Minghui Hu, Chuanxia Zheng, Heliang Zheng, Tat-Jen Cham, Chaoyue Wang, Zuopeng Yang, DaCheng Tao, Ponnuthurai N. Suganthan
The recently developed discrete diffusion models perform extraordinarily well in the text-to-image task, showing significant promise for handling multi-modality signals.
1 code implementation • CVPR 2022 • Zuopeng Yang, Daqing Liu, Chaoyue Wang, Jie Yang, DaCheng Tao
Compared to existing CNN-based and Transformer-based generation models, which entangle modeling at the pixel-level & patch-level and the object-level & patch-level respectively, the proposed focal attention predicts the current patch token by focusing only on its highly-related tokens that are specified by the spatial layout, thereby achieving disambiguation during training.
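The idea of restricting attention to layout-related tokens can be sketched as a masked softmax. The code below is only an illustration under stated assumptions: it treats "highly-related" as "earlier-or-current tokens belonging to the same layout object", which may differ from the paper's actual rule, and the names `layout_focal_mask` and `masked_attention` are hypothetical.

```python
import math


def layout_focal_mask(object_ids):
    """Build a boolean attention mask from per-token layout object ids.

    Assumption: token i may attend to token j only if j is not later than i
    (autoregressive order) and both tokens belong to the same object.
    """
    n = len(object_ids)
    return [
        [j <= i and object_ids[j] == object_ids[i] for j in range(n)]
        for i in range(n)
    ]


def masked_attention(scores, mask):
    """Row-wise softmax over raw attention scores, zeroing masked positions."""
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        kept = [s for s, keep in zip(row_scores, row_mask) if keep]
        m = max(kept)  # each row keeps at least its own diagonal entry
        exps = [
            math.exp(s - m) if keep else 0.0
            for s, keep in zip(row_scores, row_mask)
        ]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Four patch tokens: tokens 0, 1, 3 belong to object 0, token 2 to object 1.
mask = layout_focal_mask([0, 0, 1, 0])
uniform_scores = [[0.0] * 4 for _ in range(4)]
weights = masked_attention(uniform_scores, mask)
print(mask[3])
print(weights[3])
```

Because masked positions receive zero weight, a token from one object cannot draw on patches of another, which is one way to read the "disambiguation" the abstract mentions.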