Search Results for author: Zuopeng Yang

Found 9 papers, 5 papers with code

SGC-Net: Stratified Granular Comparison Network for Open-Vocabulary HOI Detection

1 code implementation · 1 Mar 2025 · Xin Lin, Chong Shi, Zuopeng Yang, Haojin Tang, Zhili Zhou

Despite their effectiveness, these methods face two challenges: (1) feature granularity deficiency, caused by relying on last-layer visual features for text alignment, which neglects crucial object-level details from intermediate layers; (2) semantic similarity confusion, arising from CLIP's inherent biases toward certain classes, while LLM-generated descriptions based solely on labels fail to adequately capture inter-class similarities.

Human-Object Interaction Detection · Large Language Model · +2

Distraction is All You Need for Multimodal Large Language Model Jailbreaking

no code implementations · 15 Feb 2025 · Zuopeng Yang, Jiluan Fan, Anli Yan, Erdun Gao, Xin Lin, Tao Li, Kanghua Mo, Changyu Dong

Multimodal Large Language Models (MLLMs) bridge the gap between visual and textual data, enabling a range of advanced applications.

All · Language Modeling · +3

Beyond Perceptual Distances: Rethinking Disparity Assessment for Out-of-Distribution Detection with Diffusion Models

no code implementations · 16 Sep 2024 · Kun Fang, Qinghua Tao, Zuopeng Yang, Xiaolin Huang, Jie Yang

Out-of-Distribution (OoD) detection aims to determine whether a given sample comes from the training distribution of the classifier-under-protection, i.e., is In-Distribution (InD), or is from OoD.

Out-of-Distribution Detection · Out of Distribution (OOD) Detection

RESTORE: Towards Feature Shift for Vision-Language Prompt Learning

1 code implementation · 10 Mar 2024 · Yuncheng Yang, Chuyan Zhang, Zuopeng Yang, Yuting Gao, Yulei Qin, Ke Li, Xing Sun, Jie Yang, Yun Gu

Prompt learning is effective for fine-tuning foundation models to improve their generalization across a variety of downstream tasks.

Prompt Learning

TD^2-Net: Toward Denoising and Debiasing for Dynamic Scene Graph Generation

no code implementations · 23 Jan 2024 · Xin Lin, Chong Shi, Yibing Zhan, Zuopeng Yang, Yaqi Wu, DaCheng Tao

To address the above problems, in this paper, we introduce a network named TD$^2$-Net that aims at denoising and debiasing for dynamic scene graph generation (SGG).

Denoising · Graph Generation · +3

MMoT: Mixture-of-Modality-Tokens Transformer for Composed Multimodal Conditional Image Synthesis

no code implementations · 10 May 2023 · Jianbin Zheng, Daqing Liu, Chaoyue Wang, Minghui Hu, Zuopeng Yang, Changxing Ding, DaCheng Tao

To this end, we propose to generate images conditioned on compositions of multimodal control signals, where the modalities are imperfectly complementary, i.e., composed multimodal conditional image synthesis (CMCIS).

Image Generation

Eliminating Contextual Prior Bias for Semantic Image Editing via Dual-Cycle Diffusion

1 code implementation · 5 Feb 2023 · Zuopeng Yang, Tianshu Chu, Xin Lin, Erdun Gao, Daqing Liu, Jie Yang, Chaoyue Wang

The proposed model incorporates a Bias Elimination Cycle consisting of a forward path and an inverted path, each featuring a Structural Consistency Cycle that preserves image content during editing.

Text-to-Image Generation

Unified Discrete Diffusion for Simultaneous Vision-Language Generation

1 code implementation · 27 Nov 2022 · Minghui Hu, Chuanxia Zheng, Heliang Zheng, Tat-Jen Cham, Chaoyue Wang, Zuopeng Yang, DaCheng Tao, Ponnuthurai N. Suganthan

Recently developed discrete diffusion models perform extraordinarily well on the text-to-image task, showing significant promise for handling multimodal signals.

multimodal generation · Text Generation · +1

Modeling Image Composition for Complex Scene Generation

1 code implementation · CVPR 2022 · Zuopeng Yang, Daqing Liu, Chaoyue Wang, Jie Yang, DaCheng Tao

Unlike existing CNN-based and Transformer-based generation models, which entangle modeling at the pixel/patch level and the object/patch level respectively, the proposed focal attention predicts the current patch token by attending only to its highly related tokens, as specified by the spatial layout, thereby achieving disambiguation during training.

Layout-to-Image Generation · Object · +1
