Search Results for author: Cairong Zhao

Found 28 papers, 17 papers with code

Exploring Interpretability for Visual Prompt Tuning with Hierarchical Concepts

no code implementations • 8 Mar 2025 • Yubin Wang, Xinyang Jiang, De Cheng, Xiangqian Zhao, Zilong Wang, Dongsheng Li, Cairong Zhao

Visual prompt tuning offers significant advantages for adapting pre-trained visual foundation models to specific tasks.

Visual Prompt Tuning

FaceShot: Bring Any Character into Life

no code implementations • 2 Mar 2025 • Junyao Gao, Yanan Sun, Fei Shen, Xin Jiang, Zhening Xing, Kai Chen, Cairong Zhao

With this powerful generalization capability, FaceShot can significantly extend the application of portrait animation by breaking the limitation of realistic portrait landmark detection for any stylized character and driven video.

Portrait Animation

DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion

1 code implementation • 31 Oct 2024 • Weicai Ye, Chenhao Ji, Zheng Chen, Junyao Gao, Xiaoshui Huang, Song-Hai Zhang, Wanli Ouyang, Tong He, Cairong Zhao, Guofeng Zhang

Then, we propose a novel text-driven panoramic generation framework, termed DiffPano, to achieve scalable, consistent, and diverse panoramic scene generation.

Scene Generation

Uni$^2$Det: Unified and Universal Framework for Prompt-Guided Multi-dataset 3D Detection

no code implementations • 30 Sep 2024 • Yubin Wang, Zhikang Zou, Xiaoqing Ye, Xiao Tan, Errui Ding, Cairong Zhao

We present Uni$^2$Det, a brand new framework for unified and universal multi-dataset training on 3D detection, enabling robust performance across diverse domains and generalization to unseen domains.

HPT++: Hierarchically Prompting Vision-Language Models with Multi-Granularity Knowledge Generation and Improved Structure Modeling

2 code implementations • 27 Aug 2024 • Yubin Wang, Xinyang Jiang, De Cheng, Wenli Sun, Dongsheng Li, Cairong Zhao

Specifically, we introduce a relationship-guided attention module to capture pair-wise associations among entities and attributes for low-level prompt learning.

Domain Generalization, Prompt Engineering

ActPrompt: In-Domain Feature Adaptation via Action Cues for Video Temporal Grounding

no code implementations • 13 Aug 2024 • Yubin Wang, Xinyang Jiang, De Cheng, Dongsheng Li, Cairong Zhao

Video temporal grounding is an emerging topic aiming to identify specific clips within videos.

StyleShot: A Snapshot on Any Style

2 code implementations • 1 Jul 2024 • Junyao Gao, Yanchen Liu, Yanan Sun, Yinhao Tang, Yanhong Zeng, Kai Chen, Cairong Zhao

In this paper, we show that a good style representation is crucial and sufficient for generalized style transfer without test-time tuning.

Image Generation, Style Transfer

DiffPhysBA: Diffusion-based Physical Backdoor Attack against Person Re-Identification in Real-World

no code implementations • 30 May 2024 • Wenli Sun, Xinyang Jiang, Dongsheng Li, Cairong Zhao

Consequently, DiffPhysBA can generate realistic attributes as semantic-level triggers in the digital domain and provides higher physical ASR compared to the direct paste method by 25.6% on the real-world test set.

Backdoor Attack, Person Re-Identification

AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data

2 code implementations • 29 May 2024 • Zifan Song, Yudong Wang, Wenwei Zhang, Kuikun Liu, Chengqi Lyu, Demin Song, Qipeng Guo, Hang Yan, Dahua Lin, Kai Chen, Cairong Zhao

Open-source Large Language Models (LLMs) and their specialized variants, particularly Code LLMs, have recently delivered impressive performance.

Code Generation, Diversity +1

PatSTEG: Modeling Formation Dynamics of Patent Citation Networks via The Semantic-Topological Evolutionary Graph

no code implementations • 3 Feb 2024 • Ran Miao, Xueyu Chen, Liang Hu, Zhifei Zhang, Minghua Wan, Qi Zhang, Cairong Zhao

Patent documents in the patent database (PatDB) are crucial for research, development, and innovation as they contain valuable technical information.

Graph Learning

DROP: Decouple Re-Identification and Human Parsing with Task-specific Features for Occluded Person Re-identification

1 code implementation • 31 Jan 2024 • Shuguang Dou, Xiangyang Jiang, Yuanpeng Tu, Junyao Gao, Zefan Qu, Qingsong Zhao, Cairong Zhao

Unlike mainstream approaches using global features for simultaneous multi-task learning of ReID and human parsing, or relying on semantic information for attention guidance, DROP argues that the inferior performance of the former is due to distinct granularity requirements for ReID and human parsing features.

Human Parsing, Multi-Task Learning +1

Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models

2 code implementations • 11 Dec 2023 • Yubin Wang, Xinyang Jiang, De Cheng, Dongsheng Li, Cairong Zhao

To address this limitation and prioritize harnessing structured knowledge, this paper advocates for leveraging LLMs to build a graph for each description to model the entities and attributes describing the category, as well as their correlations.

Prompt Engineering

Online Video Quality Enhancement with Spatial-Temporal Look-up Tables

no code implementations • 22 Nov 2023 • Zefan Qu, Xinyang Jiang, Yifan Yang, Dongsheng Li, Cairong Zhao

To the best of our knowledge, we are the first to exploit the LUT structure to extract temporal information in video tasks.

Content-Adaptive Auto-Occlusion Network for Occluded Person Re-Identification

1 code implementation • IEEE Transactions on Image Processing 2023 • Cairong Zhao, Zefan Qu, Xinyang Jiang, Yuanpeng Tu, Xiang Bai

To address these challenges, we propose a novel Content-Adaptive Auto-Occlusion Network (CAAO), that is able to dynamically select the proper occlusion region of an image based on its content and the current training status.

Occluded Person Re-Identification

Explainability of Speech Recognition Transformers via Gradient-based Attention Visualization

1 code implementation • IEEE Transactions on Multimedia 2023 • Tianli Sun, Haonan Chen, Guosheng Hu, Lianghua He, Cairong Zhao

In addition, we demonstrate the utilization of visualization result in three ways: (1) We visualize attention with respect to connectionist temporal classification (CTC) loss to train an ASR model with adversarial attention erasing regularization, which effectively decreases the word error rate (WER) of the model and improves its generalization capability.

Automatic Speech Recognition (ASR) +2

ISTVT: Interpretable Spatial-Temporal Video Transformer for Deepfake Detection

1 code implementation • IEEE Transactions on Information Forensics and Security 2023 • Cairong Zhao, Chutian Wang, Guosheng Hu, Haonan Chen, Chun Liu, Jinhui Tang

To address these two challenges, in this paper, we propose an Interpretable Spatial-Temporal Video Transformer (ISTVT), which consists of a novel decomposed spatial-temporal self-attention and a self-subtract mechanism to capture spatial artifacts and temporal inconsistency for robust Deepfake detection.

DeepFake Detection, Face Swapping

Human Co-Parsing Guided Alignment for Occluded Person Re-identification

1 code implementation • IEEE Transactions on Image Processing 2022 • Shuguang Dou, Cairong Zhao, Xinyang Jiang, Shanshan Zhang, Wei-Shi Zheng, Wangmeng Zuo

Most supervised methods propose to train an extra human parsing model aside from the ReID model with cross-domain human parts annotation, suffering from expensive annotation cost and domain gap; unsupervised methods integrate a feature clustering-based human parsing process into the ReID model, but the lack of supervision signals leads to less satisfactory segmentation results.

Human Parsing, Occluded Person Re-Identification

Learning Domain Invariant Prompt for Vision-Language Models

1 code implementation • 8 Dec 2022 • Cairong Zhao, Yubin Wang, Xinyang Jiang, Yifei Shen, Kaitao Song, Dongsheng Li, Duoqian Miao

Prompt learning is one of the most effective and trending ways to adapt powerful vision-language foundation models like CLIP to downstream datasets by tuning learnable prompt vectors with very few samples.

Domain Generalization, Language Modelling +2

Invisible Backdoor Attack with Dynamic Triggers against Person Re-identification

1 code implementation • 20 Nov 2022 • Wenli Sun, Xinyang Jiang, Shuguang Dou, Dongsheng Li, Duoqian Miao, Cheng Deng, Cairong Zhao

Instead of learning fixed triggers for the target classes from the training set, DT-IBA can dynamically generate new triggers for any unknown identities.

All, Backdoor Attack +3

Learning from Noisy Labels with Coarse-to-Fine Sample Credibility Modeling

no code implementations • 23 Aug 2022 • Boshen Zhang, Yuxi Li, Yuanpeng Tu, Jinlong Peng, Yabiao Wang, Cunlin Wu, Yang Xiao, Cairong Zhao

Specifically, for the clean set, we deliberately design a memory-based modulation scheme to dynamically adjust the contribution of each sample in terms of its historical credibility sequence during training, thus alleviating the effect from noisy samples incorrectly grouped into the clean set.

Denoising, Image Classification

Towards Privacy-Preserving Person Re-identification via Person Identify Shift

no code implementations • 15 Jul 2022 • Shuguang Dou, Xinyang Jiang, Qingsong Zhao, Dongsheng Li, Cairong Zhao

In this paper, we aim to develop a technique that can achieve a good trade-off between privacy protection and data usability for person ReID.

De-identification, Person Re-Identification +1

Rethinking the Zigzag Flattening for Image Reading

no code implementations • 21 Feb 2022 • Qingsong Zhao, Yi Wang, Zhipeng Zhou, Duoqian Miao, Limin Wang, Yu Qiao, Cairong Zhao

The sequence ordering of word vectors matters a lot to text reading, which has been proven in natural language processing (NLP).

Image Classification, Representation Learning +1

Salience-Guided Iterative Asymmetric Mutual Hashing for Fast Person Re-identification

2 code implementations • IEEE Transactions on Image Processing 2021 • Cairong Zhao, Yuanpeng Tu, Zhihui Lai, Fumin Shen, Heng Tao Shen, Duoqian Miao

Moreover, a novel iterative asymmetric mutual training strategy (IAMT) is proposed to alleviate drawbacks of common mutual learning, which can continuously refine the discriminative regions for SSB and extract regularized dark knowledge for two models as well.

Code Generation, Person Re-Identification

Incremental Generative Occlusion Adversarial Suppression Network for Person ReID

1 code implementation • IEEE Transactions on Image Processing 2021 • Cairong Zhao, Xinbi Lv, Shuguang Dou, Shanshan Zhang, Jun Wu, Liang Wang

The adversarial suppression branch, embedded with two occlusion suppression modules, minimizes the generated occlusion’s response and strengthens attentive feature representation on human non-occluded body regions.

Data Augmentation, Person Re-Identification

Deep Fusion Feature Representation Learning with Hard Mining Center-Triplet Loss for Person Re-identification

1 code implementation • IEEE Transactions on Multimedia 2020 • Cairong Zhao, Xinbi Lv, Zhang Zhang, Wangmeng Zuo, Jun Wu, Duoqian Miao

The extraction of robust feature representations from pedestrian images through CNNs with a single deterministic pooling operation is problematic as the features in real pedestrian images are complex and diverse.

Person Re-Identification, Representation Learning +1
