no code implementations • 8 Mar 2025 • Yubin Wang, Xinyang Jiang, De Cheng, Xiangqian Zhao, Zilong Wang, Dongsheng Li, Cairong Zhao
Visual prompt tuning offers significant advantages for adapting pre-trained visual foundation models to specific tasks.
no code implementations • 2 Mar 2025 • Junyao Gao, Yanan Sun, Fei Shen, Xin Jiang, Zhening Xing, Kai Chen, Cairong Zhao
With this powerful generalization capability, FaceShot can significantly extend the application of portrait animation by breaking the limitation of realistic portrait landmark detection for any stylized character and driven video.
1 code implementation • 31 Oct 2024 • Weicai Ye, Chenhao Ji, Zheng Chen, Junyao Gao, Xiaoshui Huang, Song-Hai Zhang, Wanli Ouyang, Tong He, Cairong Zhao, Guofeng Zhang
Then, we propose a novel text-driven panoramic generation framework, termed DiffPano, to achieve scalable, consistent, and diverse panoramic scene generation.
no code implementations • 30 Sep 2024 • Yubin Wang, Zhikang Zou, Xiaoqing Ye, Xiao Tan, Errui Ding, Cairong Zhao
We present Uni$^2$Det, a brand-new framework for unified and universal multi-dataset training on 3D detection, enabling robust performance across diverse domains and generalization to unseen domains.
2 code implementations • 27 Aug 2024 • Yubin Wang, Xinyang Jiang, De Cheng, Wenli Sun, Dongsheng Li, Cairong Zhao
Specifically, we introduce a relationship-guided attention module to capture pair-wise associations among entities and attributes for low-level prompt learning.
Ranked #1 on Prompt Engineering on ImageNet V2
no code implementations • 13 Aug 2024 • Yubin Wang, Xinyang Jiang, De Cheng, Dongsheng Li, Cairong Zhao
Video temporal grounding is an emerging topic aiming to identify specific clips within videos.
2 code implementations • 1 Jul 2024 • Junyao Gao, Yanchen Liu, Yanan Sun, Yinhao Tang, Yanhong Zeng, Kai Chen, Cairong Zhao
In this paper, we show that a good style representation is crucial and sufficient for generalized style transfer without test-time tuning.
Ranked #1 on Style Transfer on StyleBench
no code implementations • 30 May 2024 • Wenli Sun, Xinyang Jiang, Dongsheng Li, Cairong Zhao
Consequently, DiffPhysBA can generate realistic attributes as semantic-level triggers in the digital domain and provides higher physical ASR compared to the direct paste method by 25.6% on the real-world test set.
2 code implementations • 29 May 2024 • Zifan Song, Yudong Wang, Wenwei Zhang, Kuikun Liu, Chengqi Lyu, Demin Song, Qipeng Guo, Hang Yan, Dahua Lin, Kai Chen, Cairong Zhao
Open-source Large Language Models (LLMs) and their specialized variants, particularly Code LLMs, have recently delivered impressive performance.
no code implementations • 3 Feb 2024 • Ran Miao, Xueyu Chen, Liang Hu, Zhifei Zhang, Minghua Wan, Qi Zhang, Cairong Zhao
Patent documents in the patent database (PatDB) are crucial for research, development, and innovation as they contain valuable technical information.
1 code implementation • 31 Jan 2024 • Shuguang Dou, Xiangyang Jiang, Yuanpeng Tu, Junyao Gao, Zefan Qu, Qingsong Zhao, Cairong Zhao
Unlike mainstream approaches using global features for simultaneous multi-task learning of ReID and human parsing, or relying on semantic information for attention guidance, DROP argues that the inferior performance of the former is due to distinct granularity requirements for ReID and human parsing features.
Ranked #4 on Person Re-Identification on Occluded-DukeMTMC
2 code implementations • 11 Dec 2023 • Yubin Wang, Xinyang Jiang, De Cheng, Dongsheng Li, Cairong Zhao
To address this limitation and prioritize harnessing structured knowledge, this paper advocates for leveraging LLMs to build a graph for each description to model the entities and attributes describing the category, as well as their correlations.
Ranked #2 on Prompt Engineering on ImageNet V2
no code implementations • 22 Nov 2023 • Zefan Qu, Xinyang Jiang, Yifan Yang, Dongsheng Li, Cairong Zhao
To the best of our knowledge, we are the first to exploit the LUT structure to extract temporal information in video tasks.
1 code implementation • IEEE Transactions on Image Processing 2023 • Cairong Zhao, Zefan Qu, Xinyang Jiang, Yuanpeng Tu, Xiang Bai
To address these challenges, we propose a novel Content-Adaptive Auto-Occlusion Network (CAAO) that is able to dynamically select the proper occlusion region of an image based on its content and the current training status.
1 code implementation • IEEE Transactions on Multimedia 2023 • Tianli Sun, Haonan Chen, Guosheng Hu, Lianghua He, Cairong Zhao
In addition, we demonstrate the utilization of visualization results in three ways: (1) We visualize attention with respect to connectionist temporal classification (CTC) loss to train an ASR model with adversarial attention erasing regularization, which effectively decreases the word error rate (WER) of the model and improves its generalization capability.
Automatic Speech Recognition (ASR)
1 code implementation • IEEE Transactions on Information Forensics and Security 2023 • Cairong Zhao, Chutian Wang, Guosheng Hu, Haonan Chen, Chun Liu, Jinhui Tang
To address these two challenges, in this paper, we propose an Interpretable Spatial-Temporal Video Transformer (ISTVT), which consists of a novel decomposed spatial-temporal self-attention and a self-subtract mechanism to capture spatial artifacts and temporal inconsistency for robust Deepfake detection.
1 code implementation • IEEE Transactions on Image Processing 2022 • Shuguang Dou, Cairong Zhao, Xinyang Jiang, Shanshan Zhang, Wei-Shi Zheng, WangMeng Zuo
Most supervised methods propose to train an extra human parsing model aside from the ReID model with cross-domain human parts annotation, suffering from expensive annotation cost and domain gap; unsupervised methods integrate a feature clustering-based human parsing process into the ReID model, but the lack of supervision signals leads to less satisfactory segmentation results.
Ranked #9 on Person Re-Identification on Occluded-DukeMTMC
1 code implementation • 8 Dec 2022 • Cairong Zhao, Yubin Wang, Xinyang Jiang, Yifei Shen, Kaitao Song, Dongsheng Li, Duoqian Miao
Prompt learning is one of the most effective and trending ways to adapt powerful vision-language foundation models like CLIP to downstream datasets by tuning learnable prompt vectors with very few samples.
Ranked #5 on Prompt Engineering on Food-101
1 code implementation • 20 Nov 2022 • Wenli Sun, Xinyang Jiang, Shuguang Dou, Dongsheng Li, Duoqian Miao, Cheng Deng, Cairong Zhao
Instead of learning fixed triggers for the target classes from the training set, DT-IBA can dynamically generate new triggers for any unknown identities.
no code implementations • 23 Aug 2022 • Boshen Zhang, Yuxi Li, Yuanpeng Tu, Jinlong Peng, Yabiao Wang, Cunlin Wu, Yang Xiao, Cairong Zhao
Specifically, for the clean set, we deliberately design a memory-based modulation scheme to dynamically adjust the contribution of each sample in terms of its historical credibility sequence during training, thus alleviating the effect from noisy samples incorrectly grouped into the clean set.
no code implementations • 15 Jul 2022 • Shuguang Dou, Xinyang Jiang, Qingsong Zhao, Dongsheng Li, Cairong Zhao
In this paper, we aim to develop a technique that can achieve a good trade-off between privacy protection and data usability for person ReID.
1 code implementation • IEEE Transactions on Circuits and Systems for Video Technology 2022 • Cairong Zhao, Zhicheng Chen, Shuguang Dou, Zefan Qu, Jiawei Yao, Jun Wu, Duoqian Miao
For human-introduced noise, we propose a noise-discovery and noise-suppression training process for mislabeling-robust person search.
no code implementations • 2 Mar 2022 • Qingsong Zhao, Yi Wang, Shuguang Dou, Chen Gong, Yin Wang, Cairong Zhao
Regarding this hypothesis, we propose a novel regularization to improve discriminative learning.
no code implementations • 21 Feb 2022 • Qingsong Zhao, Yi Wang, Zhipeng Zhou, Duoqian Miao, LiMin Wang, Yu Qiao, Cairong Zhao
The sequence ordering of word vectors matters a lot to text reading, which has been proven in natural language processing (NLP).
2 code implementations • IEEE Transactions on Image Processing 2021 • Cairong Zhao, Yuanpeng Tu, Zhihui Lai, Fumin Shen, Heng Tao Shen, Duoqian Miao
Moreover, a novel iterative asymmetric mutual training strategy (IAMT) is proposed to alleviate drawbacks of common mutual learning, which can continuously refine the discriminative regions for SSB and extract regularized dark knowledge for two models as well.
1 code implementation • IEEE Transactions on Circuits and Systems for Video Technology 2021 • Shaowei Hou, Cairong Zhao, Zhicheng Chen, Jun Wu, Zhihua Wei, Duoqian Miao
Our method achieves comparable performance on two benchmarks, CUHK-SYSU and PRW, and achieves 91.96% mAP and 93.34% rank-1 accuracy on CUHK-SYSU.
1 code implementation • IEEE Transactions on Image Processing 2021 • Cairong Zhao, Xinbi Lv, Shuguang Dou, Shanshan Zhang, Jun Wu, Liang Wang
The adversarial suppression branch, embedded with two occlusion suppression modules, minimizes the generated occlusion's response and strengthens attentive feature representation on human non-occluded body regions.
Ranked #27 on Person Re-Identification on Occluded-DukeMTMC
1 code implementation • IEEE Transactions on Multimedia 2020 • Cairong Zhao, Xinbi Lv, Zhang Zhang, WangMeng Zuo, Jun Wu, Duoqian Miao
The extraction of robust feature representations from pedestrian images through CNNs with a single deterministic pooling operation is problematic as the features in real pedestrian images are complex and diverse.