no code implementations • 8 Mar 2025 • Yubin Wang, Xinyang Jiang, De Cheng, Xiangqian Zhao, Zilong Wang, Dongsheng Li, Cairong Zhao
Visual prompt tuning offers significant advantages for adapting pre-trained visual foundation models to specific tasks.
no code implementations • 8 Feb 2025 • Qirui Wu, Shizhou Zhang, De Cheng, Yinghui Xing, Di Xu, Peng Wang, Yanning Zhang
Catastrophic forgetting is a critical chanllenge for incremental object detection (IOD).
no code implementations • 23 Nov 2024 • De Cheng, Yue Lu, Lingfeng He, Shizhou Zhang, Xi Yang, Nannan Wang, Xinbo Gao
Continual Learning (CL) aims to equip AI models with the ability to learn a sequence of tasks over time, without forgetting previously learned knowledge.
1 code implementation • 16 Nov 2024 • Ying Yang, De Cheng, Chaowei Fang, Yubiao Wang, Changzhe Jiao, Lechao Cheng, Nannan Wang
The innovation of our approach is that we leverage the diffusion model's intrinsic data reconstruction ability to distinguish ID samples from OOD samples in the latent feature space.
Out-of-Distribution Detection
Out of Distribution (OOD) Detection
2 code implementations • 27 Aug 2024 • Yubin Wang, Xinyang Jiang, De Cheng, Wenli Sun, Dongsheng Li, Cairong Zhao
Specifically, we introduce a relationship-guided attention module to capture pair-wise associations among entities and attributes for low-level prompt learning.
Ranked #1 on
Prompt Engineering
on ImageNet V2
2 code implementations • 14 Aug 2024 • Shizhou Zhang, Wenlong Luo, De Cheng, Qingchun Yang, Lingyan Ran, Yinghui Xing, Yanning Zhang
In this paper, we construct a large-scale benchmark dataset for Ground-to-Aerial Video-based person Re-Identification, named G2A-VReID, which comprises 185, 907 images and 5, 576 tracklets, featuring 2, 788 distinct identities.
no code implementations • 13 Aug 2024 • Yubin Wang, Xinyang Jiang, De Cheng, Dongsheng Li, Cairong Zhao
Video temporal grounding is an emerging topic aiming to identify specific clips within videos.
no code implementations • 13 Jun 2024 • Dingwen Zhang, Yan Li, De Cheng, Nannan Wang, Junwei Han
Based on an empirical study on the knowledge intensity of the kernel elements of the neural network, we find that the center kernel is the key for maximizing the knowledge intensity for learning new data, while freezing the other kernel elements would get a good balance on the model's capacity for overcoming catastrophic forgetting.
1 code implementation • 9 Jun 2024 • Yue Lu, Shizhou Zhang, De Cheng, Yinghui Xing, Nannan Wang, Peng Wang, Yanning Zhang
Existing prompt-tuning methods have demonstrated impressive performances in continual learning (CL), by selecting and updating relevant prompts in the vision-transformer models.
1 code implementation • 12 Mar 2024 • De Cheng, Yanling Ji, Dong Gong, Yan Li, Nannan Wang, Junwei Han, Dingwen Zhang
It considers the characteristics of the image restoration task with multiple degenerations in continual learning, and the knowledge for different degenerations can be shared and accumulated in the unified network structure.
1 code implementation • 1 Feb 2024 • Lingfeng He, De Cheng, Nannan Wang, Xinbo Gao
While prior work focuses on establishing cross-modality pseudo-label associations to bridge the modality-gap, they ignore maintaining the instance-level homogeneous and heterogeneous consistency between the feature space and the pseudo-label space, resulting in coarse associations.
no code implementations • CVPR 2024 • De Cheng, Zhipeng Xu, Xinyang Jiang, Nannan Wang, Dongsheng Li, Xinbo Gao
Although there is a growing focus on VFM-based domain prompt tuning for DG effectively learning prompts that disentangle invariant features across all domains remains a major challenge.
2 code implementations • 11 Dec 2023 • Yubin Wang, Xinyang Jiang, De Cheng, Dongsheng Li, Cairong Zhao
To address this limitation and prioritize harnessing structured knowledge, this paper advocates for leveraging LLMs to build a graph for each description to model the entities and attributes describing the category, as well as their correlations.
Ranked #2 on
Prompt Engineering
on ImageNet V2
no code implementations • 5 Dec 2023 • Guozhang Li, Xinpeng Ding, De Cheng, Jie Li, Nannan Wang, Xinbo Gao
To further clarify the noise of expanded boundaries, we combine mutual learning with a tailored proposal-level contrastive objective to use a learnable approach to harmonize a balance between incomplete yet clean (initial) and comprehensive yet noisy (expanded) boundaries for more precise ones.
1 code implementation • 24 Aug 2023 • Shizhou Zhang, Qingchun Yang, De Cheng, Yinghui Xing, Guoqiang Liang, Peng Wang, Yanning Zhang
In this work, we construct a large-scale dataset for Ground-to-Aerial Person Search, named G2APS, which contains 31, 770 images of 260, 559 annotated bounding boxes for 2, 644 identities appearing in both of the UAVs and ground surveillance cameras.
no code implementations • 22 May 2023 • De Cheng, Lingfeng He, Nannan Wang, Shizhou Zhang, Zhen Wang, Xinbo Gao
To this end, we propose a novel bilateral cluster matching-based learning framework to reduce the modality gap by matching cross-modality clusters.
no code implementations • 22 May 2023 • De Cheng, Xiaojian Huang, Nannan Wang, Lingfeng He, Zhihui Li, Xinbo Gao
Unsupervised learning visible-infrared person re-identification (USL-VI-ReID) aims at learning modality-invariant features from unlabeled cross-modality dataset, which is crucial for practical applications in video surveillance systems.
1 code implementation • CVPR 2023 • Guozhang Li, De Cheng, Xinpeng Ding, Nannan Wang, Xiaoyu Wang, Xinbo Gao
For the discriminative objective, we propose a Text-Segment Mining (TSM) mechanism, which constructs a text description based on the action class label, and regards the text as the query to mine all class-related segments.
1 code implementation • 25 Apr 2023 • Guozhang Li, De Cheng, Xinpeng Ding, Nannan Wang, Jie Li, Xinbo Gao
The proposed Bi-SCC firstly adopts a temporal context augmentation to generate an augmented video that breaks the correlation between positive actions and their co-scene actions in the inter-video; Then, a semantic consistency constraint (SCC) is used to enforce the predictions of the original video and augmented video to be consistent, hence suppressing the co-scene actions.
Weakly-supervised Temporal Action Localization
Weakly Supervised Temporal Action Localization
no code implementations • 30 Nov 2022 • De Cheng, Haichun Tai, Nannan Wang, Zhen Wang, Xinbo Gao
In this paper, we propose a Neighbour Consistency guided Pseudo Label Refinement (NCPLR) framework, which can be regarded as a transductive form of label propagation under the assumption that the prediction of each example should be similar to its nearest neighbours'.
1 code implementation • NIPS 2022 • De Cheng, Yixiong Ning, Nannan Wang, Xinbo Gao, Heng Yang, Yuxuan Du, Bo Han, Tongliang Liu
We show that the cycle-consistency regularization helps to minimize the volume of the transition matrix T indirectly without exploiting the estimated noisy class posterior, which could further encourage the estimated transition matrix T to converge to its optimal solution.
1 code implementation • 17 Aug 2022 • Yinghui Xing, Qirui Wu, De Cheng, Shizhou Zhang, Guoqiang Liang, Peng Wang, Yanning Zhang
To make the final image feature concentrate more on the target visual concept, a Class-Aware Visual Prompt Tuning (CAVPT) scheme is further proposed in our DPT, where the class-aware visual prompt is generated dynamically by performing the cross attention between text prompts features and image patch token embeddings to encode both the downstream task-related information and visual instance information.
no code implementations • CVPR 2022 • De Cheng, Tongliang Liu, Yixiong Ning, Nannan Wang, Bo Han, Gang Niu, Xinbo Gao, Masashi Sugiyama
In label-noise learning, estimating the transition matrix has attracted more and more attention as the matrix plays an important role in building statistically consistent classifiers.
no code implementations • 29 Mar 2022 • De Cheng, Gerong Wang, Bo wang, Qiang Zhang, Jungong Han, Dingwen Zhang
This design makes the presented transformer model a hybrid of 1) top-down and bottom-up attention pathways and 2) dynamic and static routing pathways.
no code implementations • 29 Mar 2022 • De Cheng, Yan Li, Dingwen Zhang, Nannan Wang, Xinbo Gao, Jiande Sun
To properly address this problem, we propose a novel density-variational learning framework to improve the robustness of the image dehzing model assisted by a variety of negative hazy images, to better deal with various complex hazy scenarios.
1 code implementation • CVPR 2022 • Peiliang Huang, Junwei Han, De Cheng, Dingwen Zhang
Zero-shot object detection aims at incorporating class semantic vectors to realize the detection of (both seen and) unseen classes given an unconstrained test image.
Ranked #2 on
Zero-Shot Object Detection
on PASCAL VOC'07
no code implementations • 29 Sep 2021 • De Cheng, Jingyu Zhou, Nannan Wang, Xinbo Gao
However, since person Re-Id is an open-set problem, the clustering based methods often leave out lots of outlier instances or group the instances into the wrong clusters, thus they can not make full use of the training samples as a whole.
1 code implementation • 27 Sep 2021 • Shizhou Zhang, De Cheng, Wenlong Luo, Yinghui Xing, Duo Long, Hao Li, Kai Niu, Guoqiang Liang, Yanning Zhang
Finding target persons in full scene images with a query of text description has important practical applications in intelligent video surveillance. However, different from the real-world scenarios where the bounding boxes are not available, existing text-based person retrieval methods mainly focus on the cross modal matching between the query text descriptions and the gallery of cropped pedestrian images.
1 code implementation • 22 Sep 2021 • Yan Li, De Cheng, Jiande Sun, Dingwen Zhang, Nannan Wang, Xinbo Gao
In this paper, we propose a single image dehazing method with an independent Detail Recovery Network (DRN), which considers capturing the details from the input image over a separate network and then integrates them into a coarse dehazed image.
no code implementations • ICCV 2021 • Xinpeng Ding, Nannan Wang, Shiwei Zhang, De Cheng, Xiaomeng Li, Ziyuan Huang, Mingqian Tang, Xinbo Gao
The contrastive objective aims to learn effective representations by contrastive learning, while the caption objective can train a powerful video encoder supervised by texts.
no code implementations • ICCV 2017 • Hehe Fan, Xiaojun Chang, De Cheng, Yi Yang, Dong Xu, Alexander G. Hauptmann
relevant) to the given event class, we formulate this task as a multi-instance learning (MIL) problem by taking each video as a bag and the video shots in each video as instances.
no code implementations • 25 Jul 2017 • De Cheng, Yihong Gong, Zhihui Li, Weiwei Shi, Alexander G. Hauptmann, Nanning Zheng
The proposed method can take full advantages of the structured distance relationships among these training samples, with the constructed complete graph.
1 code implementation • CVPR 2016 • De Cheng, Yihong Gong, Sanping Zhou, Jinjun Wang, Nanning Zheng
Person re-identification across cameras remains a very challenging problem, especially when there are no overlapping fields of view between cameras.