no code implementations • 11 Aug 2024 • Yuhan Zhu, Guozhen Zhang, Chen Xu, Haocheng Shen, Xiaoxin Chen, Gangshan Wu, Limin Wang
Specifically, we propose Contrastive Prompt Tuning (CPT) as the key task for self-supervision.
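The excerpt does not spell out the objective; as an illustration only, a contrastive (InfoNCE-style) loss over prompt features, where row i of one view should match row i of the other, could be sketched as follows (all function names and shapes here are hypothetical, not the paper's implementation):

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def info_nce_loss(anchors, positives, temperature=0.07):
    """InfoNCE: row i of `anchors` is positive for row i of `positives`,
    all other rows in the batch serve as negatives."""
    a = l2_normalize(anchors)
    p = l2_normalize(positives)
    logits = a @ p.T / temperature               # (n, n) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # diagonal = positive pairs
```

Matched feature pairs drive the loss toward zero, while mismatched pairs are penalized, which is the self-supervised signal a contrastive prompt objective relies on.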
1 code implementation • 5 Jul 2024 • Yuhan Zhu, Yuyang Ji, Zhiyu Zhao, Gangshan Wu, Limin Wang
Pre-trained vision-language models (VLMs) have shown impressive results in various visual classification tasks.
no code implementations • 19 Jun 2024 • Yuhan Zhu, Jian Wang, Bing Li, Xuxian Tang, Hao Li, Neng Zhang, Yuqi Zhao
Experiments on the dataset collected from the benchmark show that MicroCERCL can accurately localize the root cause of microservice systems in such environments, outperforming state-of-the-art approaches by at least 24.1% in top-1 accuracy.
no code implementations • CVPR 2024 • Yuhan Zhu, Guozhen Zhang, Jing Tan, Gangshan Wu, Limin Wang
To address this issue, we propose a new Dual-level query-based TAD framework, namely DualDETR, to detect actions at both the instance level and the boundary level.
Ranked #2 on Temporal Action Localization on MultiTHUMOS
2 code implementations • 2 Oct 2023 • Xinhao Li, Yuhan Zhu, Limin Wang
In this paper, we present a new adaptation paradigm (ZeroI2V) to transfer image transformers to video recognition tasks (i.e., introducing zero extra cost to the original models during inference).
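A common way to get zero extra inference cost from a linear adaptation branch is structural reparameterization: after training, the learned linear branch is folded into the frozen weight and discarded. A toy sketch of that general idea (not necessarily the paper's exact formulation; `W`, `A`, and the merge rule below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 6
W = rng.normal(size=(d, d))          # frozen pretrained linear weight
A = 0.1 * rng.normal(size=(d, d))    # learned linear adapter (hypothetical)
x = rng.normal(size=d)

# Training-time view: frozen branch plus a linear adapter on its output.
y_train = W @ x + A @ (W @ x)

# Inference-time view: fold the adapter into the weight once.
W_merged = W + A @ W
y_infer = W_merged @ x               # identical output, no extra params or FLOPs
```

Because the merged weight has the same shape as the original, the adapted model runs at exactly the pretrained model's inference cost.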
Ranked #6 on Action Recognition on UCF101 (using extra training data)
no code implementations • 19 Aug 2023 • Chen Xu, Yuhan Zhu, Guozhen Zhang, Haocheng Shen, Yixuan Liao, Xiaoxin Chen, Gangshan Wu, Limin Wang
Prompt learning has emerged as an efficient and effective approach for transferring foundational Vision-Language Models (e.g., CLIP) to downstream tasks.
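As a rough illustration of the prompt-learning idea (learnable context vectors shared across classes, in the style of CoOp, with a toy mean-pooling stand-in for CLIP's frozen text encoder; every name and dimension below is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                  # toy embedding dimension
n_ctx, n_classes = 4, 3

# Learnable context vectors, shared across all classes.
ctx = rng.normal(size=(n_ctx, d))
class_embed = rng.normal(size=(n_classes, d))  # per-class "name" embeddings

def encode_prompt(ctx, cls_vec):
    # Stand-in for a frozen text encoder: pool [ctx tokens; class token].
    tokens = np.concatenate([ctx, cls_vec[None]], axis=0)  # (n_ctx + 1, d)
    f = tokens.mean(axis=0)
    return f / np.linalg.norm(f)

text_feats = np.stack([encode_prompt(ctx, c) for c in class_embed])
img = rng.normal(size=d)
img /= np.linalg.norm(img)
logits = text_feats @ img / 0.01   # temperature-scaled cosine similarities
pred = int(np.argmax(logits))
```

During adaptation only `ctx` would be optimized (e.g., by cross-entropy on the logits), leaving the image and text encoders frozen, which is what makes prompt learning parameter-efficient.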
1 code implementation • 17 Apr 2023 • Chen Xu, Yuhan Zhu, Haocheng Shen, Boheng Chen, Yixuan Liao, Xiaoxin Chen, Limin Wang
To the best of our knowledge, we are the first to demonstrate that visual prompts in V-L models outperform previous prompt-based methods on downstream tasks.
1 code implementation • CVPR 2023 • Guozhen Zhang, Yuhan Zhu, Haonan Wang, Youxin Chen, Gangshan Wu, Limin Wang
In this paper, we propose a novel module to explicitly extract motion and appearance information via a unifying operation.
Ranked #1 on Video Frame Interpolation on UCF101