1 code implementation • 12 Dec 2024 • Yongliang Wu, Wenbo Zhu, Jiawang Cao, Yi Lu, Bozheng Li, Weiheng Chi, Zihan Qiu, Lirian Su, Haolin Zheng, Jay Wu, Xu Yang
Demand for producing short-form videos for sharing on social media platforms has grown rapidly in recent years.
no code implementations • 22 Aug 2024 • Bozheng Li, Mushui Liu, Gaoang Wang, Yunlong Yu
In this paper, we propose a novel Temporal Sequence-Aware Model (TSAM) for few-shot action recognition (FSAR), which incorporates a sequential perceiver adapter into the pre-training framework to integrate both spatial information and sequential temporal dynamics into the feature embeddings.
no code implementations • 22 Aug 2024 • Mushui Liu, Fangtai Wu, Bozheng Li, Ziqian Lu, Yunlong Yu, Xi Li
Few-shot learning (FSL) aims to recognize new concepts using a limited number of visual samples.
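As a minimal illustration of the few-shot setting described above (not the method proposed in this paper), the sketch below builds a toy N-way K-shot episode and classifies a query by its nearest class prototype, i.e. the mean of each class's few support embeddings. All shapes, the synthetic features, and the prototype classifier are illustrative assumptions standing in for embeddings from a pre-trained backbone.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 5-way 3-shot episode: 5 novel classes, 3 labelled support samples each.
# The synthetic vectors stand in for backbone embeddings (hypothetical setup).
n_way, k_shot, d = 5, 3, 16
class_centres = rng.normal(size=(n_way, d)) * 3.0
support = class_centres[:, None, :] + rng.normal(size=(n_way, k_shot, d)) * 0.3

# One prototype per class: the mean of its support embeddings.
prototypes = support.mean(axis=1)                      # shape (n_way, d)

def classify(query, prototypes):
    """Assign a query embedding to the nearest prototype (Euclidean distance)."""
    dists = np.linalg.norm(prototypes - query, axis=1)
    return int(np.argmin(dists))

# A query drawn near class 2's centre should be recognised as class 2.
query = class_centres[2] + rng.normal(size=d) * 0.3
pred = classify(query, prototypes)
```

With only three labelled samples per class, the prototype is a crude but surprisingly effective class representative, which is why prototype-style classifiers are a common baseline in FSL.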
1 code implementation • 12 Aug 2024 • Mushui Liu, Bozheng Li, Yunlong Yu
In this paper, we propose OmniCLIP, a framework that adapts CLIP for video recognition by focusing on learning comprehensive features encompassing spatial, temporal, and dynamic spatial-temporal scales, which we refer to as omni-scale features.
no code implementations • 4 Jul 2024 • Mushui Liu, Bozheng Li, Yunlong Yu
Prompt tuning, which involves training a small set of parameters, effectively adapts pre-trained Vision-Language Models (VLMs) to downstream tasks.
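To make the parameter-efficiency idea concrete, here is a minimal sketch of prompt tuning on a toy frozen model (not the method proposed in this paper): a single learnable prompt embedding is prepended to the input tokens, and gradient descent updates only the prompt while the pre-trained projection stays frozen. The linear mean-pool "encoder", dimensions, and loss are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Frozen" pre-trained weights: never updated during prompt tuning.
d, k = 8, 3                           # embedding dim, output dim
W = rng.normal(size=(d, k))           # frozen projection head

# The only trainable parameters: one learnable prompt token embedding.
prompt = np.zeros(d)

def forward(tokens, prompt):
    """Prepend the prompt token, mean-pool the sequence, project with frozen W."""
    seq = np.vstack([prompt, tokens])      # (n+1, d)
    pooled = seq.mean(axis=0)              # (d,)
    return pooled @ W                      # (k,)

tokens = rng.normal(size=(5, d))           # toy "input embeddings"
target = np.array([1.0, 0.0, -1.0])        # toy downstream target

lr = 1.0
for _ in range(2000):
    n = tokens.shape[0] + 1
    err = forward(tokens, prompt) - target   # dL/dout for L = 0.5*||out - target||^2
    grad_prompt = (W @ err) / n              # chain rule through the mean-pool
    prompt -= lr * grad_prompt               # W is left untouched (frozen)

final_loss = 0.5 * np.sum((forward(tokens, prompt) - target) ** 2)
```

Only `d = 8` values are trained here versus `d * k = 24` frozen ones; in a real VLM the gap is far larger, which is the appeal of prompt tuning over full fine-tuning.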
no code implementations • 25 Jun 2024 • Yongliang Wu, Bozheng Li, Jiawang Cao, Wenbo Zhu, Yi Lu, Weiheng Chi, Chuyun Xie, Haolin Zheng, Ziyue Su, Jay Wu, Xu Yang
The Long-form Video Question-Answering task requires comprehending and analyzing extended video content to answer questions accurately, using both temporal and contextual information.