Search Results for author: Bozheng Li

Found 6 papers, 2 papers with code

Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark

1 code implementation · 12 Dec 2024 · Yongliang Wu, Wenbo Zhu, Jiawang Cao, Yi Lu, Bozheng Li, Weiheng Chi, Zihan Qiu, Lirian Su, Haolin Zheng, Jay Wu, Xu Yang

The demand for producing short-form videos for sharing on social media platforms has grown significantly in recent years.

Highlight Detection · Video Summarization

Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action Recognition

no code implementations · 22 Aug 2024 · Bozheng Li, Mushui Liu, Gaoang Wang, Yunlong Yu

In this paper, we propose a novel Temporal Sequence-Aware Model (TSAM) for few-shot action recognition (FSAR), which incorporates a sequential perceiver adapter into the pre-training framework to integrate both spatial information and sequential temporal dynamics into the feature embeddings (see the sketch below).

Decision Making · Few-Shot Action Recognition +1
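The abstract above names a sequential perceiver adapter. Below is a minimal PyTorch sketch of the generic idea, learnable latent queries cross-attending over temporally position-encoded frame features; all names, shapes, and hyperparameters here are assumptions, not TSAM's actual architecture.

```python
import torch
import torch.nn as nn

class SequentialPerceiverAdapter(nn.Module):
    """Hypothetical perceiver-style adapter: latent queries pool an
    ordered frame sequence into one clip embedding (assumed design)."""
    def __init__(self, dim=512, num_latents=8, num_heads=8, max_frames=32):
        super().__init__()
        # Learnable latent queries and temporal position embeddings.
        self.latents = nn.Parameter(torch.randn(num_latents, dim) * 0.02)
        self.pos = nn.Parameter(torch.randn(max_frames, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, frame_feats):  # (batch, frames, dim), temporally ordered
        b, t, _ = frame_feats.shape
        # Position embeddings make the pooling sensitive to frame order,
        # matching the "frame order matters" premise.
        keys = frame_feats + self.pos[:t]
        queries = self.latents.unsqueeze(0).expand(b, -1, -1)
        pooled, _ = self.attn(queries, keys, keys)
        return self.norm(pooled).mean(dim=1)  # one clip-level embedding

feats = torch.randn(2, 16, 512)  # 2 clips, 16 frames of backbone features
print(SequentialPerceiverAdapter()(feats).shape)  # torch.Size([2, 512])
```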

Envisioning Class Entity Reasoning by Large Language Models for Few-shot Learning

no code implementations · 22 Aug 2024 · Mushui Liu, Fangtai Wu, Bozheng Li, Ziqian Lu, Yunlong Yu, Xi Li

Few-shot learning (FSL) aims to recognize new concepts using a limited number of visual samples.

Few-Shot Learning

OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning

1 code implementation · 12 Aug 2024 · Mushui Liu, Bozheng Li, Yunlong Yu

In this paper, we propose OmniCLIP, a framework that adapts CLIP for video recognition by learning comprehensive features across spatial, temporal, and dynamic spatial-temporal scales, which we refer to as omni-scale features (see the sketch below).

Video Recognition · Zero-Shot Learning
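As one way to picture multi-scale temporal feature learning over per-frame CLIP embeddings, here is a hedged sketch; the branch design and every name in it are assumptions, not OmniCLIP's actual omni-scale modules.

```python
import torch
import torch.nn as nn

class MultiScaleTemporalBlock(nn.Module):
    """Fuses temporal context at several scales via parallel conv branches
    (an illustrative assumption, not OmniCLIP's published design)."""
    def __init__(self, dim=512, kernels=(1, 3, 5)):
        super().__init__()
        # One temporal conv branch per scale; odd kernels preserve length.
        self.branches = nn.ModuleList(
            [nn.Conv1d(dim, dim, k, padding=k // 2) for k in kernels]
        )
        self.proj = nn.Linear(dim * len(kernels), dim)

    def forward(self, frame_feats):           # (batch, frames, dim)
        x = frame_feats.transpose(1, 2)       # (batch, dim, frames) for Conv1d
        # Each branch summarizes temporal context at its own scale.
        scale_feats = [branch(x).mean(dim=-1) for branch in self.branches]
        return self.proj(torch.cat(scale_feats, dim=-1))

clip_feats = torch.randn(4, 16, 512)          # 4 clips, 16 frames each
print(MultiScaleTemporalBlock()(clip_feats).shape)  # torch.Size([4, 512])
```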

Fully Fine-tuned CLIP Models are Efficient Few-Shot Learners

no code implementations · 4 Jul 2024 · Mushui Liu, Bozheng Li, Yunlong Yu

Prompt tuning, which trains only a small set of parameters, effectively adapts pre-trained Vision-Language Models (VLMs) to downstream tasks (see the sketch below).

Domain Generalization · Few-Shot Learning +1
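For contrast with the full fine-tuning the title advocates, here is a minimal CoOp-style prompt-tuning sketch: only a handful of context vectors are trained while the VLM stays frozen. The names and shapes are assumptions, not this paper's code.

```python
import torch
import torch.nn as nn

class PromptLearner(nn.Module):
    """Learnable 'soft prompt' context prepended to class-name token embeddings."""
    def __init__(self, embed_dim=512, n_ctx=16):
        super().__init__()
        # The only trainable parameters in prompt tuning.
        self.ctx = nn.Parameter(torch.randn(n_ctx, embed_dim) * 0.02)

    def forward(self, class_embeds):          # (n_classes, n_tokens, dim)
        ctx = self.ctx.unsqueeze(0).expand(class_embeds.size(0), -1, -1)
        # Shared learnable context, then the fixed class-name tokens.
        return torch.cat([ctx, class_embeds], dim=1)

# Usage sketch: freeze the VLM entirely, optimize only the prompt vectors.
prompt_learner = PromptLearner()
class_embeds = torch.randn(10, 8, 512)        # 10 classes, 8 name tokens each
prompts = prompt_learner(class_embeds)        # (10, 24, 512) for the text encoder
optimizer = torch.optim.SGD(prompt_learner.parameters(), lr=2e-3)
```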

Zero-Shot Long-Form Video Understanding through Screenplay

no code implementations · 25 Jun 2024 · Yongliang Wu, Bozheng Li, Jiawang Cao, Wenbo Zhu, Yi Lu, Weiheng Chi, Chuyun Xie, Haolin Zheng, Ziyue Su, Jay Wu, Xu Yang

The Long-form Video Question-Answering task requires comprehending and analyzing extended video content to answer questions accurately, using both temporal and contextual information.

Form · Question Answering +2
