Search Results for author: Bozheng Li

Found 8 papers, 2 papers with code

RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought

no code implementations • 4 Jun 2025 • Yi Lu, Jiawang Cao, Yongliang Wu, Bozheng Li, Licheng Tang, Yangguang Ji, Chong Wu, Jay Wu, Wenbo Zhu

To bridge this gap, we introduce Reasoning Segmentation via Visual Prompting (RSVP), a novel framework that unifies multi-step multimodal reasoning with grounded visual understanding.

Multimodal Reasoning • Reasoning Segmentation • +4
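
To make the reason-then-segment idea concrete, here is a minimal, runnable sketch of a two-stage pipeline in the spirit of RSVP: a reasoning stage proposes a coarse region as a visual prompt, and a promptable segmenter turns it into a mask. The function names, the box-shaped prompt, and the stub models are illustrative assumptions, not the paper's actual interfaces.

```python
# A minimal sketch of a reason-then-segment pipeline (assumptions throughout:
# RSVP's real prompts, models, and interfaces differ).
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ReasoningStep:
    thought: str                    # one step of the multimodal chain-of-thought
    box: Tuple[int, int, int, int]  # coarse region (x1, y1, x2, y2) grounded at this step

def reason_over_image(image, query: str) -> List[ReasoningStep]:
    """Stage 1 (hypothetical): an MLLM decomposes the query into reasoning
    steps, each grounded to a coarse region used as a visual prompt."""
    # Placeholder: a real system would call a multimodal LLM here.
    return [ReasoningStep(thought=f"Locate the region relevant to: {query}",
                          box=(0, 0, 64, 64))]

def segment_from_prompt(image, box) -> List[List[int]]:
    """Stage 2 (hypothetical): a promptable segmenter (e.g., a SAM-style
    model) would refine the coarse box into a pixel mask; here we just
    rasterize the box."""
    x1, y1, x2, y2 = box
    h, w = len(image), len(image[0])
    return [[1 if x1 <= x < x2 and y1 <= y < y2 else 0 for x in range(w)]
            for y in range(h)]

def rsvp_like(image, query: str) -> List[List[int]]:
    steps = reason_over_image(image, query)
    final_box = steps[-1].box  # use the last grounded step as the visual prompt
    return segment_from_prompt(image, final_box)

mask = rsvp_like([[0] * 128 for _ in range(128)], "the cup left of the laptop")
```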

VEU-Bench: Towards Comprehensive Understanding of Video Editing

no code implementations • CVPR 2025 • Bozheng Li, Yongliang Wu, Yi Lu, Jiashuo Yu, Licheng Tang, Jiawang Cao, Wenqing Zhu, Yuyang Sun, Jay Wu, Wenbo Zhu

We also demonstrate that incorporating VEU data significantly enhances the performance of Vid-LLMs on general video understanding benchmarks, with an average improvement of 8.3% across nine reasoning tasks.

Video Editing • Video Understanding

Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark

1 code implementation • 12 Dec 2024 • Yongliang Wu, Wenbo Zhu, Jiawang Cao, Yi Lu, Bozheng Li, Weiheng Chi, Zihan Qiu, Lirian Su, Haolin Zheng, Jay Wu, Xu Yang

The demand for short-form videos to share on social media platforms has grown significantly in recent years.

Highlight Detection • Video Summarization

Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action Recognition

no code implementations • 22 Aug 2024 • Bozheng Li, Mushui Liu, Gaoang Wang, Yunlong Yu

In this paper, we propose a novel Temporal Sequence-Aware Model (TSAM) for few-shot action recognition (FSAR) that incorporates a sequential perceiver adapter into the pre-training framework to integrate both spatial information and sequential temporal dynamics into the feature embeddings.

Decision Making • Few-Shot Action Recognition • +1
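
A minimal PyTorch sketch of what a sequence-aware, perceiver-style adapter could look like: learnable latent queries cross-attend over frame features that carry explicit frame-order embeddings. The dimensions, the final pooling, and how the adapter attaches to the frozen backbone are assumptions for illustration, not TSAM's exact design.

```python
# Sketch of a "sequential perceiver adapter": learnable latents cross-attend
# over order-encoded frame features (illustrative, not the paper's wiring).
import torch
import torch.nn as nn

class SequentialPerceiverAdapter(nn.Module):
    def __init__(self, dim: int = 512, num_latents: int = 8, num_frames: int = 16):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim) * 0.02)
        # Frame-order embeddings: this is what makes the adapter sequence-aware.
        self.order_pos = nn.Parameter(torch.randn(num_frames, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, num_frames, dim) per-frame features from a frozen backbone
        b = frame_feats.size(0)
        keys = frame_feats + self.order_pos               # inject temporal order
        queries = self.latents.unsqueeze(0).expand(b, -1, -1)
        fused, _ = self.cross_attn(queries, keys, keys)   # latents read the sequence
        return self.norm(fused).mean(dim=1)               # (batch, dim) video embedding

adapter = SequentialPerceiverAdapter()
video_embedding = adapter(torch.randn(2, 16, 512))        # two clips of 16 frames
```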

Envisioning Class Entity Reasoning by Large Language Models for Few-shot Learning

no code implementations • 22 Aug 2024 • Mushui Liu, Fangtai Wu, Bozheng Li, Ziqian Lu, Yunlong Yu, Xi Li

Few-shot learning (FSL) aims to recognize new concepts using a limited number of visual samples.

Few-Shot Learning

OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning

1 code implementation • 12 Aug 2024 • Mushui Liu, Bozheng Li, Yunlong Yu

In this paper, we propose OmniCLIP, a framework that adapts CLIP for video recognition by focusing on learning comprehensive features encompassing spatial, temporal, and dynamic spatial-temporal scales, which we refer to as omni-scale features.

Video Recognition • Zero-Shot Learning
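
The "omni-scale" idea can be illustrated with a rough sketch that pools per-frame CLIP features over several temporal window sizes and merges the results. OmniCLIP's actual modules, and how they plug into CLIP, are more elaborate, so treat this as an assumption-laden toy version.

```python
# Toy multi-scale temporal pooling over per-frame CLIP features (illustration
# of the omni-scale idea, not OmniCLIP's actual blocks).
import torch
import torch.nn.functional as F

def omni_scale_pool(frame_feats: torch.Tensor, scales=(1, 2, 4)) -> torch.Tensor:
    """frame_feats: (batch, num_frames, dim). Pool over temporal windows of
    several sizes, then average into one video-level feature."""
    x = frame_feats.transpose(1, 2)                    # (batch, dim, num_frames)
    pooled = []
    for s in scales:
        # avg_pool1d with kernel s captures dynamics at that temporal scale
        p = F.avg_pool1d(x, kernel_size=s, stride=s)   # (batch, dim, T // s)
        pooled.append(p.mean(dim=-1))                  # collapse remaining time
    return torch.stack(pooled, dim=0).mean(dim=0)      # (batch, dim)

video_feat = omni_scale_pool(torch.randn(2, 16, 512))
```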

Fully Fine-tuned CLIP Models are Efficient Few-Shot Learners

no code implementations • 4 Jul 2024 • Mushui Liu, Bozheng Li, Yunlong Yu

Prompt tuning, which involves training a small set of parameters, effectively adapts pre-trained Vision-Language Models (VLMs) to downstream tasks.

Domain Generalization • Few-Shot Learning • +1
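
For contrast with the full fine-tuning this paper argues for, here is a minimal CoOp-style prompt-tuning sketch: the only trainable parameters are a few soft context embeddings prepended to frozen class-name token embeddings. The dimensions and wiring are illustrative assumptions.

```python
# Minimal CoOp-style prompt tuning: train only a few "soft" context vectors,
# keep the CLIP-like backbone frozen (illustrative dimensions).
import torch
import torch.nn as nn

class LearnablePrompt(nn.Module):
    def __init__(self, num_ctx: int = 16, dim: int = 512):
        super().__init__()
        # The only trainable parameters: num_ctx soft context token embeddings.
        self.ctx = nn.Parameter(torch.randn(num_ctx, dim) * 0.02)

    def forward(self, class_token_embeds: torch.Tensor) -> torch.Tensor:
        # class_token_embeds: (num_classes, num_tokens, dim), frozen embeddings
        n = class_token_embeds.size(0)
        ctx = self.ctx.unsqueeze(0).expand(n, -1, -1)
        # Prepend the shared learnable context to each class-name sequence;
        # the result would be fed through the frozen text encoder.
        return torch.cat([ctx, class_token_embeds], dim=1)

prompt = LearnablePrompt()
prompts = prompt(torch.randn(10, 4, 512))  # 10 classes, 4 name tokens each
print(prompts.shape)                       # torch.Size([10, 20, 512])
```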

Zero-Shot Long-Form Video Understanding through Screenplay

no code implementations • 25 Jun 2024 • Yongliang Wu, Bozheng Li, Jiawang Cao, Wenbo Zhu, Yi Lu, Weiheng Chi, Chuyun Xie, Haolin Zheng, Ziyue Su, Jay Wu, Xu Yang

The Long-form Video Question-Answering task requires comprehending and analyzing extended video content to answer questions accurately using both temporal and contextual information.

Long-Form Question Answering • +2
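
One plausible reading of the title is a pipeline that first converts the video into a screenplay-like text (timestamped scene captions plus dialogue) and then lets a language model answer questions over it zero-shot. The sketch below encodes that reading; every function name and the screenplay format are assumptions rather than the paper's specification.

```python
# Hedged sketch of a video-to-screenplay zero-shot QA pipeline (assumed
# structure; the paper's actual format and models may differ).
from typing import List, Tuple

def build_screenplay(shots: List[Tuple[float, str, str]]) -> str:
    """shots: (timestamp_sec, caption, transcript) per shot, e.g. from a
    captioner and ASR. Format them as a screenplay-like document."""
    lines = []
    for t, caption, speech in shots:
        lines.append(f"[{t:07.2f}s] SCENE: {caption}")
        if speech:
            lines.append(f"          DIALOGUE: {speech}")
    return "\n".join(lines)

def answer(question: str, screenplay: str) -> str:
    """Placeholder for an LLM call: the screenplay plus the question would
    form the prompt; here we just return the assembled prompt."""
    return f"Screenplay:\n{screenplay}\n\nQuestion: {question}\nAnswer:"

doc = build_screenplay([(0.0, "A kitchen; a man opens the fridge", ""),
                        (12.5, "He pours milk into a bowl", "Breakfast time.")])
print(answer("What does the man pour?", doc))
```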
