no code implementations • 28 Dec 2023 • Houlun Chen, Xin Wang, Hong Chen, Zihan Song, Jia Jia, Wenwu Zhu
To tackle these challenges, in this work we propose a Grounding-Prompter method, which is capable of conducting Temporal Sentence Grounding (TSG) in long videos by prompting an LLM with multimodal information.
no code implementations • 21 Dec 2023 • Wei Feng, Xin Wang, Hong Chen, Zeyang Zhang, Zihan Song, Yuwei Zhou, Wenwu Zhu
Recently, researchers have investigated the capability of LLMs to handle videos and proposed several video LLMs.
1 code implementation • 30 Nov 2023 • Bin Huang, Xin Wang, Hong Chen, Zihan Song, Wenwu Zhu
Large language models (LLMs) have shown remarkable text understanding capabilities, and they have been extended into Video LLMs that handle video data and comprehend visual details.
Tasks: Dense Video Captioning · Video-based Generative Performance Benchmarking (Consistency) · +5