1 code implementation • Applied Sciences 2024 • Yifang Xu, Yunzhuo Sun, Zien Xie, Benxiang Zhai, Sidan Du
Video temporal grounding (VTG) aims to locate specific temporal segments from an untrimmed video based on a linguistic query.
Ranked #1 on Zero-shot Moment Retrieval on QVHighlights
no code implementations • 3 Mar 2024 • Yunzhuo Sun, Yifang Xu, Zien Xie, Yukun Shu, Sidan Du
First, MiniGPT-4 is employed to generate a detailed description of the video frame and to rewrite the query statement; both outputs are fed into the encoder as new features.