Video Narrative Grounding

1 paper with code • 0 benchmarks • 1 dataset

Video Narrative Grounding is the task of linking video narratives to specific video segments. The input is a video together with a text description (the narrative) in which the positions of certain nouns are marked. For each marked noun, the method must output a segmentation mask, in each video frame, for the object the noun refers to.
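
The input and output described above can be sketched as plain Python data structures. This is only an illustration of the task interface, not the dataset's actual file format or the API of the google/video-localized-narratives repository: every name here (`VNGInput`, `MarkedNoun`, `mark`, `empty_prediction`, `has_valid_shape`) is hypothetical, nouns are assumed to be marked by character offsets into the narrative, and masks are assumed to be nested lists of 0/1 values.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical layout: a binary mask is a list of rows, each a list of 0/1 values.
Mask = List[List[int]]


@dataclass
class MarkedNoun:
    start: int  # character offset of the noun's first character in the narrative
    end: int    # character offset one past the noun's last character


@dataclass
class VNGInput:
    num_frames: int          # number of frames in the video
    height: int              # frame height in pixels
    width: int               # frame width in pixels
    narrative: str           # the text description of the video
    nouns: List[MarkedNoun]  # positions of the nouns to be grounded


def mark(narrative: str, word: str) -> MarkedNoun:
    """Mark the first occurrence of `word` in the narrative."""
    start = narrative.index(word)
    return MarkedNoun(start=start, end=start + len(word))


def noun_text(sample: VNGInput, noun: MarkedNoun) -> str:
    """Return the surface text of a marked noun."""
    return sample.narrative[noun.start:noun.end]


def empty_prediction(sample: VNGInput) -> Dict[int, List[Mask]]:
    """Placeholder standing in for a real method: one all-zero mask per noun per frame."""
    return {
        i: [
            [[0] * sample.width for _ in range(sample.height)]
            for _ in range(sample.num_frames)
        ]
        for i in range(len(sample.nouns))
    }


def has_valid_shape(sample: VNGInput, prediction: Dict[int, List[Mask]]) -> bool:
    """Check that every marked noun has one height x width mask for every frame."""
    if set(prediction) != set(range(len(sample.nouns))):
        return False
    for masks in prediction.values():
        if len(masks) != sample.num_frames:
            return False
        for mask in masks:
            if len(mask) != sample.height:
                return False
            if any(len(row) != sample.width for row in mask):
                return False
    return True


# Usage: a made-up two-frame video with two marked nouns.
text = "A parrot sits on a branch."
sample = VNGInput(
    num_frames=2,
    height=3,
    width=4,
    narrative=text,
    nouns=[mark(text, "parrot"), mark(text, "branch")],
)
prediction = empty_prediction(sample)
```

Keying the output by noun index keeps each mask sequence tied to the noun it grounds, which matches the task's requirement of one mask per marked noun per frame. A real method would replace `empty_prediction` with a model that segments the referred object.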

Source: Connecting Vision and Language with Video Localized Narratives

Most implemented papers

Connecting Vision and Language with Video Localized Narratives

google/video-localized-narratives • CVPR 2023

We propose Video Localized Narratives, a new form of multimodal video annotations connecting vision and language.