Natural Language Moment Retrieval

16 papers with code • 4 benchmarks • 3 datasets

Natural language moment retrieval localizes the time interval (or moment) in an untrimmed video that is semantically relevant to a given natural language query.

Most implemented papers

Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding

wjun0830/cgdetr 15 Nov 2023

Dummy tokens conditioned on the text query absorb a portion of the attention weights, preventing irrelevant video clips from being represented by the text query.
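
Since the summary only names the idea, here is a minimal, hypothetical PyTorch sketch of dummy-token attention calibration: learnable dummy tokens, conditioned on the pooled text query, are appended to the cross-attention keys and values so that clips irrelevant to the query can place their attention mass on the dummies rather than on the text tokens. All names (`DummyCalibratedAttention`, `n_dummies`) are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DummyCalibratedAttention(nn.Module):
    def __init__(self, dim: int, n_dummies: int = 4):
        super().__init__()
        self.dummies = nn.Parameter(torch.randn(n_dummies, dim) * 0.02)
        self.condition = nn.Linear(dim, dim)  # conditions the dummies on the query
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, clips, text):
        # clips: (B, T, D) video clip features; text: (B, L, D) query token features
        pooled = text.mean(dim=1, keepdim=True)                       # (B, 1, D)
        dummies = self.dummies.unsqueeze(0) + self.condition(pooled)  # (B, K, D)
        kv = torch.cat([text, dummies], dim=1)                        # (B, L+K, D)
        out, _ = self.attn(clips, kv, kv)  # clips attend over text tokens + dummies
        return out  # irrelevant clips can park their attention on the dummy slots

clips, text = torch.randn(2, 75, 256), torch.randn(2, 12, 256)
print(DummyCalibratedAttention(256)(clips, text).shape)  # torch.Size([2, 75, 256])
```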

Dense Regression Network for Video Grounding

alvin-zeng/drn CVPR 2020

The key idea of this paper is to use the distances between each frame within the ground-truth moment and the starting (ending) frame as dense supervision to improve video grounding accuracy.
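
A short sketch of that dense supervision, using a hypothetical helper rather than the authors' code: every frame inside the ground-truth moment regresses its distances to the moment's start and end, so many frames, not just the two boundaries, supervise the localizer.

```python
import torch

def dense_regression_targets(n_frames: int, start: int, end: int):
    """Return (targets, mask): per-frame (dist_to_start, dist_to_end) regression
    targets and an inside-moment mask; frames outside [start, end] are masked out."""
    t = torch.arange(n_frames, dtype=torch.float32)
    inside = (t >= start) & (t <= end)
    targets = torch.stack([t - start, end - t], dim=1)  # (n_frames, 2)
    return targets, inside

targets, mask = dense_regression_targets(n_frames=10, start=3, end=7)
print(targets[mask])  # e.g. frame 5 -> (2., 2.): 2 frames from start, 2 from end
```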

VLG-Net: Video-Language Graph Matching Network for Video Grounding

Soldelli/VLG-Net 19 Nov 2020

Grounding language queries in videos aims at identifying the time interval (or moment) semantically relevant to a language query.

MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions

Soldelli/MAD CVPR 2022

The recent and increasing interest in video-language research has driven the development of large-scale datasets that enable data-intensive machine learning techniques.

Localizing Moments in Long Video Via Multimodal Guidance

waybarrios/guidance-based-video-grounding ICCV 2023

In this paper, we propose a method for improving the performance of natural language grounding in long videos by identifying and pruning out non-describable windows.
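
A minimal sketch of the guidance idea, under the assumption that a cheap multimodal scorer rates each candidate window for "describability" and only the top-scoring windows reach the full grounding model; the dot-product score below is a stand-in for whatever scorer is actually used.

```python
import torch

def prune_windows(window_feats, query_feat, keep_ratio: float = 0.3):
    # window_feats: (N, D) pooled features per candidate window; query_feat: (D,)
    scores = window_feats @ query_feat       # stand-in multimodal guidance score
    k = max(1, int(keep_ratio * len(scores)))
    keep = scores.topk(k).indices            # drop low-scoring (non-describable) windows
    return keep

windows, query = torch.randn(100, 256), torch.randn(256)
print(prune_windows(windows, query).shape)  # torch.Size([30])
```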

Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos

zjr2000/gvl 11 Mar 2023

Our framework is easily extensible to tasks covering visually-grounded language understanding and generation.

Background-aware Moment Detection for Video Moment Retrieval

minjoong507/bm-detr 5 Jun 2023

Video moment retrieval (VMR) identifies a specific moment in an untrimmed video for a given natural language query.

UniVTG: Towards Unified Video-Language Temporal Grounding

showlab/univtg ICCV 2023

Most methods in this direction develop task-specific models that are trained with type-specific labels, such as moment retrieval (time interval) and highlight detection (worthiness curve), which limits their ability to generalize to various VTG tasks and labels.
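
To make the unification concrete, here is a hedged sketch of one per-clip label format that can express both a moment-retrieval interval and a highlight-detection worthiness curve; the field names and layout are illustrative, not the paper's exact schema.

```python
import torch

def unify_labels(n_clips: int, moment, saliency):
    # moment: (start, end) clip indices; saliency: (n_clips,) worthiness curve
    start, end = moment
    t = torch.arange(n_clips, dtype=torch.float32)
    fg = ((t >= start) & (t <= end)).float()         # interval as foreground flags
    offsets = torch.stack([t - start, end - t], 1)   # interval as boundary offsets
    return fg, offsets, saliency                     # saliency kept as-is per clip

fg, offsets, sal = unify_labels(8, (2, 5), torch.rand(8))
print(fg)  # tensor([0., 0., 1., 1., 1., 1., 0., 0.])
```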

UnLoc: A Unified Framework for Video Localization Tasks

google-research/scenic ICCV 2023

While large-scale image-text pretrained models such as CLIP have been used for multiple video-level tasks on trimmed videos, their use for temporal localization in untrimmed videos remains relatively unexplored.
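
As a hedged illustration of repurposing CLIP-style image-text similarity for temporal localization: score each frame against the query and keep the best contiguous run above a threshold as the predicted moment. Frame and text features are assumed to be precomputed, L2-normalized embeddings (e.g. from CLIP); the span extraction is the illustrative part.

```python
import torch

def localize(frame_feats, text_feat, thresh: float = 0.5):
    # frame_feats: (T, D), text_feat: (D,), both L2-normalized embeddings
    sims = frame_feats @ text_feat                    # per-frame similarity, (T,)
    above = (sims > thresh).tolist()
    best, cur_start, best_score = None, None, float("-inf")
    for t, on in enumerate(above + [False]):          # sentinel closes a trailing run
        if on and cur_start is None:
            cur_start = t
        elif not on and cur_start is not None:
            score = sims[cur_start:t].sum().item()    # rank runs by total similarity
            if score > best_score:
                best, best_score = (cur_start, t - 1), score
            cur_start = None
    return best  # (start_frame, end_frame), or None if no frame clears the threshold

feats = torch.nn.functional.normalize(torch.randn(64, 512), dim=-1)
query = torch.nn.functional.normalize(torch.randn(512), dim=-1)
print(localize(feats, query, thresh=0.0))
```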

Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection

easonxiao-888/uvcom CVPR 2024

Video Moment Retrieval (MR) and Highlight Detection (HD) have attracted significant attention due to the growing demand for video analysis.