Natural Language Moment Retrieval
12 papers with code • 4 benchmarks • 3 datasets
Latest papers
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection
Temporal Action Detection (TAD) focuses on detecting pre-defined actions, while Moment Retrieval (MR) aims to identify the events described by open-ended natural language within untrimmed videos.
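Both tasks reduce to predicting temporal spans conditioned on text; below is a minimal sketch of one way to expose them through a single grounding interface (function names and the prompt template are illustrative assumptions, not UniMD's design):

```python
# Sketch: one text-conditioned grounding interface serving both tasks.
# TAD's fixed label set is turned into natural-language queries, so the
# same retrieve(video, text) call covers pre-defined and open-ended cases.

TAD_CLASSES = ["high jump", "pole vault", "diving"]  # pre-defined actions

def retrieve(video, query: str) -> list[tuple[float, float, float]]:
    # Stand-in for a text-conditioned grounding model; returns (start, end, score).
    return [(0.0, 1.0, 0.5)]

def detect_actions(video) -> dict[str, list[tuple[float, float, float]]]:
    # TAD as repeated moment retrieval: one query per pre-defined class.
    return {cls: retrieve(video, f"a person performs {cls}") for cls in TAD_CLASSES}

def retrieve_moment(video, sentence: str) -> list[tuple[float, float, float]]:
    # MR: open-ended language goes straight through the same interface.
    return retrieve(video, sentence)
```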
BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos
However, existing methods suffer from center misalignment arising from the inherent ambiguity of moment centers, leading to inaccurate predictions.
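The contrast between center-based and boundary-oriented moment parameterization can be seen in a small sketch (a simplification in our own notation, not BAM-DETR's exact formulation):

```python
# Center-based vs. boundary-oriented decoding of a moment (start, end).
# A toy contrast; names and formulas are ours, not the paper's.

def decode_center_width(center: float, width: float) -> tuple[float, float]:
    # An error in the predicted center shifts BOTH boundaries at once,
    # so ambiguity about where the "center" lies degrades both edges.
    return center - width / 2, center + width / 2

def decode_boundary_oriented(anchor: float, d_start: float, d_end: float) -> tuple[float, float]:
    # Each edge is regressed as its own distance from an anchor point,
    # so the two boundaries can be localized and refined independently.
    return anchor - d_start, anchor + d_end

print(decode_center_width(0.5, 0.4))            # (0.3, 0.7)
print(decode_boundary_oriented(0.5, 0.2, 0.2))  # (0.3, 0.7)
```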
Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection
Video Moment Retrieval (MR) and Highlight Detection (HD) have attracted significant attention due to the growing demand for video analysis.
Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding
Dummy tokens conditioned on the text query absorb a portion of the attention weights, preventing irrelevant video clips from being represented by the text query.
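A toy sketch of how query-conditioned dummy tokens can divert attention mass in video-to-text cross-attention (the shapes, additive conditioning, and mean pooling are assumptions made for illustration, not CG-DETR's implementation):

```python
import torch
import torch.nn.functional as F

d, n_clips, n_words, n_dummies = 64, 10, 5, 3
clips = torch.randn(n_clips, d)                # video clip features (queries)
words = torch.randn(n_words, d)                # text token features (keys)
pooled_text = words.mean(dim=0, keepdim=True)  # pooled text-query representation

# Learnable dummy tokens, conditioned here on the text query by simple addition.
dummy_base = torch.nn.Parameter(torch.randn(n_dummies, d))
dummies = dummy_base + pooled_text

# Each clip attends over text tokens plus dummies; clips irrelevant to the
# query can route their attention mass to the dummies instead of the text.
keys = torch.cat([words, dummies], dim=0)
attn = F.softmax(clips @ keys.T / d ** 0.5, dim=-1)  # (n_clips, n_words + n_dummies)

text_mass = attn[:, :n_words].sum(dim=-1)  # per-clip weight left on real text tokens
print(text_mass)
```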
UnLoc: A Unified Framework for Video Localization Tasks
While large-scale image-text pretrained models such as CLIP have been used for multiple video-level tasks on trimmed videos, their use for temporal localization in untrimmed videos remains relatively unexplored.
UniVTG: Towards Unified Video-Language Temporal Grounding
Most methods in this direction develop task-specific models trained with type-specific labels, such as moment retrieval (time interval) and highlight detection (worthiness curve), which limits their ability to generalize across VTG tasks and labels.
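One way to make such heterogeneous labels interchangeable is a common per-clip form carrying an interval indicator, boundary offsets, and a saliency score; the sketch below is loosely in that spirit, with field names and details as assumptions rather than UniVTG's exact definition:

```python
from dataclasses import dataclass

@dataclass
class ClipLabel:
    foreground: bool              # is this clip inside the target interval?
    offsets: tuple[float, float]  # distances to interval start/end (moment retrieval)
    saliency: float               # per-clip worthiness score (highlight detection)

# A moment-retrieval interval and a highlight "worthiness curve" can both be
# expressed as one sequence of ClipLabel, so a single head/loss covers both.
labels = [ClipLabel(True, (0.5, 2.0), 0.8), ClipLabel(False, (0.0, 0.0), 0.1)]
```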
Overcoming Weak Visual-Textual Alignment for Video Moment Retrieval
Video moment retrieval (VMR) identifies a specific moment in an untrimmed video for a given natural language query.
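Concretely, the system maps a (video, query) pair to a (start, end) span, and predictions are commonly scored by temporal IoU against the ground-truth interval (e.g., Recall@1 at IoU 0.5); a minimal sketch with our own function names:

```python
def temporal_iou(pred: tuple[float, float], gt: tuple[float, float]) -> float:
    """Temporal IoU between two (start, end) spans, in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# Hypothetical prediction vs. ground truth for one query.
pred_span, gt_span = (12.0, 18.5), (11.0, 17.0)
print(temporal_iou(pred_span, gt_span))         # ~0.667
print(temporal_iou(pred_span, gt_span) >= 0.5)  # counts as a hit at IoU@0.5
```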
Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos
Our framework is easily extensible to tasks covering visually-grounded language understanding and generation.
Localizing Moments in Long Video Via Multimodal Guidance
In this paper, we propose a method for improving the performance of natural language grounding in long videos by identifying and pruning out non-describable windows.
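A small sketch of this guidance-then-grounding flow (the scoring function, threshold, and fallback are placeholder assumptions, not the paper's model):

```python
from typing import Callable

def ground_with_pruning(
    windows: list[tuple[float, float]],
    describability: Callable[[tuple[float, float]], float],
    ground: Callable[[tuple[float, float]], tuple[float, float, float]],
    keep_threshold: float = 0.5,
) -> tuple[float, float]:
    # 1) Score each candidate window for how "describable" it is w.r.t. the query.
    survivors = [w for w in windows if describability(w) >= keep_threshold]
    survivors = survivors or windows  # fall back if everything was pruned
    # 2) Run the (expensive) grounding model only on the surviving windows;
    #    each call returns (start, end, confidence).
    candidates = [ground(w) for w in survivors]
    # 3) Keep the highest-confidence span.
    best = max(candidates, key=lambda c: c[2])
    return best[0], best[1]
```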
MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions
The recent and increasing interest in video-language research has driven the development of large-scale datasets that enable data-intensive machine learning techniques.