Moment Retrieval
69 papers with code • 2 benchmarks • 5 datasets
Moment retrieval can be defined as the task of "localizing moments in a video given a user query".
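The definition above can be pictured as a scoring problem: embed the query and each video clip, score candidate temporal spans, and return the best-scoring one. A minimal sketch with toy random features (function and variable names are illustrative, not from any specific paper):

```python
import numpy as np

def retrieve_moment(clip_feats, query_feat, max_len=5):
    """Return the (start, end) clip-index span whose mean feature is
    most similar to the query embedding under cosine similarity."""
    n = len(clip_feats)
    best, best_score = (0, 1), -np.inf
    for s in range(n):
        for e in range(s + 1, min(n, s + max_len) + 1):
            span = clip_feats[s:e].mean(axis=0)  # average clip features over the span
            score = span @ query_feat / (
                np.linalg.norm(span) * np.linalg.norm(query_feat)
            )
            if score > best_score:
                best, best_score = (s, e), score
    return best

# Toy example: 8 clips of 4-d features; the query matches clips 3-5.
rng = np.random.default_rng(0)
clips = rng.normal(size=(8, 4))
query = clips[3:6].mean(axis=0)
print(retrieve_moment(clips, query))  # best-matching span
```

Real systems replace the random features with learned video and text encoders and the exhaustive span search with proposal networks or direct span regression, but the scoring structure is the same.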
Description from: QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries
Most implemented papers
QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries
Each video in the dataset is annotated with: (1) a human-written free-form NL query, and (2) relevant moments in the video w.r.t. the query.
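The annotation scheme described above can be sketched as a simple record per video. A hypothetical illustration (field names are illustrative, not the dataset's actual JSON schema):

```python
# Hypothetical moment-retrieval annotation record (illustrative field names).
annotation = {
    # (1) a human-written free-form NL query
    "query": "A man in a blue shirt talks to the camera",
    # (2) relevant moments w.r.t. the query, as [start, end] in seconds
    "relevant_windows": [[12.0, 38.0]],
}

def moment_duration(record):
    """Total duration (in seconds) covered by the annotated moments."""
    return sum(end - start for start, end in record["relevant_windows"])

print(moment_duration(annotation))  # 26.0
```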
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
We present HERO, a novel framework for large-scale video+language omni-representation learning.
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection
Finding relevant moments and highlights in videos according to natural language queries is a natural and highly valuable need in the current era of exploding video content.
Finding Moments in Video Collections Using Natural Language
We evaluate our approach on two recently proposed datasets for temporal localization of moments in video with natural language (DiDeMo and Charades-STA) extended to our video corpus moment retrieval setting.
TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
The queries are also labeled with query types that indicate whether each of them is more related to the video, the subtitles, or both, allowing for in-depth analysis of the dataset and the methods built on top of it.
Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding
Dummy tokens conditioned by text query take portions of the attention weights, preventing irrelevant video clips from being represented by the text query.
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding
We introduce InternVideo2, a new family of video foundation models (ViFM) that achieve state-of-the-art results in video recognition, video-text tasks, and video-centric dialogue.
Weakly Supervised Video Moment Retrieval From Text Queries
The supervision is weak because, during training, we only have access to the video-text pairs rather than the temporal extent of the video to which different text descriptions relate.
Cross-Modal Interaction Networks for Query-Based Moment Retrieval in Videos
Query-based moment retrieval aims to localize the most relevant moment in an untrimmed video according to the given natural language query.
Regularized Two-Branch Proposal Networks for Weakly-Supervised Moment Retrieval in Videos
Thus, these methods fail to distinguish the target moment from plausible negative moments.