Moment Retrieval

33 papers with code • 2 benchmarks • 5 datasets

Moment retrieval can be defined as the task of "localizing moments in a video given a user query".

Description from: QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries

Image credit: QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries
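
To make the task definition above concrete, here is a minimal sketch of what "localizing a moment" can look like in code. It is not taken from any of the papers listed on this page. It assumes a hypothetical upstream model has already produced one query-relevance score per fixed-length clip; the function names, the 2-second clip length, and the 0.5 threshold are all illustrative assumptions. The sketch picks the highest-scoring contiguous run of clips as the predicted moment and compares it to a reference span with temporal IoU, a measure commonly used to judge predicted spans.

```python
def temporal_iou(pred, gt):
    """Intersection-over-union of two (start, end) spans, in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def best_moment(scores, clip_len=2.0, threshold=0.5):
    """Return the (start, end) span of the contiguous run of clips whose
    scores are all >= threshold and whose summed score is largest.

    `scores` are hypothetical per-clip relevance scores for one query;
    `clip_len` (seconds per clip) and `threshold` are assumed values.
    Returns None when no clip reaches the threshold.
    """
    best, best_sum = None, float("-inf")
    start, run_sum = None, 0.0
    # A sentinel below any threshold closes a run that reaches the last clip.
    for i, s in enumerate(list(scores) + [float("-inf")]):
        if s >= threshold:
            if start is None:
                start, run_sum = i, 0.0
            run_sum += s
        elif start is not None:
            if run_sum > best_sum:
                best, best_sum = (start * clip_len, i * clip_len), run_sum
            start = None
    return best


# Toy scores for an 8-clip (16-second) video; values are made up.
scores = [0.1, 0.2, 0.8, 0.9, 0.7, 0.1, 0.6, 0.2]
pred = best_moment(scores)              # (4.0, 10.0)
iou = temporal_iou(pred, (5.0, 10.0))   # 5 / 6
```

Thresholding per-clip scores is only one simple way to turn scores into a span; the models listed below predict moments in their own ways, which this sketch does not attempt to reproduce.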

Most implemented papers

HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training

linjieli222/HERO EMNLP 2020

We present HERO, a novel framework for large-scale video+language omni-representation learning.

QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries

jayleicn/moment_detr 20 Jul 2021

Each video in the dataset is annotated with: (1) a human-written free-form NL query, (2) relevant moments in the video w.r.t.

Finding Moments in Video Collections Using Natural Language

jayleicn/TVRetrieval 30 Jul 2019

We evaluate our approach on two recently proposed datasets for temporal localization of moments in video with natural language (DiDeMo and Charades-STA) extended to our video corpus moment retrieval setting.

TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval

jayleicn/TVRetrieval ECCV 2020

The queries are also labeled with query types that indicate whether each of them is more related to video or subtitle or both, allowing for in-depth analysis of the dataset and the methods built on top of it.

Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding

wjun0830/cgdetr 15 Nov 2023

Dummy tokens conditioned by text query take a portion of the attention weights, preventing irrelevant video clips from being represented by the text query.

Weakly Supervised Video Moment Retrieval From Text Queries

niluthpol/weak_supervised_video_moment CVPR 2019

The weak nature of the supervision is because, during training, we only have access to the video-text pairs rather than the temporal extent of the video to which different text descriptions relate.

Cross-Modal Interaction Networks for Query-Based Moment Retrieval in Videos

ikuinen/CMIN 6 Jun 2019

Query-based moment retrieval aims to localize the most relevant moment in an untrimmed video according to the given natural language query.

Regularized Two-Branch Proposal Networks for Weakly-Supervised Moment Retrieval in Videos

ikuinen/regularized_two-branch_proposal_network 19 Aug 2020

Thus, these methods fail to distinguish the target moment from plausible negative moments.

Uncovering Hidden Challenges in Query-Based Video Moment Retrieval

mayu-ot/hidden-challenges-MR 1 Sep 2020

In this paper, we present a series of experiments assessing how well the benchmark results reflect the true progress in solving the moment retrieval task.

Frame-wise Cross-modal Matching for Video Moment Retrieval

tanghaoyu258/ACRM-for-moment-retrieval 22 Sep 2020

Another contribution is that we propose an additional predictor to utilize the internal frames in the model training to improve the localization accuracy.