Moment Retrieval
33 papers with code • 2 benchmarks • 5 datasets
Moment retrieval can be defined as the task of "localizing moments in a video given a user query".
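In practice, a system returns a (start, end) time span for each query. These spans are commonly scored with temporal intersection-over-union (IoU) against the annotated span. Results are then reported as Recall@1 at a fixed IoU threshold such as 0.5 or 0.7. The sketch below is a minimal illustration of that metric. The function names are hypothetical and not taken from any benchmark's official evaluation code.

```python
from typing import List, Tuple

# A moment is a (start_sec, end_sec) interval within one video.
Moment = Tuple[float, float]


def temporal_iou(pred: Moment, gt: Moment) -> float:
    """Intersection over union of two time intervals (0.0 if they do not overlap)."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # Span covering both intervals; equals the true union whenever inter > 0.
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0


def recall_at_1(preds: List[Moment], gts: List[Moment], threshold: float) -> float:
    """Fraction of queries whose top-1 prediction reaches IoU >= threshold."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts) if gts else 0.0


if __name__ == "__main__":
    print(temporal_iou((2.0, 6.0), (4.0, 8.0)))  # overlap 2s over a 6s span
    print(recall_at_1([(2.0, 6.0), (0.0, 10.0)], [(4.0, 8.0), (1.0, 9.0)], 0.5))
```

Here the first query misses at IoU 0.5 (about 0.33), while the second hits (0.8). The resulting Recall@1 is 0.5.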
Description from: QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries
Image credit: QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries
Most implemented papers
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
We present HERO, a novel framework for large-scale video+language omni-representation learning.
QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries
Each video in the dataset is annotated with: (1) a human-written free-form NL query, (2) relevant moments in the video w.r.t. the query, and (3) five-point scale saliency scores for all query-relevant clips.
Finding Moments in Video Collections Using Natural Language
We evaluate our approach on two recently proposed datasets for temporal localization of moments in video with natural language (DiDeMo and Charades-STA) extended to our video corpus moment retrieval setting.
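The corpus setting differs from single-video localization: the system must choose both the video and the span within it. One simple way to picture this is to pool candidate moments from every video and rank them jointly. The sketch assumes that per-moment relevance scores already come from some scoring model. It is an illustration of the search space, not the paper's method. `Candidate` and `retrieve_from_corpus` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    video_id: str
    start: float  # seconds
    end: float    # seconds
    score: float  # query-moment relevance from a hypothetical scoring model


def retrieve_from_corpus(
    candidates_per_video: Dict[str, List[Candidate]], k: int
) -> List[Candidate]:
    """Pool candidate moments across all videos and return the top-k by score."""
    pooled = [c for cands in candidates_per_video.values() for c in cands]
    return sorted(pooled, key=lambda c: c.score, reverse=True)[:k]


if __name__ == "__main__":
    corpus = {
        "vid_a": [Candidate("vid_a", 0.0, 5.0, 0.2), Candidate("vid_a", 5.0, 9.0, 0.9)],
        "vid_b": [Candidate("vid_b", 3.0, 7.0, 0.6)],
    }
    for c in retrieve_from_corpus(corpus, k=2):
        print(c.video_id, c.start, c.end, c.score)
```

Ranking the pooled list means that a strong moment in one video can outrank every moment in another. That possibility is what makes the corpus setting harder than localizing within a single, known video.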
TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
The queries are also labeled with query types that indicate whether each query is more related to the video, the subtitles, or both, allowing for in-depth analysis of the dataset and of the methods built on top of it.
Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding
Dummy tokens conditioned on the text query take a portion of the attention weights, preventing irrelevant video clips from being represented by the text query.
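The effect of dummy tokens can be seen in a plain softmax attention step. A video clip attends over the text tokens plus some extra dummy tokens. Because the weights must sum to 1, any mass that goes to the dummies is withheld from the text. A clip unrelated to the query can therefore end up with only a small share of text-derived features.

The sketch below is a simplified illustration of that mechanism, not the paper's architecture. In the paper the dummy tokens are conditioned on the text query; here they are passed in as fixed vectors for simplicity. The function name is hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_with_dummies(
    clip: Vector, text_tokens: List[Vector], dummy_tokens: List[Vector]
) -> List[float]:
    """Scaled dot-product attention weights of one clip over text + dummy keys.

    Returns only the weights on the text tokens. Their sum is below 1 whenever
    dummy tokens absorb part of the attention.
    """
    keys = text_tokens + dummy_tokens
    scale = 1.0 / math.sqrt(len(clip))
    logits = [dot(clip, k) * scale for k in keys]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    weights = [e / z for e in exps]
    return weights[: len(text_tokens)]


if __name__ == "__main__":
    clip = [1.0, 0.0]    # clip embedding unrelated to the text token below
    text = [[0.0, 1.0]]
    dummy = [[5.0, 0.0]]
    print(sum(attention_with_dummies(clip, text, [])))     # all mass on text
    print(sum(attention_with_dummies(clip, text, dummy)))  # most mass on dummy
```

Without dummies, the softmax forces the clip to spread all of its attention over the text tokens, even when none of them is relevant. The dummy keys give it somewhere else to put that mass.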
Weakly Supervised Video Moment Retrieval From Text Queries
The supervision is weak because, during training, we only have access to video-text pairs rather than the temporal extent of the video to which each text description relates.
Cross-Modal Interaction Networks for Query-Based Moment Retrieval in Videos
Query-based moment retrieval aims to localize the most relevant moment in an untrimmed video according to the given natural language query.
Regularized Two-Branch Proposal Networks for Weakly-Supervised Moment Retrieval in Videos
Thus, these methods fail to distinguish the target moment from plausible negative moments.
Uncovering Hidden Challenges in Query-Based Video Moment Retrieval
In this paper, we present a series of experiments assessing how well the benchmark results reflect the true progress in solving the moment retrieval task.
Frame-wise Cross-modal Matching for Video Moment Retrieval
Another contribution is an additional predictor that uses the internal frames during model training to improve localization accuracy.