Moment Queries
12 papers with code • 1 benchmark • 1 dataset
Most implemented papers
Egocentric Video-Language Pretraining
Video-Language Pretraining (VLP), which aims to learn transferable representations that advance a wide range of video-text downstream tasks, has recently received increasing attention.
Where a Strong Backbone Meets Strong Features -- ActionFormer for Ego4D Moment Queries Challenge
This report describes our submission to the Ego4D Moment Queries Challenge 2022.
InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges
In this report, we present our champion solutions to five tracks of the Ego4D challenge.
ReLER@ZJU Submission to the Ego4D Moment Queries Challenge 2022
Moreover, in order to better capture long-term temporal dependencies in long videos, we propose a segment-level recurrence mechanism.
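For intuition, the sketch below shows one way a segment-level recurrence can be wired: each segment attends to a cached copy of the previous segment's outputs, Transformer-XL style, so information can flow across segment boundaries without attending over the whole video at once. All module names and sizes are illustrative assumptions, not the ReLER submission's code.

```python
import torch
import torch.nn as nn

class SegmentRecurrentEncoder(nn.Module):
    """Hypothetical sketch: encode a long clip-feature sequence segment by
    segment, letting each segment attend to a detached memory of the
    previous segment's outputs (segment-level recurrence)."""

    def __init__(self, dim=256, num_heads=4, segment_len=128):
        super().__init__()
        self.segment_len = segment_len
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, feats):  # feats: (B, T, dim), T can be thousands of clips
        outputs, memory = [], None
        for start in range(0, feats.size(1), self.segment_len):
            seg = feats[:, start:start + self.segment_len]
            # keys/values also cover the cached previous segment (no gradient through it)
            kv = seg if memory is None else torch.cat([memory, seg], dim=1)
            attn_out, _ = self.attn(seg, kv, kv)
            seg = self.norm1(seg + attn_out)
            seg = self.norm2(seg + self.ffn(seg))
            memory = seg.detach()
            outputs.append(seg)
        return torch.cat(outputs, dim=1)
```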
Action Sensitivity Learning for the Ego4D Episodic Memory Challenge 2023
This report presents ReLER submission to two tracks in the Ego4D Episodic Memory Benchmark in CVPR 2023, including Natural Language Queries and Moment Queries.
NMS Threshold matters for Ego4D Moment Queries -- 2nd place solution to the Ego4D Moment Queries Challenge 2023
This report describes our submission to the Ego4D Moment Queries Challenge 2023.
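Since this submission centers on how the NMS threshold is set when post-processing moment predictions, a minimal greedy 1D temporal NMS is sketched below; the iou_threshold argument is the quantity such a report studies. This is an illustrative sketch under assumed names, not the authors' pipeline.

```python
def temporal_nms(segments, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression over predicted temporal moments.
    segments: list of (start, end) in seconds; scores: one confidence per segment.
    Returns the indices of the kept moments. Illustrative sketch only."""
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)          # highest-scoring remaining moment survives
        keep.append(i)
        remaining = []
        for j in order:
            s1, e1 = segments[i]
            s2, e2 = segments[j]
            inter = max(0.0, min(e1, e2) - max(s1, s2))
            union = (e1 - s1) + (e2 - s2) - inter
            iou = inter / union if union > 0 else 0.0
            # suppress moments that overlap the kept one above the threshold
            if iou <= iou_threshold:
                remaining.append(j)
        order = remaining
    return keep
```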
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone
Video-language pre-training (VLP) has become increasingly important due to its ability to generalize to various vision and language tasks.
Knowing Where to Focus: Event-aware Transformer for Video Grounding
Recent DETR-based video grounding models learn moment queries to predict moment timestamps directly, without hand-crafted components such as pre-defined proposals or non-maximum suppression.
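A minimal sketch of this moment-query idea follows: a fixed set of learnable query embeddings is decoded against video features and mapped directly to normalized (center, width) spans plus a score per query. Module names and dimensions are assumptions for illustration, not any specific paper's architecture.

```python
import torch
import torch.nn as nn

class MomentQueryDecoder(nn.Module):
    """Hypothetical sketch of DETR-style moment queries: learnable queries
    attend to (fused video/text) features and are regressed straight to
    timestamps, with no proposals or NMS in the loop."""

    def __init__(self, dim=256, num_queries=10, num_layers=2):
        super().__init__()
        self.queries = nn.Embedding(num_queries, dim)            # learnable moment queries
        layer = nn.TransformerDecoderLayer(dim, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.span_head = nn.Linear(dim, 2)                       # -> normalized (center, width)
        self.score_head = nn.Linear(dim, 1)                      # -> foreground score

    def forward(self, video_feats):                              # (B, T, dim) frame/clip features
        b = video_feats.size(0)
        q = self.queries.weight.unsqueeze(0).expand(b, -1, -1)   # (B, num_queries, dim)
        h = self.decoder(q, video_feats)
        spans = self.span_head(h).sigmoid()                      # each query yields one candidate moment
        scores = self.score_head(h).squeeze(-1)
        return spans, scores
```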
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection
Temporal Action Detection (TAD) focuses on detecting pre-defined actions, while Moment Retrieval (MR) aims to identify the events described by open-ended natural language within untrimmed videos.
Diversifying Query: Region-Guided Transformer for Temporal Sentence Grounding
To tackle this limitation, we present a Region-Guided TRansformer (RGTR) for temporal sentence grounding, which diversifies moment queries to eliminate overlapped and redundant predictions.