QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries

20 Jul 2021 · Jie Lei, Tamara L. Berg, Mohit Bansal

Detecting customized moments and highlights from videos given natural language (NL) user queries is an important but under-studied topic. One of the challenges in pursuing this direction is the lack of annotated data. To address this issue, we present the Query-based Video Highlights (QVHighlights) dataset. It consists of over 10,000 YouTube videos covering a wide range of topics, from everyday activities and travel in lifestyle vlog videos to social and political activities in news videos. Each video in the dataset is annotated with: (1) a human-written, free-form NL query, (2) the moments in the video that are relevant to the query, and (3) five-point-scale saliency scores for all query-relevant clips. This comprehensive annotation enables us to develop and evaluate systems that detect relevant moments as well as salient highlights for diverse, flexible user queries. We also present a strong baseline for this task, Moment-DETR, a transformer encoder-decoder model that views moment retrieval as a direct set prediction problem: it takes extracted video and query representations as inputs and predicts moment coordinates and saliency scores end-to-end. Although our model does not use any human prior, we show that it performs competitively with well-engineered architectures. With weakly supervised pretraining using ASR captions, Moment-DETR substantially outperforms previous methods. Lastly, we present several ablations and visualizations of Moment-DETR. Data and code are publicly available at https://github.com/jayleicn/moment_detr
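Treating moment retrieval as direct set prediction means, in DETR-style models, that each ground-truth moment is paired one-to-one with a predicted span before the loss is computed. The sketch below illustrates that pairing under stated assumptions: spans are (start, end) pairs in seconds, the matching cost is a hypothetical sum of boundary L1 distance and negative temporal IoU (the weights `l1_weight` and `iou_weight` are illustrative), the class-probability term is omitted, and an exhaustive search over assignments stands in for the Hungarian algorithm. The names `span_iou`, `match_cost`, and `bipartite_match` are hypothetical and are not taken from the jayleicn/moment_detr code, whose cost terms and span parametrization may differ.

```python
from itertools import permutations


def span_iou(a, b):
    """Temporal IoU of two (start, end) spans."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def match_cost(pred, gt, l1_weight=1.0, iou_weight=1.0):
    """Hypothetical pairing cost: L1 distance on boundaries minus weighted IoU."""
    l1 = abs(pred[0] - gt[0]) + abs(pred[1] - gt[1])
    return l1_weight * l1 - iou_weight * span_iou(pred, gt)


def bipartite_match(preds, gts):
    """Return (pred_idx, gt_idx) pairs with minimum total cost.

    Exhaustive search over one-to-one assignments; assumes len(preds) >= len(gts).
    """
    best_cost, best_pairs = float("inf"), []
    # Each perm assigns ground truth g to prediction perm[g];
    # predictions that do not appear in perm stay unmatched.
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost = cost
            best_pairs = sorted((p, g) for g, p in enumerate(perm))
    return best_pairs


if __name__ == "__main__":
    preds = [(0.0, 10.0), (20.0, 30.0), (50.0, 55.0)]  # predicted spans (s)
    gts = [(21.0, 29.0), (1.0, 9.0)]  # ground-truth moments (s)
    print(bipartite_match(preds, gts))  # [(0, 1), (1, 0)]
```

In the example, prediction 2 is left unmatched; in a DETR-style setup such predictions would typically be trained toward a background ("no moment") label. Exhaustive search grows factorially with the number of predictions, so a practical implementation would use a polynomial-time assignment solver instead.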

Code

jayleicn/moment_detr (official, 232 GitHub stars)

tencentarc/umt (177 GitHub stars)

houzhijian/cone

yeliudev/R2-Tuning

Tasks

Highlight Detection

Moment Retrieval

Natural Language Queries

Retrieval

Datasets

Introduced in the Paper:

QVHighlights

Used in the Paper:

HowTo100M

Charades-STA

Results from the Paper

Ranked #12 on Highlight Detection on QVHighlights

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Moment Retrieval	Charades-STA	Moment-DETR w/ PT (on 10K HowTo100M videos)	R@1 IoU=0.5	55.65	#12
Moment Retrieval	Charades-STA	Moment-DETR w/ PT (on 10K HowTo100M videos)	R@1 IoU=0.7	34.17	#11
Moment Retrieval	Charades-STA	Moment-DETR	R@1 IoU=0.5	53.63	#13
Moment Retrieval	Charades-STA	Moment-DETR	R@1 IoU=0.7	31.37	#13
Highlight Detection	QVHighlights	Moment-DETR w/ PT	mAP	37.43	#12
Highlight Detection	QVHighlights	Moment-DETR w/ PT	Hit@1	60.17	#11
Moment Retrieval	QVHighlights	Moment-DETR (w/ PT ASR Captions)	mAP	36.14	#17
Moment Retrieval	QVHighlights	Moment-DETR (w/ PT ASR Captions)	R@1 IoU=0.5	59.78	#18
Moment Retrieval	QVHighlights	Moment-DETR (w/ PT ASR Captions)	R@1 IoU=0.7	40.33	#20
Moment Retrieval	QVHighlights	Moment-DETR (w/ PT ASR Captions)	mAP@0.5	60.51	#16
Moment Retrieval	QVHighlights	Moment-DETR (w/ PT ASR Captions)	mAP@0.75	35.36	#17
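The Moment Retrieval rows report R@1 at IoU thresholds 0.5 and 0.7: the share of queries whose top-ranked predicted moment overlaps a ground-truth moment with temporal IoU at or above the threshold. The following is a minimal sketch, assuming spans are (start, end) pairs in seconds, results are expressed as percentages like the table, and, for queries with several ground-truth moments (as QVHighlights allows), the top-1 span counts as a hit if it reaches the threshold against any of them. The function names `span_iou` and `recall_at_1` are hypothetical; this is not the official evaluation script.

```python
def span_iou(a, b):
    """Temporal IoU of two (start, end) spans."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_spans, gt_moments, iou_threshold):
    """Percentage of queries whose top-1 span reaches iou_threshold with any GT moment."""
    hits = sum(
        any(span_iou(pred, gt) >= iou_threshold for gt in gts)
        for pred, gts in zip(top1_spans, gt_moments)
    )
    return 100.0 * hits / len(top1_spans)


if __name__ == "__main__":
    top1 = [(0, 10), (20, 30), (40, 50)]  # top-ranked span per query
    gts = [[(0, 8)], [(22, 28), (60, 70)], [(0, 5)]]  # GT moments per query
    for thr in (0.5, 0.7):
        # prints 66.67 for 0.5 and 33.33 for 0.7
        print(f"R@1 IoU={thr}: {recall_at_1(top1, gts, thr):.2f}")
```

Here the three top-1 spans reach IoU 0.8, 0.6, and 0.0 with their best-matching ground truth, so the stricter 0.7 threshold drops the second query, which is why R@1 IoU=0.7 is always at most R@1 IoU=0.5 in the table.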
