TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Dense Video Captioning	ActivityNet Captions	PDVC (TSP features, no SCST)	METEOR	9.03	# 6
Dense Video Captioning	ActivityNet Captions	PDVC (TSP features, no SCST)	BLEU-4	2.17	# 2
Dense Video Captioning	ActivityNet Captions	PDVC (TSP features, no SCST)	CIDEr	31.14	# 3
Dense Video Captioning	ActivityNet Captions	PDVC (TSP features, no SCST)	SODA	6.05	# 3
Dense Video Captioning	YouCook2	PDVC (TSN features, no SCST)	METEOR	4.74	# 5
Dense Video Captioning	YouCook2	PDVC (TSN features, no SCST)	CIDEr	22.71	# 5
Dense Video Captioning	YouCook2	PDVC (TSN features, no SCST)	BLEU-4	0.8	# 2
Dense Video Captioning	YouCook2	PDVC (TSN features, no SCST)	SODA	4.42	# 5

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/end-to-end-dense-video-captioning-with/dense-video-captioning-on-youcook2)](https://paperswithcode.com/sota/dense-video-captioning-on-youcook2?p=end-to-end-dense-video-captioning-with)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/end-to-end-dense-video-captioning-with/dense-video-captioning-on-activitynet)](https://paperswithcode.com/sota/dense-video-captioning-on-activitynet?p=end-to-end-dense-video-captioning-with)`

End-to-End Dense Video Captioning with Parallel Decoding

ICCV 2021 · Teng Wang, Ruimao Zhang, Zhichao Lu, Feng Zheng, Ran Cheng, Ping Luo

Dense video captioning aims to generate multiple captions, each associated with its temporal location, from a video. Previous methods follow a sophisticated "localize-then-describe" scheme, which relies heavily on numerous hand-crafted components. In this paper, we propose a simple yet effective framework for end-to-end dense video captioning with parallel decoding (PDVC), which formulates dense caption generation as a set prediction task. In practice, by stacking a newly proposed event counter on top of a transformer decoder, PDVC precisely segments the video into a number of event pieces under a holistic understanding of the video content, which effectively increases the coherence and readability of the predicted captions. Compared with prior art, PDVC has several appealing advantages: (1) Without relying on heuristic non-maximum suppression or a recurrent event sequence selection network to remove redundancy, PDVC directly produces an event set of an appropriate size; (2) In contrast to the two-stage scheme, we feed the enhanced representations of event queries into the localization head and the caption head in parallel, making these two sub-tasks deeply interrelated and mutually promoted through optimization; (3) Without bells and whistles, extensive experiments on ActivityNet Captions and YouCook2 show that PDVC is capable of producing high-quality captioning results, surpassing the state-of-the-art two-stage methods when its localization accuracy is on par with theirs. Code is available at https://github.com/ttengwang/PDVC.
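
The selection step described in advantage (1) can be pictured with a small, framework-free sketch. It is an illustration only, not the authors' implementation: the `EventPrediction` container, the `select_events` helper, the per-query `confidence` score, and all numbers below are hypothetical. The sketch assumes that every event query already carries a segment and a caption produced in parallel, and that the event counter has output a single integer; the final event set is then the most confident queries up to that count, with no non-maximum suppression pass.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EventPrediction:
    """Hypothetical output for one event query; both heads read the same query."""
    segment: Tuple[float, float]  # (start, end) in seconds, from a localization head
    caption: str                  # sentence from a caption head
    confidence: float             # ranking score for this query (assumed to exist)


def select_events(predictions: List[EventPrediction],
                  predicted_count: int) -> List[EventPrediction]:
    """Keep the `predicted_count` most confident events, ordered by start time."""
    # Clamp the count so a bad counter output cannot index out of range.
    k = max(0, min(predicted_count, len(predictions)))
    top_k = sorted(predictions, key=lambda p: p.confidence, reverse=True)[:k]
    return sorted(top_k, key=lambda p: p.segment[0])


# Made-up outputs for four event queries on one cooking video.
queries = [
    EventPrediction((0.0, 12.5), "a person chops onions", 0.91),
    EventPrediction((1.0, 11.0), "someone cuts vegetables", 0.40),
    EventPrediction((12.5, 30.0), "the onions are fried in a pan", 0.85),
    EventPrediction((28.0, 45.0), "the dish is served on a plate", 0.77),
]

# Assume the event counter predicted three events for this video.
events = select_events(queries, predicted_count=3)
for e in events:
    print(f"{e.segment[0]:6.1f}s - {e.segment[1]:6.1f}s  {e.caption}")
```

In this toy input the low-confidence query that overlaps the first event is dropped simply because it falls outside the predicted count, which is the role the abstract assigns to the event counter in place of heuristic redundancy removal.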

Code

ttengwang/pdvc official

188

aim3-ruc/youmakeup_challenge2022

Tasks

Caption Generation

Dense Video Captioning

Video Captioning

Datasets

ActivityNet Captions

YouCook2

Results from the Paper

Ranked #5 on Dense Video Captioning on YouCook2
