Accurate and Fast Compressed Video Captioning

Existing video captioning approaches typically first sample frames from a decoded video and then run subsequent processing (e.g., feature extraction and/or captioning-model learning). In this pipeline, manual frame sampling may miss key information in the video and thus degrade performance, while redundancy among the sampled frames makes video captioning inference inefficient. Addressing this, we study video captioning from a different perspective, in the compressed domain, which brings multi-fold advantages over the existing pipeline: 1) compared with the raw images of a decoded video, the compressed video, consisting of I-frames, motion vectors, and residuals, is highly distinguishable, which allows us, through a specialized model design, to leverage the entire video for learning without manual sampling; 2) the captioning model is more efficient at inference because it processes a smaller, less redundant input. We propose a simple yet effective end-to-end transformer that learns to caption videos directly in the compressed domain. We show that even with this simple design, our method achieves state-of-the-art performance on multiple benchmarks while running almost 2x faster than existing approaches. Code is available at https://github.com/acherstyx/CoCap.
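For context on what these compressed-domain inputs look like in practice, below is a minimal sketch that reads H.264 motion vectors and counts I-frames with PyAV, using FFmpeg's documented export_mvs flag. This is an illustration of the input modalities only, not the authors' CoCap pipeline (their loader is in the linked repository); the path "video.mp4" is a placeholder, and residuals are omitted because stock FFmpeg does not expose them as frame side data.

```python
# Sketch: inspecting compressed-domain signals (I-frames, motion vectors)
# with PyAV ("pip install av"). Illustration only, NOT the CoCap pipeline;
# "video.mp4" is a placeholder path.
import av

container = av.open("video.mp4")  # any H.264-encoded clip
stream = container.streams.video[0]
# FFmpeg's export_mvs flag attaches motion vectors as frame side data.
stream.codec_context.options = {"flags2": "+export_mvs"}

n_iframes = 0
for frame in container.decode(stream):
    if frame.pict_type.name == "I":  # fully coded anchor pictures
        n_iframes += 1
    mvs = frame.side_data.get("MOTION_VECTORS")
    if mvs is not None:
        # Entries mirror FFmpeg's AVMotionVector struct: block size,
        # source/destination coordinates, etc.
        print(f"{frame.pict_type.name}-frame: {len(mvs)} motion vectors")

print(f"I-frames found: {n_iframes}")
container.close()
```

Because motion vectors and residuals are far more compact than decoded RGB frames, a model consuming them alongside sparse I-frames touches much less data per video, which is the intuition behind the inference speedup claimed above.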

ICCV 2023

Results from the Paper


Task: Video Captioning; Model: CoCap (ViT/L14); "#n" is the global rank on the corresponding benchmark leaderboard.

Dataset    BLEU-4        METEOR        ROUGE-L       CIDEr
MSR-VTT    44.4 (#16)    30.3 (#11)    63.4 (#13)    57.2 (#18)
MSVD       60.1 (#7)     41.4 (#6)     78.2 (#6)     121.5 (#10)
VATEX      35.8 (#8)     25.3 (#4)     52.0 (#5)     64.8 (#7)

Methods


No methods listed for this paper.