TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Video Captioning	Hindi MSR-VTT	SBD_Keyframe	BLEU4	41.01	#1


An Efficient Keyframes Selection Based Framework for Video Captioning

ICON 2021 · Alok Singh, Loitongbam Sanayai Meetei, Salam Michael Singh, Thoudam Doren Singh, Sivaji Bandyopadhyay

Describing a video is a challenging yet attractive task because it lies at the intersection of computer vision and natural language generation. Attention-based models have reported the best performance. However, all of these models follow similar procedures, such as segmenting videos into chunks of frames or sampling frames at equal intervals for visual encoding. Because a video consists of a sequence of similar frames and suffers from inescapable noise such as uneven illumination, occlusion and motion effects, both procedures encode redundant visual information and incur additional computational cost. In this paper, a boundary-based keyframe selection approach for video description is proposed that allows the system to select a compact subset of keyframes to encode the visual information and generate a description for a video without much degradation. The proposed approach uses 3-4 frames per video and yields competitive performance on two benchmark datasets, MSVD and MSR-VTT (in both English and Hindi).
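
To make the idea of boundary-based keyframe selection concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not specify how boundaries are detected, so the histogram-difference criterion, the function name `select_keyframes`, and the parameters `max_keyframes`, `threshold` and `bins` are all assumptions chosen for illustration. The sketch splits a video wherever consecutive frames differ sharply and keeps the middle frame of each resulting segment.

```python
import numpy as np


def select_keyframes(frames, max_keyframes=4, threshold=0.5, bins=16):
    """Return indices of up to `max_keyframes` frames, one per segment.

    Hypothetical helper (not from the paper). `frames` is an array of
    shape (T, H, W) or (T, H, W, C) with pixel values in [0, 255].
    """
    frames = np.asarray(frames)
    n = len(frames)
    if n == 0:
        return []

    # Normalized intensity histogram for every frame.
    hists = np.stack([
        np.histogram(f, bins=bins, range=(0, 256))[0] / f.size
        for f in frames
    ])

    # L1 distance between consecutive histograms, shape (n - 1,).
    # A large value suggests a boundary between frame i and frame i + 1.
    diffs = np.abs(np.diff(hists, axis=0)).sum(axis=1)

    # Assumed criterion: a boundary is any jump above `threshold`;
    # keep only the strongest (max_keyframes - 1) of them.
    candidates = np.flatnonzero(diffs > threshold)
    order = np.argsort(diffs[candidates])[::-1][:max_keyframes - 1]
    cuts = np.sort(candidates[order] + 1)

    # Segments are [edges[i], edges[i + 1]); keep each segment's middle frame.
    edges = [0, *cuts.tolist(), n]
    return [(a + b - 1) // 2 for a, b in zip(edges[:-1], edges[1:])]


# Synthetic 15-frame video with three visually distinct 5-frame segments.
video = np.concatenate([
    np.zeros((5, 8, 8), dtype=np.uint8),
    np.full((5, 8, 8), 255, dtype=np.uint8),
    np.full((5, 8, 8), 128, dtype=np.uint8),
])
print(select_keyframes(video))  # [2, 7, 12]
```

On this toy input, three frames stand in for fifteen, which mirrors the compact subset of keyframes the abstract describes. A real system would likely use a more robust boundary detector and pass the selected frames to a visual encoder.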
