Video Summarization
68 papers with code • 5 benchmarks • 13 datasets
Video Summarization aims to generate a short synopsis of a video by selecting its most informative and important parts. The produced summary is usually composed of a set of representative video frames (a.k.a. video key-frames) or video fragments (a.k.a. video key-fragments) that are stitched together in chronological order to form a shorter video. The former type of summary is known as a video storyboard, and the latter as a video skim.
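As a minimal sketch of storyboard-style summarization (illustrative only, not any particular paper's method; it assumes per-frame feature vectors have already been extracted), representative keyframes can be selected greedily by farthest-point sampling so that the picks cover the video's distinct content:

```python
import numpy as np

def select_keyframes(features: np.ndarray, k: int) -> list[int]:
    """Pick k representative frame indices by farthest-point sampling:
    seed with the frame nearest the global mean, then repeatedly add
    the frame farthest from everything already selected."""
    picks = [int(np.linalg.norm(features - features.mean(axis=0), axis=1).argmin())]
    while len(picks) < k:
        # distance from each frame to its nearest already-picked frame
        d = np.linalg.norm(features[:, None] - features[picks][None], axis=2).min(axis=1)
        picks.append(int(d.argmax()))
    return sorted(picks)  # chronological order, as a storyboard expects

# toy video: three visually distinct "scenes", 10 frames each, 4-dim features
video = np.concatenate([np.full((10, 4), v) for v in (0.0, 1.0, 2.0)])
print(select_keyframes(video, k=3))  # → [0, 10, 20]: one frame per scene
```

Real systems replace the raw feature vectors with learned representations (e.g. CNN embeddings) and often score frames with importance models rather than pure coverage, but the selection objective is the same.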
Source: Video Summarization Using Deep Neural Networks: A Survey
Image credit: iJRASET
Latest papers
VideoSAGE: Video Summarization with Graph Representation Learning
We propose a graph-based representation learning framework for video summarization.
Enhancing Video Summarization with Context Awareness
Despite the importance of video summarization, there is a lack of diverse and representative datasets, hindering comprehensive evaluation and benchmarking of algorithms.
Cluster-based Video Summarization with Temporal Context Awareness
In this paper, we present TAC-SUM, a novel and efficient training-free approach for video summarization that addresses the limitations of existing cluster-based models by incorporating temporal context.
$R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
Video temporal grounding (VTG) is a fine-grained video understanding problem that aims to ground relevant clips in untrimmed videos given natural language queries.
ANIM-400K: A Large-Scale Dataset for Automated End-To-End Dubbing of Video
The Internet's wealth of content, with up to 60% published in English, starkly contrasts the global population, where only 18.8% are English speakers, and just 5.1% consider it their native language, leading to disparities in online information access.
Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos
A human needs to capture the event in every shot and associate the shots to understand the story behind the video.
An Integrated System for Spatio-Temporal Summarization of 360-degrees Videos
In this work, we present an integrated system for spatiotemporal summarization of 360-degrees videos.
A Challenging Multimodal Video Summary: Simultaneously Extracting and Generating Keyframe-Caption Pairs from Video
This paper proposes a practical multimodal video summarization task setting and a dataset to train and evaluate the task.
Adopting Self-Supervised Learning into Unsupervised Video Summarization through Restorative Score
We show that the reconstruction loss of the model for a video with masked frames correlates with the representativeness of the remaining frames in the video.
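The restorative-score idea can be illustrated with a toy sketch (not the paper's self-supervised model, which learns a masked-frame reconstruction network): score a candidate set of kept frames by how well the masked frames can be reconstructed from them, here via simple temporal interpolation between the nearest kept frames.

```python
import numpy as np

def restorative_score(features: np.ndarray, kept: list[int]) -> float:
    """Toy stand-in for a restorative score: reconstruct every masked frame
    by linearly interpolating between the nearest kept frames in time and
    return the mean reconstruction error. A lower score means the kept
    frames are more representative of the full video."""
    kept = sorted(kept)
    errors = []
    for t in range(len(features)):
        if t in kept:
            continue
        # nearest kept frames before and after t (clamped at the ends)
        prev = max((i for i in kept if i < t), default=kept[0])
        nxt = min((i for i in kept if i > t), default=kept[-1])
        w = 0.0 if prev == nxt else (t - prev) / (nxt - prev)
        recon = (1 - w) * features[prev] + w * features[nxt]
        errors.append(np.linalg.norm(features[t] - recon))
    return float(np.mean(errors))

# toy video: three visually distinct "scenes", 10 frames each
video = np.concatenate([np.full((10, 4), v) for v in (0.0, 1.0, 2.0)])
diverse = restorative_score(video, kept=[0, 10, 20])  # one frame per scene
redundant = restorative_score(video, kept=[0, 1, 2])  # all from one scene
print(diverse < redundant)  # → True: diverse keyframes restore the video better
```

The comparison shows the correlation the abstract describes: a summary whose kept frames span the video's content yields a lower masked-frame reconstruction error than one drawn from a single scene.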