Video Summarization
68 papers with code • 5 benchmarks • 13 datasets
Video Summarization aims to generate a short synopsis that captures a video's most informative and important parts. The produced summary is usually composed either of a set of representative video frames (a.k.a. video key-frames) or of video fragments (a.k.a. video key-fragments) stitched together in chronological order to form a shorter video. The former type of summary is known as a video storyboard, and the latter as a video skim.
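As a toy illustration of storyboard-style key-frame selection (not a method from any paper listed on this page), one simple baseline picks mutually diverse frames by greedy farthest-point sampling over per-frame feature vectors. All function names and the toy features below are hypothetical stand-ins:

```python
# Minimal sketch: choose k diverse key-frames for a video storyboard by
# greedy farthest-point sampling. Frames are represented by feature vectors
# (in practice these would come from a pretrained visual encoder).

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def select_keyframes(features, k):
    """Greedily pick up to k mutually diverse frames; return indices in chronological order."""
    if not features or k <= 0:
        return []
    selected = [0]  # seed with the first frame
    while len(selected) < min(k, len(features)):
        # next key-frame: the frame farthest from all already-selected ones
        best = max(
            (i for i in range(len(features)) if i not in selected),
            key=lambda i: min(euclidean(features[i], features[j]) for j in selected),
        )
        selected.append(best)
    return sorted(selected)  # chronological order, as in a storyboard

# toy per-frame features: two visually distinct "scenes"
frames = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.2, 0.1)]
print(select_keyframes(frames, 2))  # one representative frame per scene
```

Real systems replace the raw feature vectors with deep embeddings and often add an importance-scoring model, but the selection step follows the same pick-representative-frames pattern.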
Source: Video Summarization Using Deep Neural Networks: A Survey
Latest papers with no code
Unsupervised Video Summarization
This paper introduces a new unsupervised method for automatic video summarization that borrows ideas from generative adversarial networks but eliminates the discriminator, uses a simple loss function, and trains different parts of the model separately.
Dynamic Non-monotone Submodular Maximization
Through this reduction, we obtain the first dynamic algorithms to solve the non-monotone submodular maximization problem under the cardinality constraint $k$.
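The paper above targets the dynamic setting; as plain background, a standard static algorithm for non-monotone submodular maximization under a cardinality constraint $k$ is the randomized greedy of Buchbinder et al., which in each of $k$ rounds samples one element among the $k$ largest marginal gains. A hedged sketch (the function names and the toy objective are my own, not the paper's algorithm):

```python
import random

# Background sketch: "random greedy" for non-monotone submodular maximization
# under a cardinality constraint k (Buchbinder et al.). Not the dynamic
# algorithm from the paper above; the toy objective is purely illustrative.

def random_greedy(ground_set, f, k, seed=0):
    """Pick at most k elements; each round, sample among the top-k marginal gains."""
    rng = random.Random(seed)
    S = set()
    for _ in range(k):
        remaining = ground_set - S
        if not remaining:
            break
        # marginal gain f(S + e) - f(S) for every remaining element
        gains = {e: f(S | {e}) - f(S) for e in remaining}
        top = sorted(gains, key=gains.get, reverse=True)[:k]
        e = rng.choice(top)
        if gains[e] > 0:  # sampling a non-improving element acts as a "dummy" skip
            S.add(e)
    return S

# toy non-monotone objective: coverage reward minus a per-element cost
cover = {0: {'a', 'b'}, 1: {'b'}, 2: {'c'}, 3: {'a', 'b', 'c'}}

def f(S):
    covered = set().union(*(cover[e] for e in S)) if S else set()
    return len(covered) - 0.6 * len(S)

print(random_greedy({0, 1, 2, 3}, f, k=2))
```

The randomization over the top-$k$ candidates is what makes the guarantee survive non-monotonicity, where plain greedy can get stuck adding elements with negative marginal value.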
Video-CSR: Complex Video Digest Creation for Visual-Language Models
We present a novel task and human-annotated dataset for evaluating the ability of visual-language models to generate captions and summaries for real-world video clips, which we call Video-CSR (Captioning, Summarization and Retrieval).
Video-Teller: Enhancing Cross-Modal Generation with Fusion and Decoupling
This paper proposes Video-Teller, a video-language foundation model that leverages multi-modal fusion and fine-grained modality alignment to significantly enhance the video-to-text generation task.
Does Video Summarization Require Videos? Quantifying the Effectiveness of Language in Video Summarization
Video summarization remains a significant challenge in computer vision, largely due to the size of the input videos to be summarized.
Saliency-based Video Summarization for Face Anti-spoofing
Inspired by the visual saliency theory, we present a video summarization method for face anti-spoofing detection that aims to enhance the performance and efficiency of deep learning models by leveraging visual saliency.
Self-Attention Based Generative Adversarial Networks For Unsupervised Video Summarization
Experimental results indicate that using a self-attention mechanism for frame selection outperforms the state of the art on SumMe and achieves performance comparable to the state of the art on TVSum and COGNIMUSE.
Causal Video Summarizer for Video Exploration
Multi-modal video summarization takes a video and a text-based query as inputs.
Query-based Video Summarization with Pseudo Label Supervision
Existing manually labelled datasets for query-based video summarization are costly to create and therefore small, which limits the performance of supervised deep video summarization models.
Key Frame Extraction with Attention Based Deep Neural Networks
Automatic keyframe detection is the task of selecting the scenes that best summarize the content of a long video.