Video Summarization
68 papers with code • 5 benchmarks • 13 datasets
Video Summarization aims to generate a short synopsis that captures a video's most informative and important parts. The produced summary is usually composed either of a set of representative video frames (a.k.a. video key-frames) or of video fragments (a.k.a. video key-fragments) stitched together in chronological order to form a shorter video. The former type of summary is known as a video storyboard, and the latter as a video skim.
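As a toy illustration of storyboard-style key-frame selection (not a method from any paper listed on this page), one simple baseline picks mutually diverse frames by greedy farthest-point sampling over per-frame feature vectors. All function names and the toy features below are hypothetical stand-ins:

```python
# Minimal sketch: choose k diverse key-frames for a video storyboard by
# greedy farthest-point sampling. Frames are represented by feature vectors
# (in practice these would come from a pretrained visual encoder).

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def select_keyframes(features, k):
    """Greedily pick up to k mutually diverse frames; return indices in chronological order."""
    if not features or k <= 0:
        return []
    selected = [0]  # seed with the first frame
    while len(selected) < min(k, len(features)):
        # next key-frame: the frame farthest from all already-selected ones
        best = max(
            (i for i in range(len(features)) if i not in selected),
            key=lambda i: min(euclidean(features[i], features[j]) for j in selected),
        )
        selected.append(best)
    return sorted(selected)  # chronological order, as in a storyboard

# toy per-frame features: two visually distinct "scenes"
frames = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.2, 0.1)]
print(select_keyframes(frames, 2))  # one representative frame per scene
```

Real systems replace the raw feature vectors with deep embeddings and often add an importance-scoring model, but the selection step follows the same pick-representative-frames pattern.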
Source: Video Summarization Using Deep Neural Networks: A Survey
Latest papers with no code
Unsupervised Video Summarization
This paper introduces a new unsupervised method for automatic video summarization that borrows ideas from generative adversarial networks but eliminates the discriminator, uses a simple loss function, and trains different parts of the model separately.
Dynamic Non-monotone Submodular Maximization
Through this reduction, we obtain the first dynamic algorithms to solve the non-monotone submodular maximization problem under the cardinality constraint $k$.
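The paper above targets the dynamic setting; as plain background, a standard static algorithm for non-monotone submodular maximization under a cardinality constraint $k$ is the randomized greedy of Buchbinder et al., which in each of $k$ rounds samples one element among the $k$ largest marginal gains. A hedged sketch (the function names and the toy objective are my own, not the paper's algorithm):

```python
import random

# Background sketch: "random greedy" for non-monotone submodular maximization
# under a cardinality constraint k (Buchbinder et al.). Not the dynamic
# algorithm from the paper above; the toy objective is purely illustrative.

def random_greedy(ground_set, f, k, seed=0):
    """Pick at most k elements; each round, sample among the top-k marginal gains."""
    rng = random.Random(seed)
    S = set()
    for _ in range(k):
        remaining = ground_set - S
        if not remaining:
            break
        # marginal gain f(S + e) - f(S) for every remaining element
        gains = {e: f(S | {e}) - f(S) for e in remaining}
        top = sorted(gains, key=gains.get, reverse=True)[:k]
        e = rng.choice(top)
        if gains[e] > 0:  # sampling a non-improving element acts as a "dummy" skip
            S.add(e)
    return S

# toy non-monotone objective: coverage reward minus a per-element cost
cover = {0: {'a', 'b'}, 1: {'b'}, 2: {'c'}, 3: {'a', 'b', 'c'}}

def f(S):
    covered = set().union(*(cover[e] for e in S)) if S else set()
    return len(covered) - 0.6 * len(S)

print(random_greedy({0, 1, 2, 3}, f, k=2))
```

The randomization over the top-$k$ candidates is what makes the guarantee survive non-monotonicity, where plain greedy can get stuck adding elements with negative marginal value.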
Video-CSR: Complex Video Digest Creation for Visual-Language Models
We present a novel task and human-annotated dataset for evaluating the ability of visual-language models to generate captions and summaries for real-world video clips, which we call Video-CSR (Captioning, Summarization and Retrieval).
Video-Teller: Enhancing Cross-Modal Generation with Fusion and Decoupling
This paper proposes Video-Teller, a video-language foundation model that leverages multi-modal fusion and fine-grained modality alignment to significantly enhance the video-to-text generation task.
Does Video Summarization Require Videos? Quantifying the Effectiveness of Language in Video Summarization
Video summarization remains a significant challenge in computer vision, largely due to the size of the input videos to be summarized.
Saliency-based Video Summarization for Face Anti-spoofing
Inspired by the visual saliency theory, we present a video summarization method for face anti-spoofing detection that aims to enhance the performance and efficiency of deep learning models by leveraging visual saliency.
Self-Attention Based Generative Adversarial Networks For Unsupervised Video Summarization
Experimental results indicate that using a self-attention mechanism for frame selection outperforms the state of the art on SumMe and achieves performance comparable to the state of the art on TVSum and COGNIMUSE.
Causal Video Summarizer for Video Exploration
Multi-modal video summarization takes a video and a text-based query as inputs.
Query-based Video Summarization with Pseudo Label Supervision
Existing manually labelled datasets for query-based video summarization are costly to create and therefore small, which limits the performance of supervised deep video summarization models.
Key Frame Extraction with Attention Based Deep Neural Networks
Automatic keyframe detection is the task of selecting the scenes that best summarize the content of a long video.