Supervised Video Summarization
11 papers with code • 2 benchmarks • 3 datasets
Supervised video summarization methods rely on datasets with human-labeled ground-truth annotations (either full video summaries, as in the SumMe dataset, or frame-level importance scores, as in the TVSum dataset), and use these annotations to learn the underlying criteria for selecting the video frames or fragments that make up a summary.
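As a concrete illustration of this supervised setup, the sketch below regresses per-frame importance scores (TVSum-style annotations) with an MSE loss; the model architecture, feature dimension, and data are hypothetical placeholders, not taken from any specific paper.

```python
# Minimal sketch of supervised training against frame-level importance
# scores. All sizes and the dummy data are illustrative assumptions.
import torch
import torch.nn as nn

class FrameScorer(nn.Module):
    """Scores each frame feature with an importance value in [0, 1]."""
    def __init__(self, feat_dim=1024, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Sequential(nn.Linear(2 * hidden, 1), nn.Sigmoid())

    def forward(self, feats):            # feats: (B, T, feat_dim)
        h, _ = self.rnn(feats)
        return self.head(h).squeeze(-1)  # (B, T) frame importance scores

model = FrameScorer()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
feats = torch.randn(2, 120, 1024)       # dummy CNN features for 120 frames
gt = torch.rand(2, 120)                 # dummy human importance annotations

pred = model(feats)
loss = nn.functional.mse_loss(pred, gt) # fit the human-annotated scores
loss.backward()
opt.step()
```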
Source: Video Summarization Using Deep Neural Networks: A Survey
Most implemented papers
Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward
Video summarization aims to facilitate large-scale video browsing by producing short, concise summaries that are diverse and representative of original videos.
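The two reward terms named in the title follow a common formulation: diversity is the mean pairwise dissimilarity among selected frames, while representativeness penalizes the distance from every frame to its nearest selected frame. The sketch below implements that generic formulation; tensor shapes and the example indices are illustrative assumptions.

```python
# Hedged sketch of diversity and representativeness rewards over
# L2-normalized frame features.
import torch

def diversity_reward(feats, picks):
    """feats: (T, D) normalized frame features; picks: selected frame indices."""
    sel = feats[picks]                           # (K, D)
    k = sel.size(0)
    if k < 2:
        return torch.tensor(0.0)
    sim = sel @ sel.t()                          # cosine similarities
    off_diag = sim[~torch.eye(k, dtype=torch.bool)]
    return (1.0 - off_diag).mean()               # high when selections differ

def representativeness_reward(feats, picks):
    """exp(-mean distance of each frame to its closest selected frame)."""
    sel = feats[picks]                           # (K, D)
    dist = torch.cdist(feats, sel)               # (T, K) pairwise distances
    return torch.exp(-dist.min(dim=1).values.mean())

feats = torch.nn.functional.normalize(torch.randn(100, 512), dim=1)
picks = torch.tensor([3, 20, 55, 90])
reward = diversity_reward(feats, picks) + representativeness_reward(feats, picks)
```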
Test-Time Training with Self-Supervision for Generalization under Distribution Shifts
In this paper, we propose Test-Time Training, a general approach for improving the performance of predictive models when training and test data come from different distributions.
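A minimal sketch of the test-time training idea follows: a shared encoder feeds both a main head and a self-supervised head (rotation prediction is one common choice), and at test time the encoder is briefly fine-tuned on the self-supervised loss for the incoming sample before predicting. The module sizes here are illustrative, not the paper's exact setup.

```python
import copy
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
main_head = nn.Linear(128, 10)    # main classification task
ssl_head = nn.Linear(128, 4)      # self-supervised task: predict 1 of 4 rotations

def rotations(x):
    """Return the 4 rotated copies of an image batch and their rotation labels."""
    views = [torch.rot90(x, k, dims=(2, 3)) for k in range(4)]
    labels = torch.arange(4).repeat_interleave(x.size(0))
    return torch.cat(views), labels

def test_time_adapt(x, steps=1, lr=1e-3):
    enc = copy.deepcopy(encoder)  # adapt a copy so the base model stays fixed
    opt = torch.optim.SGD(enc.parameters(), lr=lr)
    for _ in range(steps):
        views, labels = rotations(x)
        loss = nn.functional.cross_entropy(ssl_head(enc(views)), labels)
        opt.zero_grad(); loss.backward(); opt.step()
    return main_head(enc(x))      # predict with the adapted encoder

x = torch.randn(1, 3, 32, 32)     # a single test sample from a shifted distribution
logits = test_time_adapt(x)
```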
Supervised Video Summarization via Multiple Feature Sets with Parallel Attention
The proposed architecture applies an attention mechanism before fusing motion features with features representing the (static) visual content, i.e., features derived from an image classification model.
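The sketch below illustrates this idea: motion and static appearance features are attended over in parallel streams, then fused to score frames. The dimensions and the fusion scheme (concatenation) are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ParallelAttentionFusion(nn.Module):
    def __init__(self, motion_dim=1024, static_dim=2048, d_model=256):
        super().__init__()
        self.motion_proj = nn.Linear(motion_dim, d_model)
        self.static_proj = nn.Linear(static_dim, d_model)
        self.motion_attn = nn.MultiheadAttention(d_model, 4, batch_first=True)
        self.static_attn = nn.MultiheadAttention(d_model, 4, batch_first=True)
        self.score = nn.Sequential(nn.Linear(2 * d_model, 1), nn.Sigmoid())

    def forward(self, motion, static):        # (B, T, motion_dim), (B, T, static_dim)
        m = self.motion_proj(motion)
        s = self.static_proj(static)
        m, _ = self.motion_attn(m, m, m)      # self-attention per stream
        s, _ = self.static_attn(s, s, s)
        fused = torch.cat([m, s], dim=-1)     # fuse after attention
        return self.score(fused).squeeze(-1)  # (B, T) frame scores

model = ParallelAttentionFusion()
scores = model(torch.randn(2, 60, 1024), torch.randn(2, 60, 2048))
```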
Video Joint Modelling Based on Hierarchical Transformer for Co-summarization
Video summarization aims to automatically generate a summary (storyboard or video skim) of a video, which can facilitate large-scale video retrieval and browsing.
Progressive Video Summarization via Multimodal Self-supervised Learning
Considering that the annotation of large-scale datasets is time-consuming, we propose a multimodal self-supervised learning framework to obtain semantic representations of videos, which benefits the video summarization task.
Align and Attend: Multimodal Summarization with Dual Contrastive Losses
The goal of multimodal summarization is to extract the most important information from different modalities to form output summaries.
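For the "dual contrastive losses" named in the title, a generic symmetric formulation contrasts paired video and text embeddings in both directions. The sketch below is a standard InfoNCE-style version of that idea, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def dual_contrastive_loss(video_emb, text_emb, temperature=0.07):
    """video_emb, text_emb: (B, D) embeddings of paired clips and texts."""
    v = F.normalize(video_emb, dim=1)
    t = F.normalize(text_emb, dim=1)
    logits = v @ t.t() / temperature           # (B, B) similarity matrix
    targets = torch.arange(v.size(0))          # matching pairs on the diagonal
    loss_v2t = F.cross_entropy(logits, targets)
    loss_t2v = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_v2t + loss_t2v)         # contrast in both directions

loss = dual_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```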
Discriminative Feature Learning for Unsupervised Video Summarization
The proposed variance loss allows a network to predict output scores for each frame with high discrepancy, which enables effective feature learning and significantly improves model performance.
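One way to encourage such discrepancy is to penalize low variance among a video's predicted scores. The sketch below shows this generic idea; the exact form (a margin on per-video score variance) is an illustrative assumption, not the paper's precise loss.

```python
import torch

def variance_loss(scores, margin=0.5):
    """scores: (B, T) predicted frame importance scores in [0, 1]."""
    var = scores.var(dim=1)                          # per-video score variance
    return torch.clamp(margin - var, min=0).mean()   # penalize near-uniform outputs

scores = torch.sigmoid(torch.randn(4, 100))
loss = variance_loss(scores)
```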
DSNet: A Flexible Detect-to-Summarize Network for Video Summarization
In this paper, we propose a Detect-to-Summarize network (DSNet) framework for supervised video summarization.
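A detection-style formulation can be sketched as follows: for each temporal position the network predicts an "interestingness" confidence plus a center/length offset, analogous to anchor-based object detection. Layer sizes and heads here are illustrative assumptions, not DSNet's actual architecture.

```python
import torch
import torch.nn as nn

class DetectToSummarize(nn.Module):
    def __init__(self, feat_dim=1024, hidden=256):
        super().__init__()
        self.backbone = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.cls_head = nn.Linear(2 * hidden, 1)   # segment-of-interest confidence
        self.loc_head = nn.Linear(2 * hidden, 2)   # (center offset, length) regression

    def forward(self, feats):                       # (B, T, feat_dim)
        h, _ = self.backbone(feats)
        conf = torch.sigmoid(self.cls_head(h)).squeeze(-1)  # (B, T)
        loc = self.loc_head(h)                              # (B, T, 2)
        return conf, loc

model = DetectToSummarize()
conf, loc = model(torch.randn(2, 120, 1024))  # keep high-confidence proposals
```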
CLIP-It! Language-Guided Video Summarization
A generic video summary is an abridged version of a video that conveys the whole story and features the most important scenes.
Combining Global and Local Attention with Positional Encoding for Video Summarization
This paper presents a new method for supervised video summarization.