Supervised Video Summarization

11 papers with code • 2 benchmarks • 3 datasets

Supervised video summarization methods rely on datasets with human-labeled ground-truth annotations (either video summaries, as in the SumMe dataset, or frame-level importance scores, as in the TVSum dataset), from which they try to discover the underlying criterion for selecting video frames or fragments and producing a summary.

Source: Video Summarization Using Deep Neural Networks: A Survey
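To make the two annotation styles in the task description concrete, here is a minimal sketch in Python with NumPy. It assumes toy frame-level importance scores from two annotators, averages them, keeps the top-scoring frames under a length budget, and compares a predicted summary with the resulting reference using an F-score. The function names, the toy numbers, and the 30% budget are illustrative assumptions, not part of either dataset's official evaluation protocol.

```python
import numpy as np


def summary_from_scores(scores, budget_ratio=0.15):
    """Turn frame-level importance scores into a binary keep/drop summary.

    Hypothetical helper: keeps the highest-scoring frames until the
    budget (a fraction of the video length) is filled.
    """
    scores = np.asarray(scores, dtype=float)
    n_keep = max(1, int(round(budget_ratio * scores.size)))
    # Stable sort so ties are broken in favor of earlier frames.
    top = np.argsort(-scores, kind="stable")[:n_keep]
    summary = np.zeros(scores.size, dtype=bool)
    summary[top] = True
    return summary


def f_score(predicted, reference):
    """Harmonic mean of precision and recall over selected frames."""
    predicted = np.asarray(predicted, dtype=bool)
    reference = np.asarray(reference, dtype=bool)
    overlap = np.logical_and(predicted, reference).sum()
    if overlap == 0:
        return 0.0
    precision = overlap / predicted.sum()
    recall = overlap / reference.sum()
    return float(2 * precision * recall / (precision + recall))


# Toy importance scores for a 10-frame video from two annotators (assumed data).
annotator_scores = np.array([
    [1, 2, 5, 4, 1, 1, 3, 5, 2, 1],
    [1, 1, 4, 5, 2, 1, 2, 5, 1, 1],
])
mean_scores = annotator_scores.mean(axis=0)

# Reference summary derived from the averaged human scores.
reference = summary_from_scores(mean_scores, budget_ratio=0.3)

# Scores a trained model might output for the same video (assumed data).
model_scores = [0.1, 0.2, 0.9, 0.3, 0.1, 0.1, 0.2, 0.8, 0.7, 0.1]
predicted = summary_from_scores(model_scores, budget_ratio=0.3)

print(f_score(predicted, reference))
```

A supervised method would fit a model so that its predicted scores, or the summary derived from them, match the human annotations as closely as possible. The selection step here is a plain top-k choice; published methods often select whole shots rather than individual frames.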

Most implemented papers

Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward

KaiyangZhou/vsumm-reinforce 29 Dec 2017

Video summarization aims to facilitate large-scale video browsing by producing short, concise summaries that are diverse and representative of original videos.

Test-Time Training with Self-Supervision for Generalization under Distribution Shifts

yueatsprograms/ttt_cifar_release 29 Sep 2019

In this paper, we propose Test-Time Training, a general approach for improving the performance of predictive models when training and test data come from different distributions.

Supervised Video Summarization via Multiple Feature Sets with Parallel Attention

TIBHannover/MSVA 23 Apr 2021

The proposed architecture utilizes an attention mechanism before fusing motion features and features representing the (static) visual content, i.e., derived from an image classification model.

Video Joint Modelling Based on Hierarchical Transformer for Co-summarization

HopLee6/VJMHT-PyTorch 27 Dec 2021

Video summarization aims to automatically generate a summary (storyboard or video skim) of a video, which can facilitate large-scale video retrieval and browsing.

Progressive Video Summarization via Multimodal Self-supervised Learning

HopLee6/SSPVS-PyTorch 7 Jan 2022

Considering that the annotation of large-scale datasets is time-consuming, we propose a multimodal self-supervised learning framework to obtain semantic representations of videos, which benefits the video summarization task.

Align and Attend: Multimodal Summarization with Dual Contrastive Losses

boheumd/A2Summ CVPR 2023

The goal of multimodal summarization is to extract the most important information from different modalities to form output summaries.

Discriminative Feature Learning for Unsupervised Video Summarization

wildoctopus/SADNet 24 Nov 2018

The proposed variance loss allows a network to predict output scores for each frame with high discrepancy, which enables effective feature learning and significantly improves model performance.

DSNet: A Flexible Detect-to-Summarize Network for Video Summarization

li-plus/DSNet 1 Dec 2020

In this paper, we propose a Detect-to-Summarize network (DSNet) framework for supervised video summarization.

CLIP-It! Language-Guided Video Summarization

srpkdyy/CLIP-It NeurIPS 2021

A generic video summary is an abridged version of a video that conveys the whole story and features the most important scenes.