DSNet: A Flexible Detect-to-Summarize Network for Video Summarization

1 Dec 2020 · Wencheng Zhu, Jiwen Lu, Jiahao Li, and Jie Zhou

In this paper, we propose a Detect-to-Summarize network (DSNet) framework for supervised video summarization. DSNet comes in anchor-based and anchor-free variants. The anchor-based method generates temporal interest proposals to identify and localize the representative content of a video sequence, while the anchor-free method dispenses with pre-defined temporal proposals and directly predicts importance scores and segment locations. Existing supervised video summarization methods formulate summarization as a regression problem without temporal consistency or integrity constraints; our interest detection framework is the first attempt to leverage temporal consistency through a temporal interest detection formulation. Specifically, in the anchor-based approach, we first densely sample temporal interest proposals at multiple interval scales to accommodate interests of varying length, and then extract their long-range temporal features for proposal location regression and importance prediction. Notably, both positive and negative segments are assigned, so that training captures the correctness and the completeness of the generated summaries. In the anchor-free approach, we alleviate the drawbacks of temporal proposals by directly predicting the importance scores of video frames and the segment locations. The interest detection framework can also be flexibly plugged into off-the-shelf supervised video summarization methods. We evaluate the anchor-based and anchor-free approaches on the SumMe and TVSum datasets, and the experimental results validate the effectiveness of both.
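The anchor-based steps described in the abstract (dense multi-scale proposals, then positive and negative assignment) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the centered-window anchor layout, the example scales, and the IoU threshold of 0.6 are all assumptions made for illustration.

```python
import numpy as np

def make_anchors(seq_len, scales):
    """Densely sample one proposal per (position, scale) pair.

    Returns an array of shape (seq_len * len(scales), 2) holding
    [start, end) intervals centered on each temporal position.
    The centered layout is an assumption, not taken from the paper.
    """
    centers = np.arange(seq_len)[:, None]      # shape (T, 1)
    widths = np.asarray(scales)[None, :]       # shape (1, S)
    starts = centers - widths // 2             # shape (T, S)
    ends = starts + widths
    return np.stack([starts, ends], axis=-1).reshape(-1, 2)

def temporal_iou(anchors, segment):
    """Intersection-over-union between each anchor and one [start, end) segment."""
    inter = np.minimum(anchors[:, 1], segment[1]) - np.maximum(anchors[:, 0], segment[0])
    inter = np.clip(inter, 0, None)
    union = (anchors[:, 1] - anchors[:, 0]) + (segment[1] - segment[0]) - inter
    return inter / union

def label_anchors(anchors, gt_segments, pos_thresh=0.6):
    """Mark anchors as positive (1) or negative (0) by their best IoU.

    pos_thresh is a hypothetical value chosen for this sketch.
    """
    best = np.zeros(len(anchors))
    for seg in gt_segments:
        best = np.maximum(best, temporal_iou(anchors, seg))
    return (best > pos_thresh).astype(int), best

# Example: 10 temporal positions, two hypothetical scales, one ground-truth segment.
anchors = make_anchors(10, scales=[4, 8])
labels, ious = label_anchors(anchors, gt_segments=[(2, 6)])
```

In a full model, the positive anchors would supply targets for location regression and importance prediction, while the negatives would teach the network which proposals to reject. The anchor-free variant skips this proposal step entirely and predicts scores and segment boundaries per frame.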

Results from the Paper


Ranked #2 on Video Summarization on TvSum (using extra training data)

Task                             Dataset  Model  Metric Name           Metric Value  Global Rank
Video Summarization              SumMe    DSNet  F1-score (Canonical)  53.0          # 3
Video Summarization              SumMe    DSNet  F1-score (Augmented)  53.3          # 2
Supervised Video Summarization   SumMe    DSNet  F1-score (Canonical)  50.2          # 7
Supervised Video Summarization   SumMe    DSNet  F1-score (Augmented)  50.7          # 2
Video Summarization              TvSum    DSNet  F1-score (Canonical)  62.1          # 2
Video Summarization              TvSum    DSNet  F1-score (Augmented)  63.9          # 1
Supervised Video Summarization   TvSum    DSNet  F1-score (Canonical)  62.1          # 7
Supervised Video Summarization   TvSum    DSNet  F1-score (Augmented)  63.9          # 1
