Supervised Video Summarization via Multiple Feature Sets with Parallel Attention

23 Apr 2021  ·  Junaid Ahmed Ghauri, Sherzod Hakimov, Ralph Ewerth

The assignment of importance scores to particular frames or (short) segments in a video is crucial for summarization, but also a difficult task. Previous work utilizes only one source of visual features. In this paper, we suggest a novel model architecture that combines three feature sets for visual content and motion to predict importance scores. The proposed architecture utilizes an attention mechanism before fusing motion features and features representing the (static) visual content, i.e., derived from an image classification model. Comprehensive experimental evaluations are reported for two well-known datasets, SumMe and TVSum. In this context, we identify methodological issues in how previous work used these benchmark datasets, and present a fair evaluation scheme with appropriate data splits that can be used in future work. When using static and motion features with a parallel attention mechanism, we improve state-of-the-art results for SumMe, while being on par with the state of the art for TVSum.
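The idea of attending to each feature set in parallel before fusing them can be illustrated with a minimal sketch. It is not the authors' implementation: the unparameterised dot-product attention, the fusion by element-wise summation, the single linear scoring layer, the feature dimensions, and all function names (`self_attention`, `importance_scores`) are assumptions made for illustration only.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention over one feature stream.

    frames: list of per-frame feature vectors (all of the same length).
    No learned projections are used here; this is an assumption of the sketch.
    """
    dim = len(frames[0])
    attended = []
    for query in frames:
        logits = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                  for key in frames]
        weights = softmax(logits)
        attended.append([sum(w * f[j] for w, f in zip(weights, frames))
                         for j in range(dim)])
    return attended


def importance_scores(streams, w, bias=0.0):
    """Attend to each stream independently, fuse by summation, score each frame."""
    attended = [self_attention(s) for s in streams]  # parallel attention
    n_frames, dim = len(streams[0]), len(streams[0][0])
    fused = [[sum(a[t][j] for a in attended) for j in range(dim)]
             for t in range(n_frames)]
    # Hypothetical scoring head: one linear layer followed by a sigmoid.
    return [1.0 / (1.0 + math.exp(-(sum(wj * x for wj, x in zip(w, row)) + bias)))
            for row in fused]


random.seed(0)
n_frames, dim = 6, 4
# Three random stand-ins for one static-content stream and two motion streams.
streams = [[[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_frames)]
           for _ in range(3)]
w = [random.gauss(0, 1) for _ in range(dim)]
scores = importance_scores(streams, w)
print([round(s, 3) for s in scores])
```

Each stream is attended on its own, so one feature set cannot dominate the attention weights of another; the streams only interact at the fusion step. A real model would use learned projections and features of different dimensionality per stream.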

Datasets


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Supervised Video Summarization | SumMe | MSVA | F1-score (Canonical) | 53.4 | #3 |
| Supervised Video Summarization | SumMe | MSVA | Kendall's Tau | 0.2 | #1 |
| Supervised Video Summarization | SumMe | MSVA | Spearman's Rho | 0.23 | #1 |
| Supervised Video Summarization | SumMe | MC-VSA [DBLP:journals/corr/abs-2006-01410] | F1-score (Canonical) | 51.6 | #5 |
| Supervised Video Summarization | SumMe | VASNet [DBLP:conf/accv/FajtlSAMR18] | F1-score (Canonical) | 48 | #8 |
| Supervised Video Summarization | SumMe | VASNet [DBLP:conf/accv/FajtlSAMR18] | Kendall's Tau | 0.16 | #2 |
| Supervised Video Summarization | SumMe | VASNet [DBLP:conf/accv/FajtlSAMR18] | Spearman's Rho | 0.17 | #2 |
| Supervised Video Summarization | SumMe | re-SEQ2SEQ [DBLP:conf/eccv/ZhangGS18] | F1-score (Canonical) | 44.9 | #9 |
| Supervised Video Summarization | SumMe | M-AVS [DBLP:journals/corr/abs-1708-09545] | F1-score (Canonical) | 44.4 | #10 |
| Supervised Video Summarization | SumMe | MAVS [DBLP:conf/mm/FengLKZ18] | F1-score (Canonical) | 43.1 | #11 |
| Supervised Video Summarization | TvSum | M-AVS [DBLP:journals/corr/abs-1708-09545] | F1-score (Canonical) | 61 | #8 |
| Supervised Video Summarization | TvSum | MSVA | F1-score (Canonical) | 61.5 | #7 |
| Supervised Video Summarization | TvSum | re-SEQ2SEQ [DBLP:conf/eccv/ZhangGS18] | F1-score (Canonical) | 63.9 | #2 |
| Supervised Video Summarization | TvSum | MC-VSA [DBLP:journals/corr/abs-2006-01410] | F1-score (Canonical) | 63.7 | #3 |
| Supervised Video Summarization | TvSum | VASNet [DBLP:conf/accv/FajtlSAMR18] | F1-score (Canonical) | 59.8 | #10 |
| Supervised Video Summarization | TvSum | MAVS [DBLP:conf/mm/FengLKZ18] | F1-score (Canonical) | 67.5 | #1 |
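
The Kendall's Tau and Spearman's Rho entries above are rank correlation coefficients. A minimal sketch of how such coefficients compare predicted importance scores with annotated ones is given below. The score lists are made-up toy values, the tie-free formulas (tau-a and the rank-difference form of rho) are a simplifying assumption, and the benchmark's exact protocol (for example, how values are averaged over videos or annotators) is not stated here.

```python
def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs. Assumes no ties."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)  # +1 if concordant, -1 if discordant
    return s / (n * (n - 1) / 2)


def ranks(values):
    """1-based ranks of the values in ascending order. Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho via squared rank differences. Valid only without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Toy per-frame importance scores (hypothetical values, not from any dataset).
predicted = [0.10, 0.40, 0.35, 0.80, 0.60]
annotated = [0.20, 0.30, 0.50, 0.90, 0.70]

print(kendall_tau(predicted, annotated))
print(spearman_rho(predicted, annotated))
```

In this toy example only one of the ten frame pairs is ordered differently by the two score lists, so both coefficients are close to 1. Real annotations contain ties, for which tie-corrected variants such as tau-b are the usual choice.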
