Relational Reasoning Over Spatial-Temporal Graphs for Video Summarization

In this paper, we propose a dynamic graph modeling approach that learns spatial-temporal representations for video summarization. Most existing video summarization methods extract image-level features with deep models pre-trained on ImageNet. In contrast, our method exploits object-level and relation-level information to capture spatial-temporal dependencies. Specifically, it builds spatial graphs on the detected object proposals and then constructs a temporal graph from the aggregated representations of the spatial graphs. We then perform relational reasoning over the spatial and temporal graphs with graph convolutional networks and extract spatial-temporal representations for importance score prediction and key shot selection. To reduce the relational clutter caused by densely connected nodes, we further design a self-attention edge pooling module that discards meaningless relations in the graphs. We conduct extensive experiments on two popular benchmarks, SumMe and TVSum. Experimental results demonstrate that the proposed method outperforms state-of-the-art video summarization methods.
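
The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions made purely for illustration: one spatial graph per frame, mean pooling to aggregate each spatial graph into a temporal node, scaled dot-product attention as the edge score, a fixed `keep_ratio` for edge pooling, and random untrained weights. The function names (`attention_edge_pool`, `gcn_layer`, `importance_scores`) are hypothetical.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    adj = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))  # degree >= 1 thanks to self-loops
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj, feats, weight):
    """One graph convolution step: ReLU(A_hat @ X @ W)."""
    return np.maximum(normalize_adjacency(adj) @ feats @ weight, 0.0)


def attention_edge_pool(feats, keep_ratio=0.5):
    """Assumed edge pooling: score every node pair with scaled dot-product
    attention and keep only the top `keep_ratio` fraction of edges."""
    n, d = feats.shape
    scores = feats @ feats.T / np.sqrt(d)
    rows, cols = np.triu_indices(n, k=1)       # each undirected pair once
    edge_scores = scores[rows, cols]
    k = max(1, int(round(keep_ratio * edge_scores.size)))
    keep = np.argsort(edge_scores)[-k:]        # indices of the strongest edges
    adj = np.zeros((n, n))
    adj[rows[keep], cols[keep]] = 1.0
    return adj + adj.T                          # symmetric, no self-loops


def importance_scores(frames, w_spatial, w_temporal, w_out):
    """frames: array of shape (T, N, D), N object-proposal features per frame.
    Returns one importance score in [0, 1] per frame."""
    frame_nodes = []
    for objects in frames:
        adj = attention_edge_pool(objects)            # pruned spatial graph
        hidden = gcn_layer(adj, objects, w_spatial)   # spatial reasoning
        frame_nodes.append(hidden.mean(axis=0))       # aggregate to one node
    frame_nodes = np.stack(frame_nodes)               # (T, H) temporal nodes
    adj_t = attention_edge_pool(frame_nodes)          # pruned temporal graph
    hidden_t = gcn_layer(adj_t, frame_nodes, w_temporal)
    logits = hidden_t @ w_out
    return 1.0 / (1.0 + np.exp(-logits))              # sigmoid


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(6, 4, 8))               # 6 frames, 4 objects, 8-d
    scores = importance_scores(
        frames,
        rng.normal(size=(8, 16)),
        rng.normal(size=(16, 16)),
        rng.normal(size=16),
    )
    print(scores.shape)
```

In a trained model the weights would be learned from annotated importance scores, and key shots would then be selected from the per-frame scores; the abstract does not specify that selection step, so it is left out here.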


Results from the Paper


 Ranked #1 on Video Summarization on TvSum (using extra training data)

Task                            Dataset   Model      Metric Name           Metric Value  Global Rank
Graph Classification            NCI1      SAEPool_g  Accuracy              74.48%        #41
Graph Classification            NCI109    SAEPool_h  Accuracy              75.85%        #16
Graph Classification            PROTEINS  SAEPool    Accuracy              80.36%        #7
Video Summarization             SumMe     RR-STG     F1-score (Canonical)  54.5          #2
Video Summarization             SumMe     RR-STG     F1-score (Augmented)  55.3          #1
Supervised Video Summarization  SumMe     RR-STG     F1-score (Canonical)  53.4          #4
Supervised Video Summarization  SumMe     RR-STG     F1-score (Augmented)  54.8          #1
Video Summarization             TvSum     RR-STG     F1-score (Canonical)  63.0          #1
Video Summarization             TvSum     RR-STG     F1-score (Augmented)  63.6          #2
Supervised Video Summarization  TvSum     RR-STG     F1-score (Canonical)  63.0          #5
Supervised Video Summarization  TvSum     RR-STG     F1-score (Augmented)  63.6          #2
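
The results above are reported as F1-scores, but this page does not define the metric. A common convention on SumMe and TVSum is to compare the frames selected by the predicted summary with those of a reference summary; the sketch below assumes that convention, assumes per-frame 0/1 selection masks as input, and scales the result to percent. The function name `summary_f1` is hypothetical.

```python
def summary_f1(predicted, reference):
    """F1-score (in percent) between two per-frame 0/1 selection masks."""
    if len(predicted) != len(reference):
        raise ValueError("masks must cover the same number of frames")
    # Frames selected by both summaries.
    overlap = sum(1 for p, r in zip(predicted, reference) if p and r)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(1 for p in predicted if p)
    recall = overlap / sum(1 for r in reference if r)
    return 100.0 * 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    predicted = [1, 1, 0, 0, 1, 0]
    reference = [1, 0, 0, 0, 1, 1]
    print(round(summary_f1(predicted, reference), 1))
```

When a video has several reference summaries, the per-reference scores are typically combined (for example by averaging or taking the maximum); which rule applies to the numbers in the table is not stated here.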
