Dynamic Context-Sensitive Filtering Network for Video Salient Object Detection

The ability to capture inter-frame dynamics has been critical to the development of video salient object detection (VSOD). While many works have achieved great success in this field, a deeper insight into its dynamic nature should be developed. In this work, we aim to answer the following questions: How can a model adjust itself to dynamic variations while perceiving fine differences in real-world environments? How can temporal dynamics be effectively introduced into spatial information over time? To this end, we propose a dynamic context-sensitive filtering network (DCFNet) equipped with a dynamic context-sensitive filtering module (DCFM) and an effective bidirectional dynamic fusion strategy. The proposed DCFM sheds new light on dynamic filter generation by extracting location-related affinities between consecutive frames. Our bidirectional dynamic fusion strategy encourages the interaction of spatial and temporal information in a dynamic manner. Experimental results demonstrate that our proposed method achieves state-of-the-art performance on most VSOD datasets while running at a real-time speed of 28 fps. The source code is publicly available at https://github.com/OIPLab-DUT/DCFNet.

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Video Polyp Segmentation | SUN-SEG-Easy (Unseen) | DCF | S-measure | 0.523 | #12 |
| | | | mean E-measure | 0.514 | #13 |
| | | | weighted F-measure | 0.270 | #12 |
| | | | mean F-measure | 0.312 | #12 |
| | | | Dice | 0.325 | #12 |
| | | | Sensitivity | 0.340 | #14 |
| Video Polyp Segmentation | SUN-SEG-Hard (Unseen) | DCF | S-measure | 0.514 | #12 |
| | | | mean E-measure | 0.522 | #13 |
| | | | weighted F-measure | 0.263 | #12 |
| | | | mean F-measure | 0.303 | #12 |
| | | | Dice | 0.317 | #12 |
| | | | Sensitivity | 0.364 | #14 |
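
For readers unfamiliar with the Dice and Sensitivity metrics reported above, the following is a minimal sketch of their standard definitions on a single binary mask pair: Dice = 2TP / (2TP + FP + FN) and Sensitivity = TP / (TP + FN). The function name `dice_and_sensitivity`, the 0.5 binarization threshold, and the `eps` smoothing term are assumptions made for illustration; the benchmark's actual evaluation protocol (thresholding and per-frame or per-video averaging) is not described here and may differ.

```python
import numpy as np


def dice_and_sensitivity(pred, gt, threshold=0.5, eps=1e-8):
    """Return (Dice, Sensitivity) for one predicted saliency map and one mask.

    `threshold` and `eps` are illustrative assumptions, not benchmark settings.
    """
    # Binarize the prediction and the ground-truth mask.
    pred_bin = np.asarray(pred, dtype=float) >= threshold
    gt_bin = np.asarray(gt, dtype=float) >= 0.5

    # Count true positives, false positives, and false negatives.
    tp = np.logical_and(pred_bin, gt_bin).sum()
    fp = np.logical_and(pred_bin, ~gt_bin).sum()
    fn = np.logical_and(~pred_bin, gt_bin).sum()

    dice = 2.0 * tp / (2.0 * tp + fp + fn + eps)
    sensitivity = tp / (tp + fn + eps)
    return float(dice), float(sensitivity)


if __name__ == "__main__":
    # Toy 2x2 example: one true positive, one false positive, one false negative.
    pred = [[0.9, 0.2], [0.7, 0.1]]
    gt = [[1, 0], [0, 1]]
    print(dice_and_sensitivity(pred, gt))
```

Both metrics range from 0 to 1, with higher values indicating better overlap with the ground-truth mask.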
