Dynamic Context-Sensitive Filtering Network for Video Salient Object Detection
The ability to capture inter-frame dynamics has been critical to the development of video salient object detection (VSOD). While many works have achieved great success in this field, deeper insight into the dynamic nature of the task is still needed. In this work, we aim to answer the following questions: How can a model adapt to dynamic variations while also perceiving fine differences in real-world environments? How can temporal dynamics be effectively introduced into spatial information over time? To this end, we propose a dynamic context-sensitive filtering network (DCFNet) equipped with a dynamic context-sensitive filtering module (DCFM) and an effective bidirectional dynamic fusion strategy. The proposed DCFM sheds new light on dynamic filter generation by extracting location-related affinities between consecutive frames. Our bidirectional dynamic fusion strategy encourages the interaction of spatial and temporal information in a dynamic manner. Experimental results demonstrate that our proposed method achieves state-of-the-art performance on most VSOD datasets while maintaining a real-time speed of 28 fps. The source code is publicly available at https://github.com/OIPLab-DUT/DCFNet.
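To make the idea of location-specific dynamic filtering more concrete, the following is a minimal NumPy sketch. It is not the authors' DCFM: the function name `dynamic_local_filter`, the dot-product affinity, the softmax normalization, and the 3×3 window are all assumptions chosen for illustration. It only shows how a separate filter can be generated at every spatial location from the affinity between features of two consecutive frames and then applied to the previous frame's features.

```python
import numpy as np

def dynamic_local_filter(prev_feat, curr_feat, k=3):
    """Hypothetical sketch of per-location dynamic filtering (not the paper's DCFM).

    prev_feat, curr_feat: feature maps of shape (C, H, W) from consecutive frames.
    k: odd window size of the generated filters (assumed value).
    Returns previous-frame features aggregated with location-specific filters, (C, H, W).
    """
    C, H, W = curr_feat.shape
    pad = k // 2
    # Zero-pad the previous frame so every location has a full k x k neighborhood.
    padded = np.pad(prev_feat, ((0, 0), (pad, pad), (pad, pad)))
    # Collect the k*k shifted views of the previous frame: (k*k, C, H, W).
    neighbors = np.stack([
        padded[:, dy:dy + H, dx:dx + W]
        for dy in range(k) for dx in range(k)
    ])
    # Affinity between each current-frame location and its previous-frame
    # neighbors (scaled dot product over channels): (k*k, H, W).
    affinity = (neighbors * curr_feat[None]).sum(axis=1) / np.sqrt(C)
    # Softmax over the window turns affinities into one filter per location.
    affinity = affinity - affinity.max(axis=0, keepdims=True)
    weights = np.exp(affinity)
    weights = weights / weights.sum(axis=0, keepdims=True)
    # Apply each location's own filter to its neighborhood.
    return (weights[:, None] * neighbors).sum(axis=0)

rng = np.random.default_rng(0)
prev = rng.standard_normal((8, 16, 16))
curr = rng.standard_normal((8, 16, 16))
out = dynamic_local_filter(prev, curr)
print(out.shape)
```

Unlike a standard convolution, whose kernel is shared across all positions and fixed after training, the filter weights here depend on the input pair and differ at every location. The repository linked above is the reference for how the actual module is built.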
Results from the Paper
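Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
---|---|---|---|---|---|
Video Polyp Segmentation | SUN-SEG-Easy (Unseen) | DCF | S-measure | 0.523 | # 14 |
Video Polyp Segmentation | SUN-SEG-Easy (Unseen) | DCF | mean E-measure | 0.514 | # 15 |
Video Polyp Segmentation | SUN-SEG-Easy (Unseen) | DCF | weighted F-measure | 0.270 | # 13 |
Video Polyp Segmentation | SUN-SEG-Easy (Unseen) | DCF | mean F-measure | 0.312 | # 14 |
Video Polyp Segmentation | SUN-SEG-Easy (Unseen) | DCF | Dice | 0.325 | # 15 |
Video Polyp Segmentation | SUN-SEG-Easy (Unseen) | DCF | Sensitivity | 0.340 | # 16 |
Video Polyp Segmentation | SUN-SEG-Hard (Unseen) | DCF | S-measure | 0.514 | # 14 |
Video Polyp Segmentation | SUN-SEG-Hard (Unseen) | DCF | mean E-measure | 0.522 | # 15 |
Video Polyp Segmentation | SUN-SEG-Hard (Unseen) | DCF | weighted F-measure | 0.263 | # 13 |
Video Polyp Segmentation | SUN-SEG-Hard (Unseen) | DCF | mean F-measure | 0.303 | # 14 |
Video Polyp Segmentation | SUN-SEG-Hard (Unseen) | DCF | Dice | 0.317 | # 15 |
Video Polyp Segmentation | SUN-SEG-Hard (Unseen) | DCF | Sensitivity | 0.364 | # 16 |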
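
For readers unfamiliar with the Dice and Sensitivity rows, the sketch below computes both from a pair of binary masks using the standard definitions, Dice = 2TP / (2TP + FP + FN) and Sensitivity = TP / (TP + FN). The helper name `dice_and_sensitivity` and the `eps` smoothing term are assumptions for illustration. How the benchmark thresholds predictions and averages over frames is not stated here, so this should not be read as the official evaluation code.

```python
import numpy as np

def dice_and_sensitivity(pred, gt, eps=1e-8):
    """Standard Dice and sensitivity for binary masks (illustrative helper).

    pred, gt: arrays of the same shape; nonzero entries count as foreground.
    eps: small constant (assumed) that avoids division by zero on empty masks.
    """
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = np.logical_and(pred, gt).sum()    # foreground predicted and present
    fp = np.logical_and(pred, ~gt).sum()   # foreground predicted but absent
    fn = np.logical_and(~pred, gt).sum()   # foreground missed
    dice = 2 * tp / (2 * tp + fp + fn + eps)
    sensitivity = tp / (tp + fn + eps)
    return float(dice), float(sensitivity)

pred_mask = [[1, 1], [0, 0]]
gt_mask = [[1, 0], [1, 0]]
# One true positive, one false positive, one false negative:
# both scores come out at roughly 0.5.
print(dice_and_sensitivity(pred_mask, gt_mask))
```

Dice penalizes both false positives and false negatives, while sensitivity ignores false positives, which is why the two columns can rank models differently.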