Shifting More Attention to Video Salient Object Detection

The last decade has witnessed growing interest in video salient object detection (VSOD). However, the research community has long lacked a well-established VSOD dataset that is representative of real dynamic scenes and comes with high-quality annotations. To address this issue, we carefully collected a visual-attention-consistent Densely Annotated VSOD (DAVSOD) dataset, which contains 226 videos with 23,938 frames covering diverse realistic scenes, objects, instances, and motions. Using the corresponding real human eye-fixation data, we obtain precise ground truths. This is the first work that explicitly emphasizes the challenge of saliency shift, i.e., the fact that the salient object(s) in a video may change dynamically. To further provide the community with a complete benchmark, we systematically assess 17 representative VSOD algorithms on seven existing VSOD datasets and our DAVSOD, totaling 84K frames (the largest scale). Using three widely used metrics, we then present a comprehensive and insightful performance analysis. Furthermore, we propose a baseline model equipped with a saliency-shift-aware convLSTM, which can efficiently capture video saliency dynamics by learning human attention-shift behavior. Extensive experiments open up promising future directions for model development and comparison.
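The abstract names the saliency-shift-aware convLSTM but does not describe its shift-aware mechanism. As background only, here is a minimal NumPy sketch of a generic ConvLSTM cell (no peephole connections) of the kind such a module builds on. It is not the SSAV model: the function names (`conv2d_same`, `convlstm_step`, `run_convlstm`), tensor shapes, and random initialization are hypothetical choices for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv2d_same(x, w, b):
    """'Same'-padded, stride-1 2D cross-correlation (as in deep-learning conv layers).

    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k; b: (C_out,).
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding keeps H, W
    _, height, width = x.shape
    out = np.empty((c_out, height, width))
    for i in range(height):
        for j in range(width):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k) window
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out + b[:, None, None]


def convlstm_step(x, h, c, w, b):
    """One ConvLSTM update (hypothetical helper, no peephole terms).

    Convolving the concatenation [x, h] with one kernel equals W_x * x + W_h * h,
    so all four gate pre-activations come from a single convolution.
    """
    z = conv2d_same(np.concatenate([x, h], axis=0), w, b)
    i, f, o, g = np.split(z, 4, axis=0)  # input, forget, output gates; candidate
    c_next = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_next = sigmoid(o) * np.tanh(c_next)
    return h_next, c_next


def run_convlstm(frames, c_hid, k=3, seed=0):
    """Run a randomly initialised ConvLSTM over frames of shape (T, C_in, H, W)."""
    rng = np.random.default_rng(seed)
    c_in = frames.shape[1]
    w = rng.normal(scale=0.1, size=(4 * c_hid, c_in + c_hid, k, k))
    b = np.zeros(4 * c_hid)
    h = np.zeros((c_hid,) + frames.shape[2:])
    c = np.zeros_like(h)
    outputs = []
    for x in frames:  # hidden and cell state carry information across frames
        h, c = convlstm_step(x, h, c, w, b)
        outputs.append(h)
    return np.stack(outputs)  # (T, C_hid, H, W)


# Hypothetical usage: a 5-frame clip of 3-channel 8x8 feature maps.
clip = np.random.default_rng(1).random((5, 3, 8, 8))
states = run_convlstm(clip, c_hid=4)
print(states.shape)  # (5, 4, 8, 8)
```

The recurrence over frames is what lets such a cell accumulate temporal context; how the paper's variant makes this recurrence aware of saliency shift is not specified in this summary.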

Benchmark results (task: Video Salient Object Detection; model: SSAV)
Higher is better for S-measure, E-measure, and F-measure; lower is better for MAE.

Dataset              Metric          Value   Global rank
DAVIS-2016           S-measure       0.893   #2
                     max E-measure   0.948   #3
                     max F-measure   0.861   #3
                     Average MAE     0.028   #7
DAVSOD-Difficult20   S-measure       0.619   #1
                     max E-measure   0.696   #3
                     Average MAE     0.114   #2
DAVSOD-easy35        S-measure       0.755   #1
                     max E-measure   0.806   #1
                     max F-measure   0.659   #1
                     Average MAE     0.084   #1
DAVSOD-Normal25      S-measure       0.661   #1
                     max E-measure   0.723   #1
                     Average MAE     0.117   #1
FBMS-59              S-measure       0.879   #2
                     max E-measure   0.926   #1
                     max F-measure   0.865   #2
                     Average MAE     0.040   #2
MCL                  S-measure       0.819   #2
                     max E-measure   0.889   #2
                     max F-measure   0.773   #1
                     Average MAE     0.026   #7
SegTrack v2          S-measure       0.850   #3
                     max E-measure   0.917   #2
                     max F-measure   0.801   #2
                     Average MAE     0.023   #2
UVSD                 S-measure       0.860   #2
                     max E-measure   0.939   #2
                     Average MAE     0.025   #2
ViSal                S-measure       0.942   #2
                     max E-measure   0.980   #2
                     Average MAE     0.021   #2
VOS-T                S-measure       0.819   #2
                     max E-measure   0.839   #2
                     Average MAE     0.074   #2
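As a rough illustration of how three of the tabulated metrics are typically computed for a single frame, here is a NumPy sketch of MAE, the F-measure (with beta^2 = 0.3, the usual salient-object-detection convention), and the core enhanced-alignment E-measure, with the "max" variants taken over binarization thresholds. The function names and the grid of 256 evenly spaced thresholds are assumptions; official benchmark toolboxes may differ in details such as the threshold grid, special handling of all-background or all-foreground masks, and per-video averaging. The S-measure, a structural-similarity score combining region-aware and object-aware terms, is more involved and is not shown.

```python
import numpy as np


def mae(sal, gt):
    """Mean absolute error between a saliency map in [0, 1] and a binary mask."""
    return float(np.mean(np.abs(sal - gt)))


def f_measure(pred_bin, gt, beta2=0.3):
    """F_beta of a binarized prediction against a boolean ground-truth mask."""
    tp = np.logical_and(pred_bin, gt).sum()
    precision = tp / max(pred_bin.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    if precision + recall == 0:
        return 0.0
    return float((1 + beta2) * precision * recall / (beta2 * precision + recall))


def e_measure(pred_bin, gt, eps=1e-8):
    """Core enhanced-alignment measure (this sketch skips the empty/full-mask special cases)."""
    fm = pred_bin.astype(float)
    g = gt.astype(float)
    phi_fm = fm - fm.mean()  # bias-subtracted prediction
    phi_gt = g - g.mean()    # bias-subtracted ground truth
    align = 2 * phi_gt * phi_fm / (phi_gt ** 2 + phi_fm ** 2 + eps)
    return float(np.mean((align + 1) ** 2 / 4))


def max_over_thresholds(metric, sal, gt, n=256):
    """Binarize sal at n evenly spaced thresholds in [0, 1] and keep the best score."""
    gt = gt.astype(bool)
    return max(metric(sal >= t, gt) for t in np.linspace(0, 1, n))


# Hypothetical usage on a toy 4x4 frame with a 2x2 salient region.
gt = np.zeros((4, 4))
gt[:2, :2] = 1
print(mae(np.zeros((4, 4)), gt))               # 0.25
print(max_over_thresholds(f_measure, gt, gt))  # 1.0
```

For video benchmarks, per-frame scores (or per-threshold curves) are usually averaged over all frames before the maximum is taken; this sketch shows only the single-frame case.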
