Shifting More Attention to Video Salient Object Detection
The last decade has witnessed a growing interest in video salient object detection (VSOD). However, the research community has long lacked a well-established VSOD dataset that is representative of real dynamic scenes and has high-quality annotations. To address this issue, we carefully collected a visual-attention-consistent Densely Annotated VSOD (DAVSOD) dataset, which contains 226 videos with 23,938 frames covering diverse realistic scenes, objects, instances and motions. Using the corresponding real human eye-fixation data, we obtain precise ground truths. This is the first work that explicitly emphasizes the challenge of saliency shift, i.e., the video salient object(s) may dynamically change. To further provide the community with a complete benchmark, we systematically assess 17 representative VSOD algorithms over seven existing VSOD datasets and our DAVSOD, totaling 84K frames (the largest scale to date). Using three widely adopted metrics, we then present a comprehensive and insightful performance analysis. Furthermore, we propose a baseline model. It is equipped with a saliency-shift-aware convLSTM, which can efficiently capture video saliency dynamics by learning human attention-shift behavior. Extensive experiments open up promising future directions for model development and comparison.
Results from the Paper
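Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
---|---|---|---|---|---|
Video Salient Object Detection | DAVIS-2016 | SSAV | S-measure | 0.893 | #2 |
Video Salient Object Detection | DAVIS-2016 | SSAV | Max E-measure | 0.948 | #3 |
Video Salient Object Detection | DAVIS-2016 | SSAV | Max F-measure | 0.861 | #3 |
Video Salient Object Detection | DAVIS-2016 | SSAV | Average MAE | 0.028 | #7 |
Video Salient Object Detection | DAVSOD-Difficult20 | SSAV | S-measure | 0.619 | #1 |
Video Salient Object Detection | DAVSOD-Difficult20 | SSAV | Max E-measure | 0.696 | #3 |
Video Salient Object Detection | DAVSOD-Difficult20 | SSAV | Average MAE | 0.114 | #2 |
Video Salient Object Detection | DAVSOD-easy35 | SSAV | S-measure | 0.755 | #1 |
Video Salient Object Detection | DAVSOD-easy35 | SSAV | Max F-measure | 0.659 | #1 |
Video Salient Object Detection | DAVSOD-easy35 | SSAV | Max E-measure | 0.806 | #1 |
Video Salient Object Detection | DAVSOD-easy35 | SSAV | Average MAE | 0.084 | #1 |
Video Salient Object Detection | DAVSOD-Normal25 | SSAV | S-measure | 0.661 | #1 |
Video Salient Object Detection | DAVSOD-Normal25 | SSAV | Max E-measure | 0.723 | #1 |
Video Salient Object Detection | DAVSOD-Normal25 | SSAV | Average MAE | 0.117 | #1 |
Video Salient Object Detection | FBMS-59 | SSAV | S-measure | 0.879 | #2 |
Video Salient Object Detection | FBMS-59 | SSAV | Average MAE | 0.040 | #2 |
Video Salient Object Detection | FBMS-59 | SSAV | Max E-measure | 0.926 | #1 |
Video Salient Object Detection | FBMS-59 | SSAV | Max F-measure | 0.865 | #2 |
Video Salient Object Detection | MCL | SSAV | S-measure | 0.819 | #2 |
Video Salient Object Detection | MCL | SSAV | Max E-measure | 0.889 | #2 |
Video Salient Object Detection | MCL | SSAV | Max F-measure | 0.773 | #1 |
Video Salient Object Detection | MCL | SSAV | Average MAE | 0.026 | #7 |
Video Salient Object Detection | SegTrack v2 | SSAV | S-measure | 0.850 | #3 |
Video Salient Object Detection | SegTrack v2 | SSAV | Max F-measure | 0.801 | #2 |
Video Salient Object Detection | SegTrack v2 | SSAV | Average MAE | 0.023 | #2 |
Video Salient Object Detection | SegTrack v2 | SSAV | Max E-measure | 0.917 | #2 |
Video Salient Object Detection | UVSD | SSAV | S-measure | 0.860 | #2 |
Video Salient Object Detection | UVSD | SSAV | Max E-measure | 0.939 | #2 |
Video Salient Object Detection | UVSD | SSAV | Average MAE | 0.025 | #2 |
Video Salient Object Detection | ViSal | SSAV | S-measure | 0.942 | #2 |
Video Salient Object Detection | ViSal | SSAV | Max E-measure | 0.980 | #2 |
Video Salient Object Detection | ViSal | SSAV | Average MAE | 0.021 | #2 |
Video Salient Object Detection | VOS-T | SSAV | S-measure | 0.819 | #2 |
Video Salient Object Detection | VOS-T | SSAV | Max E-measure | 0.839 | #2 |
Video Salient Object Detection | VOS-T | SSAV | Average MAE | 0.074 | #2 |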
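
To make two of the reported metrics concrete, the following is a minimal sketch of how mean absolute error (MAE) and max F-measure are commonly computed for a single frame. It is not the paper's evaluation code. The function names (`mae`, `f_measure`, `max_f_measure`), the flattened-list input format, the 255 thresholds, and the weight `beta_sq = 0.3` (a common convention in salient object detection) are all assumptions made for illustration. How the benchmark aggregates scores across frames and videos is not stated here, so this sketch does not attempt it.

```python
from typing import Sequence


def mae(pred: Sequence[float], gt: Sequence[int]) -> float:
    """Mean absolute error between a saliency map in [0, 1] and a binary mask.

    Both inputs are flattened pixel lists of equal length (assumed format).
    Lower is better.
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)


def f_measure(pred: Sequence[float], gt: Sequence[int],
              threshold: float, beta_sq: float = 0.3) -> float:
    """Weighted F-measure after binarizing pred at the given threshold."""
    true_pos = sum(1 for p, g in zip(pred, gt) if p > threshold and g == 1)
    if true_pos == 0:
        return 0.0  # precision or recall is zero (or undefined)
    predicted_pos = sum(1 for p in pred if p > threshold)
    actual_pos = sum(1 for g in gt if g == 1)
    precision = true_pos / predicted_pos
    recall = true_pos / actual_pos
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


def max_f_measure(pred: Sequence[float], gt: Sequence[int],
                  steps: int = 255, beta_sq: float = 0.3) -> float:
    """Best F-measure over evenly spaced thresholds in [0, 1)."""
    return max(f_measure(pred, gt, t / steps, beta_sq) for t in range(steps))


if __name__ == "__main__":
    # Toy 4-pixel frame: hypothetical values, not data from the benchmark.
    saliency = [0.9, 0.8, 0.2, 0.1]
    mask = [1, 0, 1, 0]
    print("MAE:", mae(saliency, mask))
    print("max F-measure:", max_f_measure(saliency, mask))
```

In the table above, higher S-measure, E-measure, and F-measure values are better, while a lower Average MAE is better.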