Video salient object detection (VSOD) is significantly essential for understanding the underlying mechanism behind HVS during free-viewing in general and instrumental to a wide range of real-world applications, e.g., video segmentation, video captioning, video compression, autonomous driving, robotic interaction, weakly supervised attention. Besides its academic value and practical significance, VSOD presents great difficulties due to the challenges carried by video data (diverse motion patterns, occlusions, blur, large object deformations, etc.) and the inherent complexity of human visual attention behavior (i.e., selective attention allocation, attention shift) during dynamic scenes. Online benchmark: http://dpfan.net/davsod.
|TREND||DATASET||BEST METHOD||PAPER TITLE||PAPER||CODE||COMPARE|
This is the first work that explicitly emphasizes the challenge of saliency shift, i. e., the video salient object(s) may dynamically change.
Ranked #1 on Video Salient Object Detection on FBMS-59 (using extra training data)
In this paper, we develop a multi-task motion guided video salient object detection network, which learns to accomplish two sub-tasks using two sub-networks, one sub-network for salient object detection in still images and the other for motion saliency detection in optical flow images.
This paper proposes a fast video salient object detection model, based on a novel recurrent network architecture, named Pyramid Dilated Bidirectional ConvLSTM (PDB-ConvLSTM).
Ranked #1 on Video Salient Object Detection on UVSD (using extra training data)
Specifically, we present an effective video saliency detector that consists of a spatial refinement network and a spatiotemporal module.
Ranked #1 on Video Salient Object Detection on VOS-T (using extra training data)
Our new measure simultaneously evaluates region-aware and object-aware structural similarity between a SM and a GT map.
Building on the observation that foreground areas are surrounded by the regions with high spatiotemporal edge values, geodesic distance provides an initial estimation for foreground and background.
Ranked #4 on Video Salient Object Detection on DAVSOD-Difficult20 (using extra training data)
In this way, even though the overall video saliency quality is heavily dependent on its spatial branch, however, the performance of the temporal branch still matter.
In this paper, we investigate the complimentary roles of spatial and temporal information and propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of spatiotemporal information.
Significant performance improvement has been achieved for fully-supervised video salient object detection with the pixel-wise labeled training datasets, which are time-consuming and expensive to obtain.
Consequently, we can achieve a significant performance improvement by using this new training set to start a new round of network training.