12 papers with code • 10 benchmarks • 4 datasets
Video salient object detection (VSOD) is essential for understanding the mechanisms underlying the human visual system (HVS) during free viewing, and it is instrumental to a wide range of real-world applications, e.g., video segmentation, video captioning, video compression, autonomous driving, robotic interaction, and weakly supervised attention. Besides its academic value and practical significance, VSOD is highly challenging because of the properties of video data (diverse motion patterns, occlusions, blur, large object deformations, etc.) and the inherent complexity of human visual attention in dynamic scenes (i.e., selective attention allocation and attention shift). Online benchmark: http://dpfan.net/davsod.
This is the first work that explicitly emphasizes the challenge of saliency shift, i.e., that the video salient object(s) may change dynamically.
Ranked #1 on Video Salient Object Detection on DAVSOD-easy35
This paper proposes a fast video salient object detection model based on a novel recurrent network architecture named Pyramid Dilated Bidirectional ConvLSTM (PDB-ConvLSTM).
Ranked #1 on Video Salient Object Detection on UVSD (using extra training data)
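The "pyramid dilated" part of the PDB-ConvLSTM name refers to convolutions run in parallel at several dilation rates, so that each branch covers a different spatial extent at the same cost. The following pure-Python sketch illustrates only that idea, not the ConvLSTM or the paper's network; the rates (1, 2, 4, 8), the shared kernel, and all function names are assumptions chosen for illustration.

```python
def effective_kernel_size(k, rate):
    """Spatial extent covered by a k x k kernel at the given dilation rate."""
    return k + (k - 1) * (rate - 1)


def dilated_conv2d(img, kernel, rate):
    """'Same'-size 2D dilated cross-correlation with zero padding."""
    H, W = len(img), len(img[0])
    k = len(kernel)  # assumes a square kernel with an odd side length
    c = k // 2
    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            s = 0.0
            for i in range(k):
                for j in range(k):
                    # Kernel taps are spaced `rate` pixels apart.
                    yy = y + (i - c) * rate
                    xx = x + (j - c) * rate
                    if 0 <= yy < H and 0 <= xx < W:
                        s += kernel[i][j] * img[yy][xx]
            out[y][x] = s
    return out


def pyramid_dilated(img, kernel, rates=(1, 2, 4, 8)):
    """Apply a kernel at several dilation rates in parallel (hypothetical helper).

    One shared kernel keeps the sketch short; in a network each branch would
    normally have its own learned weights, and the outputs would be
    concatenated along the channel axis.
    """
    return [dilated_conv2d(img, kernel, r) for r in rates]
```

With a 3 x 3 kernel, the four branches cover 3, 5, 9, and 17 pixels per side, which is why a dilation pyramid can enlarge the receptive field without extra parameters per branch.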
Specifically, we present an effective video saliency detector that consists of a spatial refinement network and a spatiotemporal module.
Ranked #1 on Video Salient Object Detection on VOS-T (using extra training data)
Building on the observation that foreground areas are surrounded by regions with high spatiotemporal edge values, the geodesic distance provides an initial estimate of foreground and background.
Ranked #4 on Video Salient Object Detection on DAVSOD-Difficult20 (using extra training data)
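The geodesic step can be illustrated with a multi-source Dijkstra search from the frame border over a spatiotemporal edge map. A pixel enclosed by strong edges must cross them to reach the border, so it receives a large geodesic distance and is treated as likely foreground. This is a sketch under stated assumptions: the frame border serves as background seeds, the search runs on a 4-connected pixel grid, the step cost between neighbours is the mean of their edge values, and the edge map is given as input (how it is computed is not shown). The function names are hypothetical.

```python
import heapq


def geodesic_from_border(edge):
    """Geodesic distance of every pixel to the frame border over an edge map."""
    H, W = len(edge), len(edge[0])
    INF = float("inf")
    dist = [[INF] * W for _ in range(H)]
    heap = []
    # Assumption: border pixels are background seeds with distance 0.
    for y in range(H):
        for x in range(W):
            if y in (0, H - 1) or x in (0, W - 1):
                dist[y][x] = 0.0
                heapq.heappush(heap, (0.0, y, x))
    while heap:
        d, y, x = heapq.heappop(heap)
        if d > dist[y][x]:
            continue  # stale queue entry
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < H and 0 <= nx < W:
                # Step cost: mean edge value of the two neighbouring pixels.
                nd = d + 0.5 * (edge[y][x] + edge[ny][nx])
                if nd < dist[ny][nx]:
                    dist[ny][nx] = nd
                    heapq.heappush(heap, (nd, ny, nx))
    return dist


def foreground_likelihood(dist):
    """Normalize distances to [0, 1]; larger values suggest foreground."""
    m = max(max(row) for row in dist)
    if m == 0:
        return [[0.0 for _ in row] for row in dist]
    return [[v / m for v in row] for row in dist]
```

For example, on a frame whose edge map contains a closed ring of strong edges, every pixel inside the ring ends up farther from the border than any pixel outside it.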
In this way, even though the overall video saliency quality depends heavily on the spatial branch, the performance of the temporal branch still matters.
Significant performance improvement has been achieved for fully-supervised video salient object detection with the pixel-wise labeled training datasets, which are time-consuming and expensive to obtain.
In this paper, we investigate the complementary roles of spatial and temporal information and propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of spatiotemporal information.
Despite their simplicity, such fusion strategies may introduce feature redundancy, and also fail to fully exploit the relationship between multi-level features extracted from both spatial and temporal domains.
In this paper, we present a real-time salient object detection system based on the minimum spanning tree.
Ranked #5 on Video Salient Object Detection on ViSal (using extra training data)
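MST-based saliency detectors of this kind commonly treat the image boundary as background and measure how strongly each pixel is separated from it along the tree. The sketch below illustrates only that general pattern under simplifying assumptions: a 4-connected pixel grid weighted by absolute intensity difference, Kruskal's algorithm for the tree, and a plain sum-of-weights distance to the nearest boundary pixel along tree paths. This is a swapped-in distance, not the measure used by the paper, its linear-time passes are not reproduced, and the function names are hypothetical.

```python
import heapq


def _find(parent, a):
    """Union-find root lookup with path halving."""
    while parent[a] != a:
        parent[a] = parent[parent[a]]
        a = parent[a]
    return a


def grid_mst(img):
    """Kruskal MST over a 4-connected grid; returns an adjacency list."""
    H, W = len(img), len(img[0])
    edges = []
    for y in range(H):
        for x in range(W):
            p = y * W + x
            if x + 1 < W:
                edges.append((abs(img[y][x] - img[y][x + 1]), p, p + 1))
            if y + 1 < H:
                edges.append((abs(img[y][x] - img[y + 1][x]), p, p + W))
    edges.sort()
    parent = list(range(H * W))
    adj = [[] for _ in range(H * W)]
    for w, a, b in edges:
        ra, rb = _find(parent, a), _find(parent, b)
        if ra != rb:  # keep the edge only if it joins two components
            parent[ra] = rb
            adj[a].append((b, w))
            adj[b].append((a, w))
    return adj


def boundary_distance_on_tree(img):
    """Sum-of-weights tree distance from each pixel to the nearest boundary pixel."""
    H, W = len(img), len(img[0])
    adj = grid_mst(img)
    dist = [float("inf")] * (H * W)
    heap = []
    for p in range(H * W):
        y, x = divmod(p, W)
        if y in (0, H - 1) or x in (0, W - 1):
            dist[p] = 0.0
            heapq.heappush(heap, (0.0, p))
    while heap:
        d, p = heapq.heappop(heap)
        if d > dist[p]:
            continue
        for q, w in adj[p]:
            if d + w < dist[q]:
                dist[q] = d + w
                heapq.heappush(heap, (dist[q], q))
    return [dist[y * W:(y + 1) * W] for y in range(H)]
```

Because the tree keeps only the cheapest connections, a uniform region that differs from the boundary is linked to it through a single high-weight edge, so all of its pixels receive a similar, high score.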