Video Saliency Detection
9 papers with code • 4 benchmarks • 1 dataset
It consists of two building blocks: an encoder network that extracts low-resolution spatiotemporal features from an input clip of several consecutive frames, and a prediction network that spatially decodes the encoded features while aggregating all the temporal information.
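The sketch below illustrates this two-block scheme in PyTorch; the layer sizes, module names, and the simple temporal mean-pooling are illustrative assumptions, not the architecture of any specific paper.

```python
# Minimal PyTorch sketch of the encoder-decoder scheme described above.
# All layer sizes and names are illustrative assumptions.
import torch
import torch.nn as nn


class SpatioTemporalEncoder(nn.Module):
    """Extracts low-resolution spatiotemporal features from a clip."""

    def __init__(self, in_channels=3, feat_channels=64):
        super().__init__()
        self.net = nn.Sequential(
            # 3D convolutions mix information across frames (time) and space.
            nn.Conv3d(in_channels, feat_channels, kernel_size=3,
                      stride=(1, 2, 2), padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(feat_channels, feat_channels, kernel_size=3,
                      stride=(2, 2, 2), padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, clip):  # clip: (batch, channels, frames, H, W)
        return self.net(clip)


class SaliencyDecoder(nn.Module):
    """Aggregates the remaining temporal axis and upsamples spatially."""

    def __init__(self, feat_channels=64):
        super().__init__()
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(feat_channels, feat_channels // 2,
                               kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(feat_channels // 2, 1,
                               kernel_size=4, stride=2, padding=1),
        )

    def forward(self, feats):  # feats: (batch, channels, frames', h, w)
        pooled = feats.mean(dim=2)              # aggregate temporal information
        return torch.sigmoid(self.decode(pooled))  # per-pixel saliency map


clip = torch.randn(2, 3, 8, 64, 64)  # 8-frame RGB clip
saliency = SaliencyDecoder()(SpatioTemporalEncoder()(clip))
print(saliency.shape)                # (2, 1, 64, 64)
```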
With the rapid development of deep learning techniques, deep image saliency models trained solely on spatial information have occasionally matched the detection performance on video data of models trained on both spatial and temporal information.
Exploring Rich and Efficient Spatial Temporal Interactions for Real Time Video Salient Object Detection
In this way, even though the overall video saliency quality depends heavily on the spatial branch, the performance of the temporal branch still matters.
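A minimal sketch of such a two-branch design, assuming a per-frame spatial branch, a frame-difference temporal branch, and a learned fusion layer (all placeholder modules, not a published model):

```python
# Hypothetical two-branch design: a spatial branch scores each frame on
# its own, a temporal branch scores crude motion cues, and a fusion step
# combines them. Branch internals are placeholders.
import torch
import torch.nn as nn


class TwoBranchSaliency(nn.Module):
    def __init__(self):
        super().__init__()
        # Spatial branch: per-frame appearance cues.
        self.spatial = nn.Conv2d(3, 1, kernel_size=3, padding=1)
        # Temporal branch: frame-difference (motion) cues.
        self.temporal = nn.Conv2d(3, 1, kernel_size=3, padding=1)
        # Learned fusion of the two saliency estimates.
        self.fuse = nn.Conv2d(2, 1, kernel_size=1)

    def forward(self, frame_t, frame_prev):
        s = self.spatial(frame_t)                # appearance saliency
        t = self.temporal(frame_t - frame_prev)  # motion saliency
        return torch.sigmoid(self.fuse(torch.cat([s, t], dim=1)))
```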
When the base hierarchical model is empowered with domain-specific modules, performance improves, outperforming state-of-the-art models on three out of five metrics on the DHF1K benchmark and reaching the second-best results on the other two.
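For reference, DHF1K scores models with five standard fixation-prediction metrics (AUC-Judd, SIM, shuffled AUC, CC, NSS). The NumPy sketch below shows two of them, assuming `pred` is a predicted saliency map, `sal_gt` a continuous ground-truth map, and `fix` a binary fixation map of the same shape.

```python
# Two of the five standard DHF1K metrics, in plain NumPy.
import numpy as np


def cc(pred, sal_gt):
    """Linear correlation coefficient between two saliency maps."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    g = (sal_gt - sal_gt.mean()) / (sal_gt.std() + 1e-8)
    return (p * g).mean()


def nss(pred, fix):
    """Normalized scanpath saliency: mean normalized value at fixations."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    return p[fix > 0].mean()
```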
From Semantic Categories to Fixations: A Novel Weakly-Supervised Visual-Auditory Saliency Detection Approach
Thanks to rapid advances in deep learning techniques and the wide availability of large-scale training sets, the performance of video saliency detection models has been improving steadily and significantly.
Moreover, we distill knowledge from these regions to obtain completely new spatial-temporal-audio (STA) fixation prediction (FP) networks, enabling broad applications in cases where video tags are not available.
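One way such distillation can be set up is to train the student fixation-prediction network to match teacher saliency maps via a KL divergence between the normalized maps. The loss below is a generic saliency-distillation objective, not necessarily the paper's exact formulation.

```python
# Generic saliency-distillation loss: treat each saliency map as a 2-D
# probability distribution and penalize the student's divergence from
# the teacher. The surrounding networks are assumed, not shown.
import torch


def saliency_kl(student_map, teacher_map, eps=1e-8):
    """KL(teacher || student) over flattened, normalized saliency maps."""
    s = student_map.flatten(1)
    t = teacher_map.flatten(1)
    s = s / (s.sum(dim=1, keepdim=True) + eps)  # normalize to sum to 1
    t = t / (t.sum(dim=1, keepdim=True) + eps)
    return (t * ((t + eps) / (s + eps)).log()).sum(dim=1).mean()
```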