Video Saliency Prediction
14 papers with code • 0 benchmarks • 2 datasets
Most implemented papers
Temporal Saliency Adaptation in Egocentric Videos
This work adapts a deep neural model for image saliency prediction to the temporal domain of egocentric video.
DAVE: A Deep Audio-Visual Embedding for Dynamic Saliency Prediction
Our results suggest that (1) audio is a strong contributing cue for saliency prediction, (2) a salient visible sound source is the natural cause of the superiority of our audio-visual model, (3) richer feature representations for the input space lead to more powerful predictions even in the absence of more sophisticated saliency decoders, and (4) the audio-visual model improves on 53.54% of the frames predicted by the best visual model (our baseline).
Simple vs complex temporal recurrences for video saliency prediction
This paper investigates modifying an existing neural network architecture for static saliency prediction using two types of recurrences that integrate information from the temporal domain.
Predicting Video Saliency with Object-to-Motion CNN and Two-layer Convolutional LSTM
We further find from our database that there exists a temporal correlation of human attention with a smooth saliency transition across video frames.
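The temporal correlation noted above, with attention shifting smoothly across frames, can be illustrated with a minimal exponential-smoothing sketch; the smoothing weight `alpha` is an illustrative assumption, not a value from the paper:

```python
import numpy as np

def smooth_saliency(frames, alpha=0.7):
    """Exponentially smooth per-frame saliency maps, mimicking the
    smooth saliency transitions observed across video frames.
    alpha (hypothetical) is how much of the previous map carries over."""
    smoothed = [frames[0]]
    for f in frames[1:]:
        smoothed.append(alpha * smoothed[-1] + (1 - alpha) * f)
    return smoothed

# toy example: three 2x2 saliency maps with an abrupt change at frame 1
maps = [np.full((2, 2), v, dtype=float) for v in (0.0, 1.0, 1.0)]
out = smooth_saliency(maps)
```

The abrupt 0-to-1 jump is spread over several frames, which is the behaviour a temporal model such as a ConvLSTM learns implicitly.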
A Learning-Based Visual Saliency Prediction Model for Stereoscopic 3D Video (LBVS-3D)
Our model starts with a rough segmentation and quantifies several intuitive observations such as the effects of visual discomfort level, depth abruptness, motion acceleration, elements of surprise, size and compactness of the salient regions, and emphasizing only a few salient objects in a scene.
DeepVS: A Deep Learning Based Video Saliency Prediction Approach
Hence, an object-to-motion convolutional neural network (OM-CNN), composed of objectness and motion subnets, is developed to predict intra-frame saliency for DeepVS.
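The two-subnet idea can be sketched as fusing an objectness map with a motion map into a single intra-frame saliency map; the weighted sum and the weight below are illustrative assumptions, not the learned fusion used in OM-CNN:

```python
import numpy as np

def fuse_object_motion(objectness, motion, w_obj=0.6):
    """Toy fusion of an objectness map and a motion map into one
    intra-frame saliency map (hypothetical linear fusion), then
    normalised so the map sums to 1 like a fixation distribution."""
    s = w_obj * objectness + (1.0 - w_obj) * motion
    return s / s.sum()

obj = np.array([[0.8, 0.2], [0.1, 0.1]])  # object in the top-left
mot = np.array([[0.1, 0.1], [0.7, 0.3]])  # motion in the bottom-left
sal = fuse_object_motion(obj, mot)
```

In the actual model the fusion is learned end-to-end rather than fixed, and the result feeds a two-layer ConvLSTM for inter-frame dynamics.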
Video Saliency Prediction Using Enhanced Spatiotemporal Alignment Network
Due to a variety of motions across different frames, it is highly challenging to learn an effective spatiotemporal representation for accurate video saliency prediction (VSP).
Hierarchical Domain-Adapted Feature Learning for Video Saliency Prediction
When the base hierarchical model is empowered with domain-specific modules, performance improves, outperforming state-of-the-art models on three out of five metrics on the DHF1K benchmark and reaching the second-best results on the other two.
ViNet: Pushing the limits of Visual Modality for Audio-Visual Saliency Prediction
We also explore a variation of ViNet architecture by augmenting audio features into the decoder.
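One simple way to augment decoder features with audio, sketched here as an assumption rather than ViNet's exact scheme, is to broadcast a clip-level audio embedding across the spatial grid and concatenate it along the channel axis:

```python
import numpy as np

def augment_with_audio(visual_feat, audio_emb):
    """Tile a clip-level audio embedding (C_a,) over the H x W grid of
    a decoder feature map (C_v, H, W) and concatenate along channels,
    yielding (C_v + C_a, H, W). Hypothetical fusion, for illustration."""
    c_a = audio_emb.shape[0]
    h, w = visual_feat.shape[1:]
    audio_grid = np.broadcast_to(audio_emb[:, None, None], (c_a, h, w))
    return np.concatenate([visual_feat, audio_grid], axis=0)

vis = np.zeros((16, 4, 4))  # C x H x W decoder feature map
aud = np.ones(8)            # clip-level audio embedding
fused = augment_with_audio(vis, aud)
```

A subsequent convolution can then mix the audio channels into the saliency prediction.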
Noise-Aware Video Saliency Prediction
We note that the accuracy of the maps reconstructed from the gaze data of a fixed number of observers varies from frame to frame, as it depends on the content of the scene.
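A simple heuristic for this per-frame variability, not the paper's noise model, is split-half consistency: build fixation maps from two halves of the observers and correlate them, so a low correlation flags a frame whose ground-truth map is noisier:

```python
import numpy as np

def split_half_consistency(gaze_maps):
    """Correlate fixation maps averaged over two halves of the
    observers; low correlation suggests a noisier map for that frame.
    Heuristic sketch under stated assumptions."""
    n = len(gaze_maps)
    a = np.mean(gaze_maps[: n // 2], axis=0).ravel()
    b = np.mean(gaze_maps[n // 2 :], axis=0).ravel()
    return float(np.corrcoef(a, b)[0, 1])

rng = np.random.default_rng(0)
base = rng.random((4, 4))
# eight observers whose maps are noisy copies of one shared pattern
maps = [base + 0.05 * rng.random((4, 4)) for _ in range(8)]
score = split_half_consistency(maps)
```

Frames where observers disagree (e.g. several competing salient objects) would score much lower and could be down-weighted during training.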