Scanpath prediction
11 papers with code • 3 benchmarks • 2 datasets
Learning to predict sequences of human fixations.
Most implemented papers
SaltiNet: Scan-path Prediction on 360 Degree Images using Saliency Volumes
The first part of the network is a model trained to generate saliency volumes, whose parameters are fitted by back-propagating a binary cross-entropy (BCE) loss computed over downsampled versions of the saliency volumes.
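A minimal sketch of this objective, assuming PyTorch and illustrative tensor shapes (not the authors' code): the predicted and ground-truth saliency volumes are downsampled and compared with a BCE loss, which is then back-propagated.

```python
import torch
import torch.nn.functional as F

def saliency_volume_bce(pred_volume, gt_volume, scale=0.25):
    """pred_volume, gt_volume: (batch, time, height, width) tensors in [0, 1]."""
    # Downsample both volumes spatially before comparing them.
    pred_small = F.interpolate(pred_volume, scale_factor=scale,
                               mode="bilinear", align_corners=False)
    gt_small = F.interpolate(gt_volume, scale_factor=scale,
                             mode="bilinear", align_corners=False)
    return F.binary_cross_entropy(pred_small, gt_small)

pred = torch.rand(2, 20, 96, 192, requires_grad=True)  # stand-in network output
gt = torch.rand(2, 20, 96, 192)                        # stand-in ground truth
loss = saliency_volume_bce(pred, gt)
loss.backward()  # back-propagating the BCE loss fits the parameters
```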
Variational Laws of Visual Attention for Dynamic Scenes
We devise variational laws of eye movement that rely on a generalized view of the Least Action Principle from physics.
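As a generic illustration of the least-action view (a sketch of the standard formulation, not the paper's exact functional), the gaze trajectory x(t) can be cast as a stationary point of an action integral, so that eye movements satisfy the corresponding Euler-Lagrange equation:

```latex
% Generic least-action sketch; the Lagrangian L here is assumed, not the paper's.
\Gamma[x] = \int_{0}^{T} L\big(x(t), \dot{x}(t), t\big)\,dt,
\qquad
\frac{d}{dt}\,\frac{\partial L}{\partial \dot{x}} - \frac{\partial L}{\partial x} = 0
```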
PathGAN: Visual Scanpath Prediction with Generative Adversarial Networks
We introduce PathGAN, a deep neural network for visual scanpath prediction trained within an adversarial framework.
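A minimal adversarial-training sketch in the spirit of PathGAN (module names, shapes, and hyperparameters are illustrative assumptions, not the authors' architecture): a generator proposes fixation sequences and a discriminator scores them against human scanpaths.

```python
import torch
import torch.nn as nn

SEQ_LEN, DIM, NOISE = 16, 2, 32  # 16 fixations of (x, y) coordinates

generator = nn.Sequential(nn.Linear(NOISE, 128), nn.ReLU(),
                          nn.Linear(128, SEQ_LEN * DIM), nn.Sigmoid())
discriminator = nn.Sequential(nn.Linear(SEQ_LEN * DIM, 128), nn.LeakyReLU(0.2),
                              nn.Linear(128, 1))
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

real = torch.rand(8, SEQ_LEN * DIM)      # stand-in for recorded human scanpaths
fake = generator(torch.randn(8, NOISE))  # generated scanpaths

# Discriminator step: push real scanpaths toward 1, generated ones toward 0.
d_loss = (bce(discriminator(real), torch.ones(8, 1)) +
          bce(discriminator(fake.detach()), torch.zeros(8, 1)))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: fool the discriminator into labelling fakes as real.
g_loss = bce(discriminator(fake), torch.ones(8, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```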
Gravitational Laws of Focus of Attention
The understanding of the mechanisms behind focus of attention in a visual scene is a problem of great interest in visual perception and computer vision.
On gaze deployment to audio-visual cues of social interactions
Attention supports our urge to forage on social cues.
Predicting Human Scanpaths in Visual Question Answering
Conditioned on a task guidance map, the proposed model learns question-specific attention patterns to generate scanpaths.
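A hedged sketch of such conditioning (shapes and layers are assumptions, not the paper's implementation): the guidance map is stacked with the image features as an extra channel, so a small head can produce question-specific fixation probabilities.

```python
import torch
import torch.nn as nn

feats = torch.randn(1, 64, 20, 32)   # backbone image features (B, C, H, W)
guidance = torch.rand(1, 1, 20, 32)  # task guidance map at the same resolution

conditioned = torch.cat([feats, guidance], dim=1)  # (1, 65, 20, 32)
head = nn.Conv2d(65, 1, kernel_size=1)             # per-location fixation logits
priority = head(conditioned).flatten(1).softmax(dim=-1)
next_fixation = torch.multinomial(priority, num_samples=1)  # sampled grid index
```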
ScanDMM: A Deep Markov Model of Scanpath Prediction for 360° Images
Scanpath prediction for 360° images aims to produce dynamic gaze behaviors based on the human visual perception mechanism.
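A minimal deep-Markov-model rollout in the spirit of ScanDMM (dimensions and networks are assumptions, not the authors' code): a transition network parameterizes the distribution of the next latent state, and an emission network maps states to fixation coordinates.

```python
import torch
import torch.nn as nn

STATE = 16
transition = nn.Sequential(nn.Linear(STATE, 64), nn.Tanh(),
                           nn.Linear(64, 2 * STATE))  # mean and log-variance
emission = nn.Sequential(nn.Linear(STATE, 32), nn.Tanh(),
                         nn.Linear(32, 2), nn.Sigmoid())  # normalized (x, y)

z = torch.zeros(1, STATE)  # initial latent gaze state
scanpath = []
for _ in range(10):        # roll out a 10-fixation scanpath
    mu, log_var = transition(z).chunk(2, dim=-1)
    z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # z_t ~ N(mu, sigma^2)
    scanpath.append(emission(z))  # emit the next fixation location
```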
Unifying Top-down and Bottom-up Scanpath Prediction Using Transformers
Most models of visual attention aim at predicting either top-down or bottom-up control, as studied using different visual search and free-viewing tasks.
Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention
In response, we pose ZeroGaze, a new variant of zero-shot learning in which gaze is predicted for never-before-searched objects, and we develop a novel model, Gazeformer, to solve the ZeroGaze problem.
Pathformer3D: A 3D Scanpath Transformer for 360° Images
The contextual feature representation and the historical fixation information are then fed into a Transformer decoder, which outputs the current time step's fixation embedding; its self-attention module imitates the visual working memory mechanism of the human visual system and directly models the time dependencies among fixations.
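A hedged sketch of this decoding step (all shapes and sizes are assumptions, not the authors' code): a Transformer decoder self-attends over past fixation embeddings, the working-memory analogue, and cross-attends to the contextual features to produce the current fixation embedding.

```python
import torch
import torch.nn as nn

D = 64
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=D, nhead=4, batch_first=True),
    num_layers=2)
to_xyz = nn.Linear(D, 3)  # map the embedding to 3D coordinates on the sphere

context = torch.randn(1, 100, D)  # contextual feature representation (memory)
history = torch.randn(1, 5, D)    # embeddings of the five previous fixations
causal = torch.triu(torch.full((5, 5), float("-inf")), diagonal=1)  # no look-ahead

out = decoder(tgt=history, memory=context, tgt_mask=causal)
next_fixation = to_xyz(out[:, -1])  # current time step's fixation embedding -> coords
```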