Learning Saliency Prediction From Sparse Fixation Pixel Map

Ground truth in saliency prediction datasets comes in two forms: the fixation pixel map, which records human eye movements on sample images, and the fixation blob map, generated by applying Gaussian blurring to the corresponding fixation pixel map. Current saliency approaches predict by directly regressing the input image, pixel by pixel, into a saliency map with the fixation blob map as ground truth; learning saliency from the fixation pixel map has not been explored. In this work, we propose a first-of-its-kind approach to learning saliency prediction from sparse fixation pixel maps, together with a novel loss function for training from such sparse fixations. We use clustering to extract sparse fixation pixels from the raw fixation pixel map, and apply a max-pooling transformation to the output to avoid the false penalty between sparse outputs and labels that nearby but non-overlapping saliency pixels would otherwise cause when the loss is calculated. This approach provides a novel perspective on saliency prediction. We evaluate our approach on multiple benchmark datasets and achieve competitive performance on multiple metrics compared with state-of-the-art saliency methods.
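
The false-penalty effect that motivates the max-pooling step can be illustrated with a small sketch. This is not the paper's loss function, which the abstract does not specify: the 4x4 maps, the mean absolute error, the 2x2 pooling window, the helper names `max_pool` and `mean_abs_error`, and the choice to pool the label as well as the output (so the shapes match) are all assumptions made for illustration.

```python
def max_pool(grid, k):
    """Non-overlapping k x k max-pooling over a 2D list.

    Assumes the height and width of `grid` are divisible by k.
    """
    h, w = len(grid), len(grid[0])
    return [
        [max(grid[i + di][j + dj] for di in range(k) for dj in range(k))
         for j in range(0, w, k)]
        for i in range(0, h, k)
    ]


def mean_abs_error(a, b):
    """Mean absolute difference between two equally sized 2D lists."""
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))


# Hypothetical sparse label: a single fixation pixel at row 1, column 1.
label = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

# Hypothetical sparse output: one predicted pixel nearby, but not overlapping.
pred = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

# Pixel-wise comparison penalizes both the missed and the extra pixel.
raw_loss = mean_abs_error(pred, label)

# After 2x2 pooling, both maps activate the same cell, so the penalty vanishes.
pooled_loss = mean_abs_error(max_pool(pred, 2), max_pool(label, 2))

print(raw_loss, pooled_loss)
```

In this toy case the pixel-wise error is nonzero even though the prediction is only one pixel away from the fixation, while the pooled comparison treats the two maps as matching. The pooling window size sets how much spatial tolerance is granted.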
