Human Attention in Fine-grained Classification

2 Nov 2021  ·  Yao Rong, Wenjia Xu, Zeynep Akata, Enkelejda Kasneci

The way humans attend to, process, and classify a given image has the potential to vastly benefit the performance of deep learning models. Exploiting where humans focus can steer models back when they deviate from the features essential for correct decisions. To validate that human attention contains valuable information for decision-making processes such as fine-grained classification, we compare human attention and model explanations in discovering important features. Towards this goal, we collect human gaze data for the fine-grained classification dataset CUB and build a dataset named CUB-GHA (Gaze-based Human Attention). Furthermore, we propose Gaze Augmentation Training (GAT) and a Knowledge Fusion Network (KFN) to integrate human gaze knowledge into classification models. We evaluate our proposals on CUB-GHA and on the recently released medical dataset CXR-Eye of chest X-ray images, which includes gaze data collected from a radiologist. Our results reveal that integrating human attention knowledge benefits classification effectively, e.g., improving the baseline by 4.38% on CXR. Hence, our work provides not only valuable insights into understanding human attention in fine-grained classification, but also contributes to future research on integrating human gaze into computer vision tasks. CUB-GHA and code are available at https://github.com/yaorong0921/CUB-GHA.
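To make the two proposals concrete, here is a minimal PyTorch sketch of the ideas as the abstract describes them: a gaze-guided crop augmentation in the spirit of GAT, which samples training crops around high-attention points of a human gaze heatmap, and a two-stream late-fusion head in the spirit of KFN, which combines features from the full image and the gaze-attended view. The function names, tensor shapes, sampling scheme, and fusion design below are illustrative assumptions, not the authors' implementation; see the linked repository for the actual code.

```python
import torch
import torch.nn as nn

def gaze_guided_crop(image, heatmap, crop_size=224):
    """Sample a training crop centred on a gaze fixation point.

    A sketch of gaze-based augmentation: the heatmap is treated as a
    sampling distribution over pixels (an assumption, not necessarily
    the paper's exact scheme).

    image:   (C, H, W) float tensor
    heatmap: (H, W) float tensor of non-negative human-attention values
    """
    H, W = heatmap.shape
    weights = heatmap.flatten() + 1e-8           # avoid an all-zero distribution;
    idx = torch.multinomial(weights, 1).item()   # multinomial accepts unnormalized weights
    cy, cx = idx // W, idx % W
    half = crop_size // 2
    top = min(max(cy - half, 0), max(H - crop_size, 0))
    left = min(max(cx - half, 0), max(W - crop_size, 0))
    return image[:, top:top + crop_size, left:left + crop_size]

class TwoStreamFusion(nn.Module):
    """Hypothetical stand-in for a knowledge-fusion head: concatenate
    pooled features of the full image and of the gaze-attended crop,
    then classify (here for the 200 CUB classes)."""

    def __init__(self, feat_dim=2048, num_classes=200):
        super().__init__()
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, global_feat, gaze_feat):
        # global_feat, gaze_feat: (B, feat_dim) pooled backbone features
        return self.classifier(torch.cat([global_feat, gaze_feat], dim=1))
```

In a plausible training loop, one CNN backbone would embed both the original image and the output of gaze_guided_crop, and TwoStreamFusion would produce the final logits from the two pooled feature vectors; how the gaze stream is handled at test time (dropped, or replaced by a predicted attention map) depends on whether gaze data is available there.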


Datasets


Introduced in the Paper:

CUB-GHA

Used in the Paper:

CUB-200-2011
Task                               Dataset        Model  Metric    Value    Global Rank
Fine-Grained Image Classification  CUB-200-2011   GAT    Accuracy  88.66%   #41
