VQA-HAT (Human ATtention) is a dataset for evaluating which regions of an image are informative for answering a given question about it. It consists of human visual attention maps collected over the images in the original VQA dataset and contains more than 60k attention maps.
14 PAPERS • NO BENCHMARKS YET
Provides a wide range of raw sensor data that is accessible on almost any modern-day smartphone, together with a high-quality ground-truth track.
6 PAPERS • NO BENCHMARKS YET