CIFAR-10H is a new dataset of soft labels reflecting human perceptual uncertainty for the 10,000-image CIFAR-10 test set, which contains 1,000 images for each of the 10 categories in the original CIFAR-10 dataset.
There are 511,400 human classifications in total, collected via Amazon Mechanical Turk. Participants were asked to categorize each image by clicking one of the 10 labels surrounding it, as quickly and accurately as possible (but with no time limit). Label positions were shuffled between participants. After an initial training phase, each participant (2,571 total) categorized 200 images, 20 from each category. Every 20 trials, an obvious image was presented as an attention check; participants who scored below 75% on these checks (14 total) were removed from the final analysis. We collected 51 judgments per image on average (range: 47–63). Average completion time was 15 minutes, and workers were paid $1.50 total.
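Aggregating the roughly 51 human judgments per image into a soft label amounts to normalizing the per-category vote counts into a probability distribution. The sketch below illustrates this with fabricated counts (the array shape and variable names are assumptions for illustration, not the dataset's actual file format):

```python
import numpy as np

# Fabricated per-image vote counts: rows are images, columns are the
# 10 CIFAR-10 categories. In CIFAR-10H each image received ~51 human
# judgments; here we simulate 5 images with 51 votes each.
rng = np.random.default_rng(0)
counts = rng.multinomial(51, [0.1] * 10, size=5)

# A soft label is the empirical distribution of human choices per image:
# divide each row of counts by that image's total number of judgments.
soft_labels = counts / counts.sum(axis=1, keepdims=True)

print(soft_labels.shape)                           # (5, 10)
print(np.allclose(soft_labels.sum(axis=1), 1.0))   # True: rows sum to 1
```

Unlike the original one-hot CIFAR-10 labels, these distributions spread probability mass across categories that humans confuse, which is what makes them useful as targets reflecting perceptual uncertainty.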