The Caltech-UCSD Birds-200-2011 (CUB-200-2011) dataset is the most widely-used dataset for fine-grained visual categorization task. It contains 11,788 images of 200 subcategories belonging to birds, 5,994 for training and 5,794 for testing. Each image has detailed annotations: 1 subcategory label, 15 part locations, 312 binary attributes and 1 bounding box. The textual information comes from Reed et al.. They expand the CUB-200-2011 dataset by collecting fine-grained natural language descriptions. Ten single-sentence descriptions are collected for each image. The natural language descriptions are collected through the Amazon Mechanical Turk (AMT) platform, and are required at least 10 words, without any information of subcategories and actions.
1,959 PAPERS • 44 BENCHMARKS
Oxford 102 Flower is an image classification dataset consisting of 102 flower categories. The flowers chosen to be flower commonly occurring in the United Kingdom. Each class consists of between 40 and 258 images.
1,047 PAPERS • 14 BENCHMARKS
The PASCAL Context dataset is an extension of the PASCAL VOC 2010 detection challenge, and it contains pixel-wise labels for all training images. It contains more than 400 classes (including the original 20 classes plus backgrounds from PASCAL VOC segmentation), divided into three categories (objects, stuff, and hybrids). Many of the object categories of this dataset are too sparse and; therefore, a subset of 59 frequent classes are usually selected for use.
280 PAPERS • 6 BENCHMARKS
Animals with Attributes (AwA) was a dataset for benchmarking transfer-learning algorithms, in particular attribute base classification. It consisted of 30475 images of 50 animals classes with six pre-extracted feature representations for each image. The animals classes are aligned with Osherson's classical class/attribute matrix, thereby providing 85 numeric attribute values for each class. Using the shared attributes, it is possible to transfer information between different classes. The Animals with Attributes dataset was suspended. Its images are not available anymore because of copyright restrictions. A drop-in replacement, Animals with Attributes 2, is available instead.
252 PAPERS • 6 BENCHMARKS
Animals with Attributes 2 (AwA2) is a dataset for benchmarking transfer-learning algorithms, such as attribute base classification and zero-shot learning. AwA2 is a drop-in replacement of original Animals with Attributes (AwA) dataset, with more images released for each category. Specifically, AwA2 consists of in total 37322 images distributed in 50 animal categories. The AwA2 also provides a category-attribute matrix, which contains an 85-dim attribute vector (e.g., color, stripe, furry, size, and habitat) for each category.
211 PAPERS • 4 BENCHMARKS
aPY is a coarse-grained dataset composed of 15339 images from 3 broad categories (animals, objects and vehicles), further divided into a total of 32 subcategories (aeroplane, …, zebra).
144 PAPERS • 4 BENCHMARKS
The MIT-States dataset has 245 object classes, 115 attribute classes and ∼53K images. There is a wide range of objects (e.g., fish, persimmon, room) and attributes (e.g., mossy, deflated, dirty). On average, each object instance is modified by one of the 9 attributes it affords.
65 PAPERS • 4 BENCHMARKS
The SUN Attribute dataset consists of 14,340 images from 717 scene categories, and each category is annotated with a taxonomy of 102 discriminate attributes. The dataset can be used for high-level scene understanding and fine-grained scene recognition.
37 PAPERS • 2 BENCHMARKS
The COCO-MLT is created from MS COCO-2017, containing 1,909 images from 80 classes. The maximum of training number per class is 1,128 and the minimum is 6. We use the test set of COCO2017 with 5,000 for evaluation. The ratio of head, medium, and tail classes is 22:33:25 in COCO-MLT.
12 PAPERS • 2 BENCHMARKS
We construct the long-tailed version of VOC from its 2012 train-val set. It contains 1,142 images from 20 classes, with a maximum of 775 images per class and a minimum of 4 images per class. The ratio of head, medium, and tail classes after splitting is 6:6:8. We evaluate the performance on VOC2007 test set with 4952 images.
LAD (Large-scale Attribute Dataset) has 78,017 images of 5 super-classes and 230 classes. The image number of LAD is larger than the sum of the four most popular attribute datasets (AwA, CUB, aP/aY and SUN). 359 attributes of visual, semantic and subjective properties are defined and annotated in instance-level.
10 PAPERS • NO BENCHMARKS YET
AO-CLEVr is a new synthetic-images dataset containing images of "easy" Attribute-Object categories, based on the CLEVr. AO-CLEVr has attribute-object pairs created from 8 attributes: { red, purple, yellow, blue, green, cyan, gray, brown } and 3 object shapes {sphere, cube, cylinder}, yielding 24 attribute-object pairs. Each pair consists of 7500 images. Each image has a single object that consists of the attribute-object pair. The object is randomly assigned one of two sizes (small/large), one of two materials (rubber/metallic), a random position, and random lightning according to CLEVr defaults.
5 PAPERS • NO BENCHMARKS YET
transform the ImageNet-1K classification datatset for Chinese models by translating labels and prompts into Chinese.
3 PAPERS • 1 BENCHMARK
Edge-Map-345C is a large-scale edge-map dataset including 290,281 edge-maps corresponding to 345 object categories of QuickDraw dataset. In particular, these 345 categories are corresponding to the 345 free-hand sketch categories of Google QuickDraw dataset.
1 PAPER • NO BENCHMARKS YET
The Generix Object Zero-shot Learning (GOZ) dataset is a benchmark dataset for zero-shot learning.
Sequence Consistency Evaluation (SCE) consists of a benchmark task for sequence consistency evaluation (SCE).