The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.
10,841 PAPERS • 96 BENCHMARKS
Click to add a brief description of the dataset (Markdown and LaTeX enabled).
72 PAPERS • NO BENCHMARKS YET
BigEarthNet consists of 590,326 Sentinel-2 image patches, each of which is a section of i) 120x120 pixels for 10m bands; ii) 60x60 pixels for 20m bands; and iii) 20x20 pixels for 60m bands.
69 PAPERS • 3 BENCHMARKS
Our goal is to improve upon the status quo for designing image classification models trained in one domain that perform well on images from another domain. Complementing existing work in robustness testing, we introduce the first test dataset for this purpose which comes from an authentic use case where photographers wanted to learn about the content in their images. We built a new test set using 8,900 images taken by people who are blind for which we collected metadata to indicate the presence versus absence of 200 ImageNet object categories. We call this dataset VizWiz-Classification.
22 PAPERS • 3 BENCHMARKS
The COCO-MLT is created from MS COCO-2017, containing 1,909 images from 80 classes. The maximum of training number per class is 1,128 and the minimum is 6. We use the test set of COCO2017 with 5,000 for evaluation. The ratio of head, medium, and tail classes is 22:33:25 in COCO-MLT.
12 PAPERS • 2 BENCHMARKS
We construct the long-tailed version of VOC from its 2012 train-val set. It contains 1,142 images from 20 classes, with a maximum of 775 images per class and a minimum of 4 images per class. The ratio of head, medium, and tail classes after splitting is 6:6:8. We evaluate the performance on VOC2007 test set with 4952 images.
Sewer-ML is a sewer defect dataset. It contains 1.3 million images, from 75,618 videos collected from three Danish water utility companies over nine years. All videos have been annotated by licensed sewer inspectors following the Danish sewer inspection standard, Fotomanualen. This leads to consistent and reliable annotations, and a total of 17 annotated defect classes.
4 PAPERS • NO BENCHMARKS YET
This dataset contains images of individual hand-written Bengali characters. Bengali characters (graphemes) are written by combining three components: a grapheme_root, vowel_diacritic, and consonant_diacritic. Your challenge is to classify the components of the grapheme in each image. There are roughly 10,000 possible graphemes, of which roughly 1,000 are represented in the training set. The test set includes some graphemes that do not exist in the train but has no new grapheme components. It takes a lot of volunteers filling out sheets like this to generate a useful amount of real data; focusing the problem on the grapheme components rather than on recognizing whole graphemes should make it possible to assemble a Bengali OCR system without handwriting samples for all 10,000 graphemes.
2 PAPERS • 1 BENCHMARK
LADI Overview The Low Altitude Disaster Imagery (LADI) dataset was created to address the relative lack of annotated post-disaster aerial imagery in the computer vision community. Low altitude post-disaster aerial imagery from small planes and UAVs can provide high-resolution imagery to emergency management agencies to help them prioritize response efforts and perform damage assessments. In order to accelerate their workflow, computer vision can be used to automatically identify images that contain features of interest, including infrastructure such as buildings and roads, damage to such infrastructure, and hazards such as floods or debris.
1 PAPER • NO BENCHMARKS YET
dacl10k stands for damage classification 10k images and is a multi-label semantic segmentation dataset for 19 classes (13 damages and 6 objects) present on bridges.