13 dataset results for Weakly Supervised Object Detection

The ImageNet dataset contains 14,197,122 annotated images according to the WordNet hierarchy. Since 2010 the dataset is used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a benchmark in image classification and object detection. The publicly released dataset contains a set of manually annotated training images. A set of test images is also released, with the manual annotations withheld. ILSVRC annotations fall into one of two categories: (1) image-level annotation of a binary label for the presence or absence of an object class in the image, e.g., “there are cars in this image” but “there are no tigers,” and (2) object-level annotation of a tight bounding box and class label around an object instance in the image, e.g., “there is a screwdriver centered at position (20,25) with width of 50 pixels and height of 30 pixels”. The ImageNet project does not own the copyright of the images, therefore only thumbnails and URLs of images are provided.

13,519 PAPERS • 41 BENCHMARKS

MS COCO (Microsoft Common Objects in Context)

The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.

10,220 PAPERS • 93 BENCHMARKS

Charades

The Charades dataset is composed of 9,848 videos of daily indoors activities with an average length of 30 seconds, involving interactions with 46 objects classes in 15 types of indoor scenes and containing a vocabulary of 30 verbs leading to 157 action classes. Each video in this dataset is annotated by multiple free-text descriptions, action labels, action intervals and classes of interacting objects. 267 different users were presented with a sentence, which includes objects and actions from a fixed vocabulary, and they recorded a video acting out the sentence. In total, the dataset contains 66,500 temporal annotations for 157 action classes, 41,104 labels for 46 object classes, and 27,847 textual descriptions of the videos. In the standard split there are7,986 training video and 1,863 validation video.

384 PAPERS • 6 BENCHMARKS

Foggy Cityscapes

Foggy Cityscapes is a synthetic foggy dataset which simulates fog on real scenes. Each foggy image is rendered with a clear image and depth map from Cityscapes. Thus the annotations and data split in Foggy Cityscapes are inherited from Cityscapes.

207 PAPERS • 6 BENCHMARKS

HICO-DET

HICO-DET is a dataset for detecting human-object interactions (HOI) in images. It contains 47,776 images (38,118 in train set and 9,658 in test set), 600 HOI categories constructed by 80 object categories and 117 verb classes. HICO-DET provides more than 150k annotated human-object pairs. V-COCO provides 10,346 images (2,533 for training, 2,867 for validating and 4,946 for testing) and 16,199 person instances. Each person has annotations for 29 action categories and there are no interaction labels including objects.

156 PAPERS • 5 BENCHMARKS

PASCAL VOC 2007

PASCAL VOC 2007 is a dataset for image recognition. The twenty object classes that have been selected are:

119 PAPERS • 14 BENCHMARKS

PASCAL VOC 2012 test

SCC Data Set

109 PAPERS • 3 BENCHMARKS

MSCOCO

Click to add a brief description of the dataset (Markdown and LaTeX enabled).

57 PAPERS • NO BENCHMARKS YET

Clipart1k

In Clipart1k, the target domain classes to be detected are the same as those in the source domain. All the images for a clipart domain were collected from one dataset (i.e., CMPlaces) and two image search engines (i.e., Openclipart2 and Pixabay3). Search queries used are 205 scene classes (e.g., pasture) used in CMPlaces to collect various objects and scenes with complex backgrounds.

39 PAPERS • NO BENCHMARKS YET

Watercolor2k

Watercolor2k is a dataset used for cross-domain object detection which contains 2k watercolor images with image and instance-level annotations.

34 PAPERS • 4 BENCHMARKS

Comic2k

Comic2k is a dataset used for cross-domain object detection which contains 2k comic images with image and instance-level annotations. Image Source: https://naoto0804.github.io/cross_domain_detection/

27 PAPERS • 7 BENCHMARKS

PeopleArt

People-Art is an object detection dataset which consists of people in 43 different styles. People contained in this dataset are quite different from those in common photographs. There are 42 categories of art styles and movements including Naturalism, Cubism, Socialist Realism, Impressionism, and Suprematism

11 PAPERS • 2 BENCHMARKS

IconArt

This dataset contains 5955 painting images (from WikiCommons) : a train set of 2978 images and a test set of 2977 images (for classification task). 1480 of the 2977 images are annotated with bounding boxes for 7 iconographic classes : ‘angel’,‘Child_Jesus’,‘crucifixion_of_Jesus’,‘Mary’,‘nudity’, ‘ruins’,‘Saint_Sebastien’.

6 PAPERS • 1 BENCHMARK

Datasets

13 dataset results for Weakly Supervised Object Detection