The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, keypoint detection, and captioning dataset. The dataset consists of 328K images.
Annotations:
The dataset has annotations for
object detection: bounding boxes and per-instance segmentation masks with 80 object categories,
captioning: natural language descriptions of the images (see Captions),
keypoint detection: more than 200,000 images and 250,000 person instances labeled with keypoints (17 possible keypoints, such as left eye, nose, right hip, right ankle),
stuff image segmentation: per-pixel segmentation masks with 91 stuff categories, such as grass, wall, sky (see MS COCO Stuff),
panoptic: full scene segmentation, with 80 thing categories (such as person, bicycle, elephant) and
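To make the detection and keypoint annotations listed above more concrete, here is a minimal sketch in Python of how one person annotation might be decoded. It assumes the standard COCO JSON conventions: a bounding box stored as [x, y, width, height] and keypoints stored as a flat list of (x, y, visibility) triplets in a fixed 17-name order, where visibility 0 means the keypoint is not labeled. The annotation values and the helper names `decode_keypoints` and `bbox_to_corners` are hypothetical and exist only for illustration.

```python
# The 17 person keypoint names, in the order used by COCO keypoint annotations.
COCO_KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def decode_keypoints(flat):
    """Turn a flat [x1, y1, v1, x2, y2, v2, ...] list into {name: (x, y, v)}."""
    if len(flat) != 3 * len(COCO_KEYPOINT_NAMES):
        raise ValueError("expected 17 (x, y, visibility) triplets")
    return {
        name: (flat[3 * i], flat[3 * i + 1], flat[3 * i + 2])
        for i, name in enumerate(COCO_KEYPOINT_NAMES)
    }


def bbox_to_corners(bbox):
    """Convert an [x, y, width, height] box to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


# Hypothetical annotation: all numbers are invented for illustration.
example_annotation = {
    "image_id": 1,
    "category_id": 1,  # assumed here to be the "person" category
    "bbox": [10.0, 20.0, 50.0, 100.0],
    # Only the nose is labeled (visibility 2); the other 16 are unlabeled.
    "keypoints": [30, 25, 2] + [0, 0, 0] * 16,
}

keypoints = decode_keypoints(example_annotation["keypoints"])
# Keep only keypoints that were actually labeled (visibility > 0).
labeled = {name: xyv for name, xyv in keypoints.items() if xyv[2] > 0}
corners = bbox_to_corners(example_annotation["bbox"])

print(sorted(labeled))
print(corners)
```

In practice, annotations like this would be read from the dataset's JSON files rather than written by hand; the sketch only shows how the flat keypoint list and the box format can be interpreted.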