The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.
Splits: The first version of the MS COCO dataset was released in 2014. It contains 164K images split into training (83K), validation (41K) and test (41K) sets. In 2015, an additional test set of 81K images was released, comprising all the previous test images plus 40K new images.
Based on community feedback, the training/validation split was changed in 2017 from 83K/41K to 118K/5K. The new split uses the same images and annotations as before; only the assignment of images to training and validation changed. The 2017 test set is a 41K-image subset of the 2015 test set. Additionally, the 2017 release contains a new unannotated dataset of 123K images.
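The split sizes quoted above can be sanity-checked with a few lines of Python. This is only an illustrative sketch: the figures are the rounded thousands given in the text, and the names `SPLITS_K`, `trainval_total` and `same_pool` are hypothetical helpers, not part of any COCO tooling. Because the inputs are rounded, the 2014 and 2017 train+val totals agree only to within 1K.

```python
# Image counts in thousands, copied from the rounded figures in the text.
# These are approximations, not exact image counts.
SPLITS_K = {
    "2014": {"train": 83, "val": 41, "test": 41},
    "2015": {"test": 81},
    "2017": {"train": 118, "val": 5, "test": 41, "unlabeled": 123},
}


def trainval_total(release: str) -> int:
    """Return the train + val image count (in thousands) for a release."""
    split = SPLITS_K[release]
    return split["train"] + split["val"]


def same_pool(a: str, b: str, tolerance_k: int = 1) -> bool:
    """True if two releases' train+val totals match within rounding error."""
    return abs(trainval_total(a) - trainval_total(b)) <= tolerance_k


if __name__ == "__main__":
    # The 2017 re-split reuses the 2014 train+val images.
    print("2014 train+val (K):", trainval_total("2014"))
    print("2017 train+val (K):", trainval_total("2017"))
    print("same image pool within rounding:", same_pool("2014", "2017"))
    # The 2015 test set is the 2014 test images plus 40K new ones.
    print("2015 test check:",
          SPLITS_K["2014"]["test"] + 40 == SPLITS_K["2015"]["test"])
```

The 1K tolerance exists only because the text reports rounded numbers; with exact image counts the two train+val totals would be expected to match exactly, since the text states that the 2017 split uses the same images.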
Annotations: The dataset has annotations for object detection, segmentation, key-point detection, and captioning.
Visual Question Answering
Multi-Label Classification
Object Localization
Cross-Modal Retrieval
Object Counting
Interactive Segmentation
Quantization
Question Generation
Unsupervised Semantic Segmentation with Language-image Pre-training
Knowledge Distillation
Scene Graph Generation
Image-to-Text Retrieval
Homography Estimation
Single-object discovery
Zero-shot Text-to-Image Retrieval
Paraphrase Generation
Unsupervised Object Localization
Weakly-supervised instance segmentation
Multi-object discovery
Zero-Shot Cross-Modal Retrieval
Semi Supervised Learning for Image Captioning
Image-level Supervised Instance Segmentation
Point-Supervised Instance Segmentation
Active Object Detection
Box-supervised Instance Segmentation
Region Proposal
Activeness Detection
Generalized Zero-Shot Object Detection
Few Shot Open Set Object Detection