ICM (Image-Caption Matching Dataset)

Introduced by Xie et al. in Zero and R2D2: A Large-scale Chinese Cross-modal Benchmark and A Vision-Language Framework

ICM is curated for the image-text matching task. Each image is paired with a caption that describes it in detail. The most relevant pairs are first selected by click-through rate (CTR); human annotators then perform a second round of manual correction, yielding 400,000 image-text pairs: 200,000 positive and 200,000 negative. The ratio of positive to negative pairs is kept consistent across the train/val/test splits.
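The balanced-split construction described above can be sketched as a stratified split that preserves the positive/negative ratio in each subset. This is a minimal illustration, not the authors' code; the `stratified_split` helper, the split fractions, and the toy data are all hypothetical:

```python
import random

def stratified_split(pairs, train_frac=0.8, val_frac=0.1, seed=0):
    """Split (image, caption, label) pairs into train/val/test while
    keeping the positive/negative ratio consistent in each split."""
    rng = random.Random(seed)
    positives = [p for p in pairs if p[2] == 1]
    negatives = [p for p in pairs if p[2] == 0]
    splits = {"train": [], "val": [], "test": []}
    # Shuffle and partition each label group separately, so every
    # split inherits the same positive/negative balance.
    for group in (positives, negatives):
        rng.shuffle(group)
        n = len(group)
        n_train = int(n * train_frac)
        n_val = int(n * val_frac)
        splits["train"] += group[:n_train]
        splits["val"] += group[n_train:n_train + n_val]
        splits["test"] += group[n_train + n_val:]
    return splits

# Toy data: 10 positive and 10 negative image-caption pairs
pairs = [("img%d.jpg" % i, "caption %d" % i, i % 2) for i in range(20)]
splits = stratified_split(pairs)
```

With a 50/50 label balance in the full set, each of the three splits produced this way also contains equal numbers of positive and negative pairs.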




