IQM is curated for the image-text matching task. Each image is paired with a corresponding search query. We first use click-through rate (CTR) to select the most relevant image-query pairs as candidates. After cleaning the candidate set, we randomly sample 400,000 image-text pairs from it, comprising 200,000 positive and 200,000 negative cases. We keep the ratio of positive to negative pairs consistent across the train, validation, and test sets.
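
As an illustration of how the positive/negative ratio can be held constant across splits, the following is a minimal sketch of a label-stratified split. It is not the procedure used to build IQM: the function name `stratified_split`, the `(image_id, query, label)` tuple layout, and the 80/10/10 split fractions are all assumptions made for the example, since the split sizes are not stated here.

```python
import random
from collections import defaultdict


def stratified_split(pairs, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split (image_id, query, label) tuples into train/val/test.

    Each label group is shuffled and cut separately, so every split
    keeps the same positive/negative ratio as the full set.
    The default fractions are hypothetical, not taken from the dataset.
    """
    rng = random.Random(seed)

    # Group pairs by label (1 = positive, 0 = negative).
    by_label = defaultdict(list)
    for pair in pairs:
        by_label[pair[2]].append(pair)

    splits = {"train": [], "val": [], "test": []}
    for group in by_label.values():
        rng.shuffle(group)
        n_train = int(len(group) * fractions[0])
        n_val = int(len(group) * fractions[1])
        splits["train"].extend(group[:n_train])
        splits["val"].extend(group[n_train:n_train + n_val])
        # The remainder goes to test, so no pair is dropped.
        splits["test"].extend(group[n_train + n_val:])

    # Shuffle within each split so labels are not stored in blocks.
    for part in splits.values():
        rng.shuffle(part)
    return splits
```

Under these assumptions, calling `stratified_split` on a balanced list of pairs yields three splits that are each half positive and half negative, because the cut points are computed per label group rather than on the pooled data.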