In this dataset, we collect 200,000 image-text pairs. Each image is paired with a caption that describes it in detail. The dataset covers two subtasks: image-to-text retrieval and text-to-image retrieval.
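To make the two retrieval directions concrete, here is a minimal sketch of how they might be scored. It assumes a precomputed image-caption similarity matrix and uses Recall@K, a common retrieval metric that the dataset description itself does not specify. The function names and toy scores are hypothetical, and each image is assumed to have exactly one matching caption at the same index.

```python
def recall_at_k(similarity, k):
    """Fraction of queries whose matching candidate (same index) ranks in the top k.

    similarity[i][j] is the score between query i and candidate j.
    Assumption: the ground-truth match for query i is candidate i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank candidate indices by descending score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


def transpose(matrix):
    """Swap rows and columns so captions become the queries."""
    return [list(col) for col in zip(*matrix)]


# Toy scores (hypothetical): rows are images, columns are captions.
image_to_text = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.7, 0.6],
]
# Text-to-image retrieval uses the same scores with the roles reversed.
text_to_image = transpose(image_to_text)

i2t_r1 = recall_at_k(image_to_text, k=1)
i2t_r2 = recall_at_k(image_to_text, k=2)
t2i_r1 = recall_at_k(text_to_image, k=1)
t2i_r2 = recall_at_k(text_to_image, k=2)
```

In practice the similarity scores would come from a model that embeds images and captions in a shared space; the sketch only illustrates that both subtasks can be evaluated from one score matrix read in two directions.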