The dataset offers tag and mask annotations for image-text pairs from the CC3M validation set. Tag annotations denote words that aptly describe the relationship between the image and the corresponding text. These annotations provide valuable insights into the semantic connection between each pair's visual and textual elements.
5 PAPERS • 2 BENCHMARKS