6 dataset results for Cross-Modal Retrieval AND Images AND English

MS COCO (Microsoft Common Objects in Context)

The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.

10,185 PAPERS • 93 BENCHMARKS

Flickr30k

The Flickr30k dataset contains 31,000 images collected from Flickr, each paired with 5 reference sentences provided by human annotators.

735 PAPERS • 9 BENCHMARKS

Twitter100k

Twitter100k is a large-scale dataset for weakly supervised cross-media retrieval.

4 PAPERS • NO BENCHMARKS YET

Flickr-8k

Contains 8k Flickr images with captions.

3 PAPERS • 1 BENCHMARK

Earth on Canvas

A Zero-Shot Sketch-based Inter-Modal Object Retrieval Scheme for Remote Sensing Images

1 PAPER • NO BENCHMARKS YET

IAPR TC-12 (IAPR TC-12 Benchmark)

The image collection of the IAPR TC-12 Benchmark consists of 20,000 still natural images taken at locations around the world. It includes pictures of different sports and actions, photographs of people, animals, cities, landscapes, and many other aspects of contemporary life. Each image is associated with a text caption in up to three different languages (English, German and Spanish).

1 PAPER • NO BENCHMARKS YET
