The Caltech-UCSD Birds-200-2011 (CUB-200-2011) dataset is the most widely used dataset for fine-grained visual categorization. It contains 11,788 images of 200 bird subcategories, with 5,994 images for training and 5,794 for testing. Each image has detailed annotations: 1 subcategory label, 15 part locations, 312 binary attributes and 1 bounding box. The textual information comes from Reed et al., who expanded the CUB-200-2011 dataset by collecting fine-grained natural language descriptions, ten single-sentence descriptions per image. The descriptions were collected through the Amazon Mechanical Turk (AMT) platform and were required to contain at least 10 words and no information about subcategories or actions.
1,953 PAPERS • 44 BENCHMARKS
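In the official CUB-200-2011 release, the train/test split described above is stored in a plain-text `train_test_split.txt` file with one `image_id is_training_image` pair per line. A minimal sketch of reading that split, assuming this standard file layout (the toy input below stands in for the real 11,788-line file):

```python
def parse_split(lines):
    """Return (train_ids, test_ids) from CUB-style split-file lines,
    where each line is "image_id is_training_image" (1 = train, 0 = test)."""
    train_ids, test_ids = [], []
    for line in lines:
        image_id, is_train = line.split()
        (train_ids if is_train == "1" else test_ids).append(int(image_id))
    return train_ids, test_ids

# Toy example standing in for the real file's contents.
sample = ["1 1", "2 0", "3 1", "4 0"]
train, test = parse_split(sample)
print(train, test)  # → [1, 3] [2, 4]
```

On the real file, `len(train)` would be 5,994 and `len(test)` 5,794, matching the split sizes quoted above.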
The Caltech101 dataset contains images from 101 object categories (e.g., “helicopter”, “elephant” and “chair”) plus a background category containing images that do not belong to the 101 object categories. Each object category has roughly 40 to 800 images, and most classes have about 50. Image resolution is roughly 300×200 pixels.
577 PAPERS • 7 BENCHMARKS
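Because Caltech101's class sizes vary widely (40 to 800 images), a common evaluation protocol trains on a fixed number of images per class (often 15 or 30) and tests on the rest. A minimal sketch of such a capped split, using a hypothetical `sample_per_class` helper on a toy file listing:

```python
import random

def sample_per_class(files_by_class, n_train, seed=0):
    """Split each class's file list into up to n_train training files
    and the remaining test files (hypothetical helper, not an official API)."""
    rng = random.Random(seed)
    train, test = {}, {}
    for cls, files in files_by_class.items():
        shuffled = files[:]          # copy so the input is left untouched
        rng.shuffle(shuffled)
        train[cls] = shuffled[:n_train]
        test[cls] = shuffled[n_train:]
    return train, test

# Toy unbalanced classes; cap training at 2 images per class.
data = {"helicopter": ["h1", "h2", "h3", "h4"], "chair": ["c1", "c2", "c3"]}
train, test = sample_per_class(data, n_train=2)
print({c: len(v) for c, v in train.items()})  # → {'helicopter': 2, 'chair': 2}
```

Fixing the seed keeps the split reproducible across runs, which matters when comparing results under this per-class protocol.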
SPair-71k contains 70,958 image pairs with diverse variations in viewpoint and scale. Compared to previous datasets, it is significantly larger and provides more accurate and richer annotations.
58 PAPERS • 2 BENCHMARKS
Audiovisual Moments in Time (AVMIT) is a large-scale dataset of audiovisual action events. The dataset includes the annotation of 57,177 audiovisual videos from the Moments in Time dataset, each independently evaluated by 3 of 11 trained participants. Each annotation pertains to whether the labelled audiovisual action event is present and whether it is the most prominent feature of the video. The dataset also provides a curated test set of 960 videos across 16 classes, suitable for comparative experiments involving computational models and human participants, specifically when addressing research questions where audiovisual correspondence is of critical importance.
1 PAPER • NO BENCHMARKS YET
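Since each AVMIT video carries independent judgments from 3 annotators, one natural way to reduce them to a single per-video label is majority voting. A hedged sketch with a hypothetical `majority_vote` helper (the dataset's own aggregation procedure may differ):

```python
from collections import Counter

def majority_vote(ratings):
    """Aggregate per-video yes/no judgments (e.g., "is the labelled
    audiovisual event present?") by majority vote. Hypothetical helper."""
    return Counter(ratings).most_common(1)[0][0]

# Three independent annotator judgments for one video.
print(majority_vote([True, True, False]))  # → True
```

With an odd number of binary judgments (3 here), a majority always exists, so no tie-breaking rule is needed.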
FunKPoint is a dataset for finding correspondences in visual data, providing ground-truth correspondences for 10 tasks and 20 object categories.