4 dataset results for Fine-Grained Image Recognition AND English

CNFOOD-241 Contains a dataset of 241 Chinese dishes with 191,811 images. There are 170843 images in the training set and 20943 images in the validation set. All images are resized to 600x600. As some of the images in the dataset are from ChineseFoodNet, they are not supported for commercial use. CNFOOD-241-Chen is the CNFOOD-241 dataset spilt with the list introduced in the paper "Res-VMamba: Fine-Grained Food Category Visual Classification Using Selective State Space Models with Deep Residual Learning," which has random split as train, val, test three parts.

1 PAPER • 1 BENCHMARK

OVEN (Open-domain Visual Entity Recognition)

In this project, we formally present the task of Open-domain Visual Entity recognitioN (OVEN), where a model need to link an image onto a Wikipedia entity with respect to a text query. We construct OVEN-Wiki by re-purposing 14 existing datasets with all labels grounded onto one single label space: Wikipedia entities. OVEN challenges models to select among six million possible Wikipedia entities, making it a general visual recognition benchmark with the largest number of labels.

10 PAPERS • 1 BENCHMARK

OmniBenchmark

Omni-Realm Benchmark (OmniBenchmark) is a diverse (21 semantic realm-wise datasets) and concise (realm-wise datasets have no concepts overlapping) benchmark for evaluating pre-trained model generalization across semantic super-concepts/realms, e.g. across mammals to aircraft.

28 PAPERS • 1 BENCHMARK

WikiChurches

WikiChurches is a dataset for architectural style classification, consisting of 9,485 images of church buildings. Both images and style labels were sourced from Wikipedia. The dataset can serve as a benchmark for various research fields, as it combines numerous real-world challenges: fine-grained distinctions between classes based on subtle visual features, a comparatively small sample size, a highly imbalanced class distribution, a high variance of viewpoints, and a hierarchical organization of labels, where only some images are labeled at the most precise level.

1 PAPER • NO BENCHMARKS YET

Datasets

4 dataset results for Fine-Grained Image Recognition AND English