We introduce a dataset of 147 object categories containing over 6000 images that are suitable for the few-shot counting task. We collected and annotated images ourselves. Our dataset consists of 6135 images across a diverse set of 147 object categories, from kitchen utensils and office stationery to vehicles and animals. The object count in our dataset varies widely, from 7 to 3731 objects, with an average count of 56 objects per image. In each image, each object instance is annotated with a dot at its approximate center. In addition, three object instances are selected randomly as exemplar instances; these exemplars are also annotated with axis-aligned bounding boxes.
39 PAPERS • 3 BENCHMARKS
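The annotation scheme described for the 147-category dataset above (one dot per object instance, plus three exemplar instances with axis-aligned bounding boxes) could be held in a record like the one below. This is only a minimal sketch: the class name `CountingAnnotation`, its field names, the `(x_min, y_min, x_max, y_max)` box layout, and the helper `points_in_box` are all assumptions made for illustration, not the dataset's actual file format.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical type aliases; the real annotation files may use another layout.
Point = Tuple[float, float]              # (x, y) dot at an object's approximate center
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), axis-aligned


@dataclass
class CountingAnnotation:
    """Illustrative record for one image: dot annotations plus exemplar boxes."""
    image_path: str
    points: List[Point]
    exemplar_boxes: List[Box]

    @property
    def count(self) -> int:
        # The ground-truth count is the number of dot annotations.
        return len(self.points)

    def validate(self) -> None:
        # The description states that three exemplar instances are boxed per image.
        if len(self.exemplar_boxes) != 3:
            raise ValueError("expected exactly three exemplar boxes")
        for x_min, y_min, x_max, y_max in self.exemplar_boxes:
            if x_min >= x_max or y_min >= y_max:
                raise ValueError("exemplar box has non-positive width or height")


def points_in_box(points: List[Point], box: Box) -> List[Point]:
    """Return the dots that fall inside an axis-aligned box (edges inclusive)."""
    x_min, y_min, x_max, y_max = box
    return [(x, y) for (x, y) in points if x_min <= x <= x_max and y_min <= y <= y_max]
```

With a record like this, the per-image count is simply `len(points)`, and `points_in_box` offers a quick sanity check that each exemplar box contains at least one annotated dot.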
Evaluating OmniCount effectively across open-vocabulary, supervised, and few-shot counting tasks requires a dataset that covers a broad spectrum of visual categories, with multiple instances and multiple classes per image. Existing counting datasets are designed primarily around single object categories, such as humans and vehicles, and therefore fall short for multi-label object counting. Multi-class datasets such as MS COCO exist, but their utility for counting is limited by the sparse nature of object appearance. To address this gap, we created a new dataset of 30,230 images spanning 191 diverse categories, including kitchen utensils, office supplies, vehicles, and animals. Object instance counts range from 1 to 160 per image, with an average of 10, so the dataset fills this gap and establishes a benchmark for assessing counting models in varied scenarios.
1 PAPER • 2 BENCHMARKS