Fine-Grained Image Classification
173 papers with code • 35 benchmarks • 36 datasets
Fine-Grained Image Classification is a computer-vision task whose goal is to classify images into subcategories within a larger category, for example distinguishing different species of birds or different types of flowers. The task is considered fine-grained because the model must pick up subtle differences in visual appearance and patterns, making it more challenging than coarse image classification.
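To make the "subtle differences" point concrete, here is a minimal sketch using a nearest-centroid classifier over hypothetical image feature vectors (synthetic data, not from any paper on this page). The two subcategories have nearby means, so the decision hinges on small feature offsets:

```python
import numpy as np

# Minimal sketch: nearest-centroid classification over (hypothetical) image
# features. Fine-grained subcategories -- e.g. two bird species -- sit close
# together in feature space, so subtle offsets decide the label.
rng = np.random.default_rng(0)

# Two synthetic subcategories whose means differ only slightly.
sparrow = rng.normal(loc=0.0, scale=0.3, size=(50, 8))
finch = rng.normal(loc=0.4, scale=0.3, size=(50, 8))

centroids = np.stack([sparrow.mean(axis=0), finch.mean(axis=0)])

def classify(feature):
    """Return the index of the nearest subcategory centroid."""
    return int(np.argmin(np.linalg.norm(centroids - feature, axis=1)))

print(classify(np.full(8, 0.4)))  # prints 1: nearest to the 'finch' centroid
```

In a real pipeline the feature vectors would come from a backbone network rather than a Gaussian, but the geometry of the problem is the same: inter-class distances are small relative to intra-class variation.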
(Image credit: Looking for the Devil in the Details)
Libraries
Use these libraries to find Fine-Grained Image Classification models and implementations.
Most implemented papers
Neural Architecture Transfer
At the same time, the architecture search and transfer are orders of magnitude more efficient than existing NAS methods.
Learning Semantically Enhanced Feature for Fine-Grained Image Classification
We aim to provide a computationally cheap yet effective approach for fine-grained image classification (FGIC) in this letter.
Concept Learners for Few-Shot Learning
Developing algorithms that are able to generalize to a novel task given only a few labeled examples represents a fundamental challenge in closing the gap between machine- and human-level performance.
SnapMix: Semantically Proportional Mixing for Augmenting Fine-grained Data
As the main discriminative information of a fine-grained image usually resides in subtle regions, mixing-based augmentation methods are prone to heavy label noise in fine-grained recognition.
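The label-noise problem above comes from area-proportional label mixing. A sketch of the CutMix-style baseline (not SnapMix itself) makes it visible: the mixed label weight is just the pasted area fraction, regardless of whether the patch contains the discriminative region. All names and the fixed patch size here are illustrative assumptions:

```python
import numpy as np

# Sketch of CutMix-style area-proportional mixing, the baseline SnapMix
# improves on: a patch from image B is pasted into image A, and the label is
# weighted purely by pasted area -- even if the patch missed the subtle
# discriminative region, which is the label-noise problem in fine-grained data.
def cutmix(img_a, img_b, label_a, label_b, rng):
    h, w = img_a.shape[:2]
    ch, cw = h // 2, w // 2                      # fixed patch size for the sketch
    y = rng.integers(0, h - ch + 1)
    x = rng.integers(0, w - cw + 1)
    mixed = img_a.copy()
    mixed[y:y + ch, x:x + cw] = img_b[y:y + ch, x:x + cw]
    lam = 1.0 - (ch * cw) / (h * w)              # area fraction kept from image A
    mixed_label = lam * label_a + (1 - lam) * label_b
    return mixed, mixed_label

rng = np.random.default_rng(0)
a, b = np.zeros((8, 8)), np.ones((8, 8))
la, lb = np.array([1.0, 0.0]), np.array([0.0, 1.0])
img, lbl = cutmix(a, b, la, lb, rng)
print(lbl)  # area-proportional label: [0.75 0.25]
```

SnapMix replaces this purely geometric weight with a semantic one estimated from class activation maps, so the label reflects how much discriminative content was actually transferred.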
Fine-Grained Visual Classification via Simultaneously Learning of Multi-regional Multi-grained Features
Finally, we obtain multiple discriminative regions from high-level feature channels, and within these regions we obtain multiple finer regions from middle-level feature channels.
TransFG: A Transformer Architecture for Fine-grained Recognition
Fine-grained visual classification (FGVC), which aims at recognizing objects from subcategories, is a very challenging task due to the inherently subtle inter-class differences.
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations
Vision Transformers (ViTs) and MLPs signal further efforts to replace hand-wired features and inductive biases with general-purpose neural architectures.
AutoFormer: Searching Transformers for Visual Recognition
Specifically, the performance of these subnets with weights inherited from the supernet is comparable to those retrained from scratch.
Self-Supervised Learning by Estimating Twin Class Distributions
To solve this problem, we propose to maximize the mutual information between the input and the class predictions.
A Simple Episodic Linear Probe Improves Visual Recognition in the Wild
In this paper, we propose an episodic linear probing (ELP) classifier to reflect the generalization of visual representations in an online manner.
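A linear probe measures how linearly separable frozen representations are by fitting only a linear classifier on top of them. The sketch below fits a standalone softmax probe with plain gradient descent on synthetic features; it is an illustration of the general probing idea, not the paper's online episodic variant, and all data here is made up:

```python
import numpy as np

# Minimal linear-probe sketch: train a softmax classifier on frozen
# (here, synthetic) features to gauge their linear separability. ELP trains
# such a probe online during representation learning; this version just fits
# one offline with gradient descent on the softmax cross-entropy loss.
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 16))
labels = (feats[:, 0] + feats[:, 1] > 0).astype(int)  # linearly separable target
onehot = np.eye(2)[labels]

W = np.zeros((16, 2))
for _ in range(200):
    logits = feats @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)                 # softmax probabilities
    W -= 0.1 * feats.T @ (p - onehot) / len(feats)    # cross-entropy gradient step

acc = (np.argmax(feats @ W, axis=1) == labels).mean()
print(f"probe accuracy: {acc:.2f}")
```

Because only the linear head is trained, high probe accuracy indicates that the (frozen) features already encode the class structure linearly, which is why probing is a standard proxy for representation quality.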