Fine-Grained Image Classification
173 papers with code • 35 benchmarks • 36 datasets
Fine-Grained Image Classification is a computer-vision task whose goal is to classify images into subcategories within a larger category, for example distinguishing different species of birds or different types of flowers. The task is considered fine-grained because the model must pick up subtle differences in visual appearance and patterns, making it more challenging than coarse image classification.
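To make the "subtle differences" point concrete, here is a minimal sketch using a nearest-centroid classifier over hypothetical image feature vectors (synthetic data, not from any paper on this page). The two subcategories have nearby means, so the decision hinges on small feature offsets:

```python
import numpy as np

# Minimal sketch: nearest-centroid classification over (hypothetical) image
# features. Fine-grained subcategories -- e.g. two bird species -- sit close
# together in feature space, so subtle offsets decide the label.
rng = np.random.default_rng(0)

# Two synthetic subcategories whose means differ only slightly.
sparrow = rng.normal(loc=0.0, scale=0.3, size=(50, 8))
finch = rng.normal(loc=0.4, scale=0.3, size=(50, 8))

centroids = np.stack([sparrow.mean(axis=0), finch.mean(axis=0)])

def classify(feature):
    """Return the index of the nearest subcategory centroid."""
    return int(np.argmin(np.linalg.norm(centroids - feature, axis=1)))

print(classify(np.full(8, 0.4)))  # prints 1: nearest to the 'finch' centroid
```

In a real pipeline the feature vectors would come from a backbone network rather than a Gaussian, but the geometry of the problem is the same: inter-class distances are small relative to intra-class variation.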
(Image credit: Looking for the Devil in the Details)
Libraries
Use these libraries to find Fine-Grained Image Classification models and implementations.
Most implemented papers
Neural Architecture Transfer
At the same time, the architecture search and transfer are orders of magnitude more efficient than existing NAS methods.
Learning Semantically Enhanced Feature for Fine-Grained Image Classification
We aim to provide a computationally cheap yet effective approach for fine-grained image classification (FGIC) in this letter.
Concept Learners for Few-Shot Learning
Developing algorithms that are able to generalize to a novel task given only a few labeled examples represents a fundamental challenge in closing the gap between machine- and human-level performance.
SnapMix: Semantically Proportional Mixing for Augmenting Fine-grained Data
As the main discriminative information of a fine-grained image usually resides in subtle regions, mixing-based augmentation methods are prone to heavy label noise in fine-grained recognition.
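The label-noise problem above comes from area-proportional label mixing. A sketch of the CutMix-style baseline (not SnapMix itself) makes it visible: the mixed label weight is just the pasted area fraction, regardless of whether the patch contains the discriminative region. All names and the fixed patch size here are illustrative assumptions:

```python
import numpy as np

# Sketch of CutMix-style area-proportional mixing, the baseline SnapMix
# improves on: a patch from image B is pasted into image A, and the label is
# weighted purely by pasted area -- even if the patch missed the subtle
# discriminative region, which is the label-noise problem in fine-grained data.
def cutmix(img_a, img_b, label_a, label_b, rng):
    h, w = img_a.shape[:2]
    ch, cw = h // 2, w // 2                      # fixed patch size for the sketch
    y = rng.integers(0, h - ch + 1)
    x = rng.integers(0, w - cw + 1)
    mixed = img_a.copy()
    mixed[y:y + ch, x:x + cw] = img_b[y:y + ch, x:x + cw]
    lam = 1.0 - (ch * cw) / (h * w)              # area fraction kept from image A
    mixed_label = lam * label_a + (1 - lam) * label_b
    return mixed, mixed_label

rng = np.random.default_rng(0)
a, b = np.zeros((8, 8)), np.ones((8, 8))
la, lb = np.array([1.0, 0.0]), np.array([0.0, 1.0])
img, lbl = cutmix(a, b, la, lb, rng)
print(lbl)  # area-proportional label: [0.75 0.25]
```

SnapMix replaces this purely geometric weight with a semantic one estimated from class activation maps, so the label reflects how much discriminative content was actually transferred.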
Fine-Grained Visual Classification via Simultaneously Learning of Multi-regional Multi-grained Features
Finally, we obtain multiple discriminative regions from high-level feature channels, and within these regions we obtain multiple finer regions from middle-level feature channels.
TransFG: A Transformer Architecture for Fine-grained Recognition
Fine-grained visual classification (FGVC), which aims at recognizing objects from subcategories, is a very challenging task due to the inherently subtle inter-class differences.
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations
Vision Transformers (ViTs) and MLPs signal further efforts to replace hand-wired features and inductive biases with general-purpose neural architectures.
AutoFormer: Searching Transformers for Visual Recognition
Specifically, the performance of these subnets with weights inherited from the supernet is comparable to those retrained from scratch.
Self-Supervised Learning by Estimating Twin Class Distributions
To solve this problem, we propose to maximize the mutual information between the input and the class predictions.
A Simple Episodic Linear Probe Improves Visual Recognition in the Wild
In this paper, we propose an episodic linear probing (ELP) classifier to reflect the generalization of visual representations in an online manner.
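A linear probe measures how linearly separable frozen representations are by fitting only a linear classifier on top of them. The sketch below fits a standalone softmax probe with plain gradient descent on synthetic features; it is an illustration of the general probing idea, not the paper's online episodic variant, and all data here is made up:

```python
import numpy as np

# Minimal linear-probe sketch: train a softmax classifier on frozen
# (here, synthetic) features to gauge their linear separability. ELP trains
# such a probe online during representation learning; this version just fits
# one offline with gradient descent on the softmax cross-entropy loss.
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 16))
labels = (feats[:, 0] + feats[:, 1] > 0).astype(int)  # linearly separable target
onehot = np.eye(2)[labels]

W = np.zeros((16, 2))
for _ in range(200):
    logits = feats @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)                 # softmax probabilities
    W -= 0.1 * feats.T @ (p - onehot) / len(feats)    # cross-entropy gradient step

acc = (np.argmax(feats @ W, axis=1) == labels).mean()
print(f"probe accuracy: {acc:.2f}")
```

Because only the linear head is trained, high probe accuracy indicates that the (frozen) features already encode the class structure linearly, which is why probing is a standard proxy for representation quality.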