Fine-Grained Visual Categorization
26 papers with code • 0 benchmarks • 5 datasets
Latest papers
Coping with Change: Learning Invariant and Minimum Sufficient Representations for Fine-Grained Visual Categorization
Fine-grained visual categorization (FGVC) is a challenging task because different species can have very similar visual appearances.
SIM-Trans: Structure Information Modeling Transformer for Fine-grained Visual Categorization
To address the above limitations, we propose the Structure Information Modeling Transformer (SIM-Trans), which incorporates object structure information into the transformer to enhance discriminative representation learning, so that the learned representations contain both appearance information and structure information.
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
We thoroughly benchmark audiovisual classification performance and run modality fusion experiments using state-of-the-art transformer methods.
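One common baseline for combining modalities is late fusion: each modality's classifier outputs class probabilities, and the probabilities are averaged. The sketch below illustrates only that generic baseline. It is not necessarily the fusion scheme used in the SSW60 experiments, and `late_fusion` and its `audio_weight` parameter are hypothetical names.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def late_fusion(audio_logits, visual_logits, audio_weight=0.5):
    """Weighted average of per-modality class probabilities (late fusion).

    audio_weight is a hypothetical hyperparameter; 0.5 weights both modalities equally.
    """
    p_audio = softmax(audio_logits)
    p_visual = softmax(visual_logits)
    return [audio_weight * a + (1.0 - audio_weight) * v
            for a, v in zip(p_audio, p_visual)]

# Toy example with three classes where the two modalities disagree on the top class.
fused = late_fusion([2.0, 0.5, 0.1], [0.2, 1.5, 0.3])
predicted = max(range(len(fused)), key=fused.__getitem__)
print(predicted)  # prints 0
```

Fusing probabilities rather than raw logits keeps each modality's contribution on the same scale. Learned fusion inside a transformer is a common alternative.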
ViT-NeT: Interpretable Vision Transformers with Neural Tree Decoder
However, the complexity of the model makes it difficult to interpret the decision-making process, and the ambiguity of the attention maps can cause incorrect correlations between image patches.
On the Eigenvalues of Global Covariance Pooling for Fine-grained Visual Recognition
Inspired by this observation, we propose a network branch dedicated to magnifying the importance of small eigenvalues.
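To make "the eigenvalues of global covariance pooling" concrete, the sketch below computes the covariance of per-location features, extracts its eigenvalues with cyclic Jacobi rotations, and applies matrix power normalization (eigenvalue exponent 0.5). Power normalization is a well-known covariance-pooling technique that shrinks the gap between large and small eigenvalues. It is shown here only as an illustration and is not the paper's dedicated branch. All function names are hypothetical, and a real model would use a tensor library.

```python
import math

def covariance_pool(features):
    """Global covariance pooling: (1/N) * sum over locations of (x - mean)(x - mean)^T.

    features: list of N feature vectors (one per spatial location), each of length d.
    Returns a d x d covariance matrix as a list of lists.
    """
    n, d = len(features), len(features[0])
    mean = [sum(f[i] for f in features) / n for i in range(d)]
    centered = [[f[i] - mean[i] for i in range(d)] for f in features]
    return [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
            for i in range(d)]

def symmetric_eigenvalues(matrix, tol=1e-18, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, sorted ascending."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off_diagonal = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off_diagonal < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def power_normalize(eigenvalues, alpha=0.5):
    """Raise each eigenvalue to alpha, clamping tiny negative round-off to zero."""
    return [max(lam, 0.0) ** alpha for lam in eigenvalues]

# Toy features: 6 locations, 3 channels with very different variances.
features = [[1, 0, 0], [-1, 0, 0], [0, 2, 0], [0, -2, 0], [0, 0, 0.5], [0, 0, -0.5]]
eig = symmetric_eigenvalues(covariance_pool(features))
normalized = power_normalize(eig)
print(eig[-1] / eig[0], normalized[-1] / normalized[0])  # roughly 16 and 4
```

After the square root, the largest eigenvalue dominates the smallest by a factor of about 4 instead of 16, so directions with small eigenvalues carry relatively more weight.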
High-Order-Interaction for weakly supervised Fine-Grained Visual Categorization
Of these, methods based on bilinear pooling form one of the main categories for computing the interactions between deep features and have proven highly effective.
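Bilinear pooling captures second-order interactions by averaging the outer product of two feature vectors over all spatial locations. The sketch below follows the common bilinear-CNN recipe: pool the outer products, apply a signed square root, then L2-normalize. It uses plain lists for clarity. The function name is hypothetical, and the higher-order interactions proposed in the paper go beyond this second-order baseline.

```python
import math

def bilinear_pool(features_a, features_b):
    """Average the outer products of two feature streams over spatial locations.

    features_a: list of N vectors of length C_a (one per spatial location)
    features_b: list of N vectors of length C_b (same N locations)
    Returns a flattened, normalized vector of length C_a * C_b.
    """
    ca, cb = len(features_a[0]), len(features_b[0])
    pooled = [0.0] * (ca * cb)
    for fa, fb in zip(features_a, features_b):
        for i in range(ca):
            for j in range(cb):
                pooled[i * cb + j] += fa[i] * fb[j]
    n = len(features_a)
    pooled = [v / n for v in pooled]
    # Signed square root dampens large interaction values.
    pooled = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]
    # L2 normalization (guard against an all-zero vector).
    norm = math.sqrt(sum(v * v for v in pooled)) or 1.0
    return [v / norm for v in pooled]

# Two locations, two channels per stream.
descriptor = bilinear_pool([[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
print(descriptor)
```

Using the same stream for both inputs (`features_a is features_b`) gives the symmetric variant, which is closely related to the covariance pooling discussed above.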
Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification
Unlike most existing methods that learn visual attention based on conventional likelihood, we propose to learn the attention with counterfactual causality, which provides a tool to measure the attention quality and a powerful supervisory signal to guide the learning process.
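The causal idea can be sketched as comparing the prediction made with the learned attention against predictions made after an intervention that replaces it with random attention. Averaging the difference over several random maps estimates the attention's effect on the output, and a training loss can then be applied to that effect. The toy attention-pooling model, the uniform random intervention, and every function name below are assumptions for illustration, not the paper's implementation.

```python
import random

def attention_pool(features, attention):
    """Attention-weighted average of per-location feature vectors."""
    total = sum(attention)
    dim = len(features[0])
    return [sum(a * f[k] for a, f in zip(attention, features)) / total for k in range(dim)]

def linear_logits(pooled, weights):
    """Toy linear classifier: one weight vector per class."""
    return [sum(w * x for w, x in zip(w_c, pooled)) for w_c in weights]

def counterfactual_effect(features, attention, weights, num_samples=8, seed=0):
    """Average of logits(learned attention) - logits(random attention) per class."""
    rng = random.Random(seed)
    factual = linear_logits(attention_pool(features, attention), weights)
    effect = [0.0] * len(weights)
    for _ in range(num_samples):
        # Intervention (assumed here): swap in a random, strictly positive attention map.
        random_attention = [rng.random() + 1e-6 for _ in attention]
        counterfactual = linear_logits(attention_pool(features, random_attention), weights)
        effect = [e + (f - c) / num_samples
                  for e, f, c in zip(effect, factual, counterfactual)]
    return effect

features = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # 3 locations, 2 channels
attention = [0.8, 0.1, 0.1]                      # learned attention favors location 0
weights = [[2.0, 0.0], [0.0, 2.0]]               # class 0 reads channel 0, class 1 channel 1
print(counterfactual_effect(features, attention, weights))
```

An attention map that focuses on discriminative locations yields a large effect for the correct class. If every location looks the same, the attention cannot matter and the effect is close to zero.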
Feature Fusion Vision Transformer for Fine-Grained Visual Categorization
We verify the effectiveness of FFVT on three benchmarks, where FFVT achieves state-of-the-art performance.
Self-Supervised Learning for Fine-Grained Visual Categorization
Deconstruction learning forces the model to focus on local object parts, while reconstruction learning helps it learn the correlations between those parts.
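A generic way to "deconstruct" an image is a jigsaw-style shuffle. The image is split into a grid of patches and the patches are permuted, and the permutation becomes the self-supervised target for reconstruction. The snippet does not say exactly how the paper deconstructs images. Some related methods, for example, restrict shuffling to local neighborhoods. Treat this as an assumed, simplified version with hypothetical function names. In training, a network would predict `order` (or each patch's original position) rather than being given it.

```python
import random

def split_into_patches(image, grid):
    """Split a 2-D image (list of equal-length rows) into grid x grid patches, row-major.

    Assumes the image height and width are divisible by grid.
    """
    ph, pw = len(image) // grid, len(image[0]) // grid
    return [[row[gj * pw:(gj + 1) * pw] for row in image[gi * ph:(gi + 1) * ph]]
            for gi in range(grid) for gj in range(grid)]

def assemble(patches, grid):
    """Inverse of split_into_patches: stitch row-major patches back into one image."""
    ph = len(patches[0])
    image = []
    for gi in range(grid):
        for r in range(ph):
            row = []
            for gj in range(grid):
                row.extend(patches[gi * grid + gj][r])
            image.append(row)
    return image

def deconstruct(image, grid, seed=0):
    """Shuffle patches; order[i] is the original index of the patch now at position i."""
    rng = random.Random(seed)
    patches = split_into_patches(image, grid)
    order = list(range(len(patches)))
    rng.shuffle(order)
    return assemble([patches[k] for k in order], grid), order

def reconstruct(shuffled_image, order, grid):
    """Undo deconstruct using the permutation (the self-supervised target)."""
    shuffled_patches = split_into_patches(shuffled_image, grid)
    original = [None] * len(order)
    for position, k in enumerate(order):
        original[k] = shuffled_patches[position]
    return assemble(original, grid)

image = [[4 * r + c for c in range(4)] for r in range(4)]  # 4x4 toy "image"
shuffled, order = deconstruct(image, grid=2, seed=1)
print(order)
assert reconstruct(shuffled, order, grid=2) == image
```

Smaller patches (a larger `grid`) destroy more global structure and push the model further toward local part cues.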
Benchmarking Representation Learning for Natural World Image Collections
To facilitate progress in this area, we present two new natural world visual classification datasets, iNat2021 and NeWT.