Selective Sparse Sampling for Fine-Grained Image Recognition

Fine-grained recognition poses the unique challenge of capturing subtle inter-class differences under considerable intra-class variance (e.g., beaks for bird species). Conventional approaches crop local regions and learn detailed representations from those regions, but suffer from a fixed number of parts and the loss of surrounding context. In this paper, we propose a simple yet effective framework, called Selective Sparse Sampling, to capture diverse and fine-grained details. The framework is implemented using Convolutional Neural Networks, referred to as Selective Sparse Sampling Networks (S3Ns). With image-level supervision, S3Ns collect peaks, i.e., local maxima, from class response maps to estimate informative receptive fields and learn a set of sparse attention maps for capturing fine-grained visual evidence as well as preserving context. The evidence is selectively sampled to extract discriminative and complementary features, which significantly enrich the learned representation and guide the network to discover more subtle cues. Extensive experiments and ablation studies show that the proposed method consistently outperforms the state-of-the-art methods on challenging benchmarks including CUB-200-2011, FGVC-Aircraft, and Stanford Cars.
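The abstract describes two core steps: collecting peaks (local maxima) from class response maps, and turning those peaks into sparse attention for sampling. A rough illustrative sketch of that idea is given below, using NumPy only; the function names and the Gaussian parameterization of the attention are assumptions for illustration, not the authors' implementation (in S3N these steps are learned end-to-end inside a CNN):

```python
import numpy as np

def find_peaks(response_map, window=3, threshold=0.0):
    """Collect local maxima (peaks) from a 2D class response map.

    A location is a peak if its response exceeds `threshold` and is the
    maximum within a `window` x `window` neighborhood.
    """
    h, w = response_map.shape
    r = window // 2
    peaks = []
    for i in range(h):
        for j in range(w):
            v = response_map[i, j]
            if v <= threshold:
                continue
            nbr = response_map[max(0, i - r):i + r + 1,
                               max(0, j - r):j + r + 1]
            if v >= nbr.max():
                peaks.append((i, j, v))
    return peaks

def sparse_attention(shape, peaks, sigma=2.0):
    """Build a sparse attention map as a sum of Gaussians centered on peaks.

    Each Gaussian keeps some surrounding context around its peak rather
    than hard-cropping a region; the map is normalized to [0, 1].
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    att = np.zeros(shape)
    for (pi, pj, v) in peaks:
        att += v * np.exp(-((ys - pi) ** 2 + (xs - pj) ** 2)
                          / (2 * sigma ** 2))
    return att / (att.max() + 1e-8)

# Toy class response map with two informative regions.
rm = np.zeros((16, 16))
rm[4, 4] = 1.0
rm[11, 12] = 0.8
peaks = find_peaks(rm)                      # two peaks found
attention = sparse_attention(rm.shape, peaks)
```

Unlike a fixed-part cropper, the number of peaks (and hence sampled regions) adapts to the image, and the soft attention preserves context around each peak.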

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Fine-Grained Image Classification | CUB-200-2011 | S3N | Accuracy | 88.5% | #43 |

Results from Other Papers


| Task | Dataset | Model | Metric Name | Metric Value | Rank |
|---|---|---|---|---|---|
| Fine-Grained Image Classification | FGVC Aircraft | S3N | Accuracy | 92.8% | #32 |
| Fine-Grained Image Classification | Stanford Cars | S3N | Accuracy | 94.7% | #30 |
