Inflated Episodic Memory With Region Self-Attention for Long-Tailed Visual Recognition

CVPR 2020 · Linchao Zhu, Yi Yang

There has been increasing interest in modeling long-tailed data. Unlike artificially curated datasets, long-tailed data arise naturally in the real world and are therefore more realistic. To address the class-imbalance problem, we introduce an Inflated Episodic Memory (IEM) for long-tailed visual recognition. First, IEM augments convolutional neural networks with categorical representative features for rapid learning on tail classes. In traditional few-shot learning, a single prototype is usually used to represent a category; long-tailed data, however, exhibit higher intra-class variance, which makes learning a single prototype per category challenging. We therefore design IEM to store the most discriminative features for each category individually. Moreover, each category's memory bank is updated independently, which further reduces the chance of learning skewed classifiers. Second, we introduce a novel region self-attention mechanism for multi-scale spatial feature-map encoding. Incorporating more discriminative features improves generalization on tail classes, so we encode local feature maps at multiple scales while simultaneously aggregating spatial contextual information. Equipped with IEM and region self-attention, we achieve state-of-the-art performance on four standard long-tailed image recognition benchmarks. We further validate the effectiveness of IEM on a long-tailed video recognition benchmark, YouTube-8M.
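
The abstract does not spell out the memory layout or update rule, so below is a minimal Python (PyTorch) sketch of what a per-category episodic memory could look like. The class name `PerClassMemory`, its `write`/`read` methods, and the keep-the-most-discriminative-slot policy are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

class PerClassMemory:
    """One independent bank per category, each holding up to `slots`
    feature vectors (hypothetical layout; the paper's exact design
    is not given in the abstract)."""

    def __init__(self, num_classes: int, slots: int, dim: int):
        self.banks = torch.zeros(num_classes, slots, dim)
        # Per-slot "discriminativeness" score; -inf marks an empty slot.
        self.scores = torch.full((num_classes, slots), float('-inf'))

    @torch.no_grad()
    def write(self, feat: torch.Tensor, label: int, score: float) -> None:
        # Assumed policy: replace the weakest slot of this class only,
        # so head-class updates never evict tail-class features.
        j = int(self.scores[label].argmin())
        if score > self.scores[label, j].item():
            self.banks[label, j] = F.normalize(feat, dim=0)
            self.scores[label, j] = score

    def read(self, query: torch.Tensor) -> torch.Tensor:
        # Score each class by the best cosine similarity between the
        # query feature and any slot in that class's bank.
        q = F.normalize(query, dim=0)
        sims = torch.einsum('csd,d->cs', self.banks, q)  # (classes, slots)
        return sims.max(dim=1).values  # one score per class
```

Because each bank holds a fixed number of slots and is written only with samples of its own class, a burst of head-class images can never evict tail-class features, which is consistent with the abstract's claim that independent updates reduce classifier skew.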
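The region self-attention mechanism is likewise described only at a high level. One plausible reading, sketched below under stated assumptions, is to pool the convolutional feature map into region tokens at several grid scales and then run standard multi-head self-attention over the tokens so spatial context is aggregated across regions and scales. The scale pyramid `(1, 2, 4)` and the mean-pooled output are illustrative choices, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionSelfAttention(nn.Module):
    """Pool a feature map into region tokens at several grid scales,
    then aggregate spatial context with multi-head self-attention."""

    def __init__(self, dim: int, heads: int = 4, scales=(1, 2, 4)):
        super().__init__()
        # `dim` must be divisible by `heads` for nn.MultiheadAttention.
        self.scales = scales
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, fmap: torch.Tensor) -> torch.Tensor:
        # fmap: (B, C, H, W) convolutional feature map.
        tokens = []
        for s in self.scales:
            # An s x s grid of average-pooled local regions at this scale.
            pooled = F.adaptive_avg_pool2d(fmap, s)           # (B, C, s, s)
            tokens.append(pooled.flatten(2).transpose(1, 2))  # (B, s*s, C)
        x = torch.cat(tokens, dim=1)                          # (B, N, C)
        # Every region attends to every other region across all scales.
        out, _ = self.attn(x, x, x)
        return out.mean(dim=1)                                # (B, C) descriptor
```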

Results from the Paper


Task                 Dataset            Model   Metric               Value   Global Rank
Long-tail Learning   ImageNet-LT        IEM     Top-1 Accuracy (%)   43.2    #55
Long-tail Learning   iNaturalist 2018   IEM     Top-1 Accuracy (%)   70.2    #28
Long-tail Learning   Places-LT          IEM     Top-1 Accuracy (%)   39.7    #16
