Task	Dataset	Model	Metric Name	Metric Value (%)	Global Rank
Long-tail Learning	ImageNet-LT	VL-LTR (ViT-B-16)	Top-1 Accuracy	77.2	#4
Long-tail Learning	ImageNet-LT	VL-LTR (ResNet-50)	Top-1 Accuracy	70.1	#8
Long-tail Learning	iNaturalist 2018	VL-LTR (ResNet-50)	Top-1 Accuracy	74.6	#14
Image Classification	iNaturalist 2018	VL-LTR (ViT-B-16)	Top-1 Accuracy	81.0	#12
Image Classification	iNaturalist 2018	VL-LTR (ResNet-50)	Top-1 Accuracy	74.6	#24
Long-tail Learning	iNaturalist 2018	VL-LTR (ViT-B-16)	Top-1 Accuracy	81.0	#2
Long-tail Learning	Places-LT	VL-LTR (ResNet-50)	Top-1 Accuracy	48.0	#6
Long-tail Learning	Places-LT	VL-LTR (ViT-B-16)	Top-1 Accuracy	50.1	#3

VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

26 Nov 2021 · Changyao Tian, Wenhai Wang, Xizhou Zhu, Jifeng Dai, Yu Qiao

Deep learning-based models encounter challenges when processing long-tailed data in the real world. Existing solutions usually employ balancing strategies or transfer learning to deal with the class imbalance problem, based solely on the image modality. In this work, we present a visual-linguistic long-tailed recognition framework, termed VL-LTR, and conduct empirical studies on the benefits of introducing the text modality for long-tailed recognition (LTR). Compared to existing approaches, the proposed VL-LTR has the following merits: (1) it learns not only visual representations from images but also the corresponding linguistic representations from noisy class-level text descriptions collected from the Internet; (2) it effectively uses the learned visual-linguistic representations to improve visual recognition performance, especially for classes with fewer image samples. We also conduct extensive experiments and set a new state of the art on widely used LTR benchmarks. Notably, our method achieves 77.2% overall accuracy on ImageNet-LT, which outperforms the previous best method by over 17 points and is close to the prevailing performance of training on the full ImageNet. Code is available at https://github.com/ChangyaoTian/VL-LTR.
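The general idea of matching an image against class-level text representations can be illustrated with a small sketch. This is not the VL-LTR implementation: the 3-dimensional embeddings, the class names, the helper functions (`l2_normalize`, `class_text_prototype`, `classify`), and the choice to average sentence embeddings per class are all assumptions made for illustration. In practice the embeddings would come from trained visual and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def class_text_prototype(sentence_embeddings):
    """Average the sentence embeddings of one class into a single unit vector.

    Plain averaging is an illustrative assumption, not the aggregation
    used by VL-LTR.
    """
    count = len(sentence_embeddings)
    dim = len(sentence_embeddings[0])
    mean = [sum(emb[i] for emb in sentence_embeddings) / count for i in range(dim)]
    return l2_normalize(mean)


def classify(image_embedding, prototypes):
    """Return the best-matching class and the cosine score for every class."""
    img = l2_normalize(image_embedding)
    scores = {
        name: sum(a * b for a, b in zip(img, proto))
        for name, proto in prototypes.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy embeddings: a head class with two text descriptions
# and a tail class with only one.
text_embeddings = {
    "head_class": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "tail_class": [[0.0, 1.0, 0.2]],
}
prototypes = {
    name: class_text_prototype(embs) for name, embs in text_embeddings.items()
}

predicted, scores = classify([0.1, 0.8, 0.1], prototypes)
print(predicted)  # tail_class
```

The point of the sketch is that a class with few images can still be recognized when its text description yields a usable prototype, which is the intuition the abstract gives for using the text modality on tail classes.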

Code

ChangyaoTian/VL-LTR (official)

Tasks

Image Classification

Long-tail Learning

Transfer Learning

Datasets

ImageNet

Places

iNaturalist

ImageNet-LT

Places-LT

Results from the Paper

Ranked #2 on Long-tail Learning on iNaturalist 2018 (using extra training data)

