A Simple Long-Tailed Recognition Baseline via Vision-Language Model

29 Nov 2021  ·  Teli Ma, Shijie Geng, Mengmeng Wang, Jing Shao, Jiasen Lu, Hongsheng Li, Peng Gao, Yu Qiao

The visual world naturally exhibits a long-tailed distribution of open classes, which poses great challenges to modern visual systems. Existing approaches either apply class re-balancing strategies or directly improve network modules to address the problem. However, they still train models with a finite set of predefined labels, which limits their supervision information and restricts their transferability to novel instances. Recent advances in large-scale contrastive vision-language pretraining shed light on a new pathway for visual recognition. With open-vocabulary supervision, pretrained contrastive vision-language models learn powerful multimodal representations that are promising for handling data deficiency and unseen concepts. By computing the semantic similarity between visual and text inputs, visual recognition is converted into a vision-language matching problem. Inspired by this, we propose BALLAD, which leverages contrastive vision-language models for long-tailed recognition. We first continue pretraining the vision-language backbone through contrastive learning on the specific long-tailed target dataset. Afterward, we freeze the backbone and employ an additional adapter layer to enhance the representations of tail classes, trained on balanced samples built with re-sampling strategies. Extensive experiments on three popular long-tailed recognition benchmarks show that our simple and effective approach sets a new state of the art and outperforms competitive baselines by a large margin. Code is released at https://github.com/gaopengcuhk/BALLAD.
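To make the matching formulation concrete, below is a minimal sketch of zero-shot recognition with a pretrained CLIP model, using the open-source `clip` package. The class names and image path are hypothetical placeholders; BALLAD's actual pipeline adds the two finetuning phases described above on top of this matching step.

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN50", device=device)  # mirrors the paper's ResNet-50 variant

# Hypothetical label set and image; for long-tailed recognition these
# would be the target dataset's class names and a test image.
class_names = ["golden retriever", "snow leopard", "abacus"]
prompts = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(prompts)
    # L2-normalize so the dot product is cosine similarity.
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    logits = 100.0 * image_feat @ text_feat.t()  # recognition = vision-language matching

print(class_names[logits.argmax(dim=-1).item()])
```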
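The second phase freezes the backbone and trains a lightweight adapter on class-balanced batches. The sketch below is an assumption-laden illustration, not the paper's exact recipe: it uses a single linear adapter blended with the frozen feature via a residual ratio `alpha`, and inverse-class-frequency re-sampling; consult the released code for the precise adapter design and sampling schedule.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, WeightedRandomSampler

class ResidualAdapter(nn.Module):
    """Linear adapter whose output is blended with the frozen visual feature.

    `alpha` (the residual mixing ratio) is an assumed hyperparameter; see the
    BALLAD repo for the exact adapter architecture.
    """
    def __init__(self, dim: int, alpha: float = 0.5):
        super().__init__()
        self.fc = nn.Linear(dim, dim)
        self.alpha = alpha

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        return self.alpha * self.fc(v) + (1.0 - self.alpha) * v

def balanced_loader(dataset, labels, batch_size=256):
    """Class-balanced re-sampling: weight each sample by the inverse
    frequency of its class so tail classes appear as often as head classes."""
    counts = torch.bincount(torch.tensor(labels)).float()
    weights = 1.0 / counts[labels]
    sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)

def train_step(model, adapter, text_feat, images, targets, optimizer):
    """One adapter update. `model` is the frozen backbone from phase one;
    `text_feat` holds precomputed, L2-normalized class-name embeddings."""
    with torch.no_grad():
        v = model.encode_image(images)  # backbone receives no gradients
    v = adapter(v.float())
    v = v / v.norm(dim=-1, keepdim=True)
    logits = 100.0 * v @ text_feat.t()
    loss = nn.functional.cross_entropy(logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because only the adapter's parameters receive gradients, tail-class representations can be reshaped on balanced batches without disturbing the backbone learned in the first phase.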


Results from the Paper


Ranked #4 on Long-tail Learning on Places-LT (using extra training data)

| Task | Dataset | Model | Metric | Value | Global Rank |
|------|---------|-------|--------|-------|-------------|
| Long-tail Learning | CIFAR-100-LT (ρ=100) | BALLAD (ViT-B/16) | Error Rate | 22.2 | #5 |
| Long-tail Learning | ImageNet-LT | BALLAD (ResNet-50) | Top-1 Accuracy | 67.2 | #9 |
| Long-tail Learning | ImageNet-LT | BALLAD (ResNet-50×16) | Top-1 Accuracy | 76.5 | #5 |
| Long-tail Learning | ImageNet-LT | BALLAD (ResNet-101) | Top-1 Accuracy | 70.5 | #7 |
| Long-tail Learning | ImageNet-LT | BALLAD (ViT-B/16) | Top-1 Accuracy | 75.7 | #6 |
| Long-tail Learning | Places-LT | BALLAD (ResNet-50) | Top-1 Accuracy | 46.5 | #9 |
| Long-tail Learning | Places-LT | BALLAD (ResNet-101) | Top-1 Accuracy | 47.9 | #7 |
| Long-tail Learning | Places-LT | BALLAD (ResNet-50×16) | Top-1 Accuracy | 49.3 | #5 |
| Long-tail Learning | Places-LT | BALLAD (ViT-B/16) | Top-1 Accuracy | 49.5 | #4 |
