ImageNet-Atr (ImageNet with Adversarial Text Regions)

Introduced by Cao et al. in Less is More: Removing Text-regions Improves CLIP Training Efficiency and Robustness

We build a new evaluation set by overlaying text onto the images of the ImageNet 2012 evaluation set. ImageNet has 1,000 categories. For each category c, we find its most confusing category c* and write the name of c* onto every evaluation image of c.
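The category-pairing step above can be sketched as follows. This is a minimal sketch, not the authors' code. It assumes that the "most confusing" category c* is the wrong class that some reference classifier predicts most often for images of c, which is the largest off-diagonal entry in row c of a confusion matrix. The function names are hypothetical. The text rendering itself (font, position, size) is not specified on this page and is left out.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def most_confusing_categories(
    labels: Sequence[int], predictions: Sequence[int], num_classes: int
) -> Dict[int, Optional[int]]:
    """Map each true class c to the wrong class most often predicted for c.

    Assumption (hypothetical definition): "most confusing" means the largest
    off-diagonal count in row c of a reference classifier's confusion matrix.
    Ties go to the smaller class index; classes never misclassified map to None.
    """
    # errors[c][p] counts how often an image of class c was predicted as p != c.
    errors: List[Counter] = [Counter() for _ in range(num_classes)]
    for y, p in zip(labels, predictions):
        if y != p:
            errors[y][p] += 1

    result: Dict[int, Optional[int]] = {}
    for c in range(num_classes):
        if errors[c]:
            # Highest count first; among equal counts, the smallest class index.
            result[c] = min(errors[c], key=lambda k: (-errors[c][k], k))
        else:
            result[c] = None
    return result


def text_to_overlay(
    label: int, c_star: Dict[int, Optional[int]], class_names: Sequence[str]
) -> Optional[str]:
    """Hypothetical helper: the category name to write onto an image of `label`."""
    target = c_star[label]
    return None if target is None else class_names[target]


# Usage: three classes, eight evaluation images.
labels = [0, 0, 0, 1, 1, 2, 2, 2]
preds = [0, 1, 1, 1, 2, 2, 0, 2]
c_star = most_confusing_categories(labels, preds, num_classes=3)
print(c_star)  # {0: 1, 1: 2, 2: 0}
print(text_to_overlay(0, c_star, ["cat", "dog", "fox"]))  # dog
```

Each image of class c would then receive the name of `c_star[c]` as rendered text. An image library such as Pillow could draw it, but the rendering parameters used for ImageNet-Atr are not given here.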

This evaluation set is challenging for many CLIP models. For example, OpenAI's CLIP ViT-B/16 reaches a top-1 accuracy of only 32%, far below its accuracy on the original ImageNet evaluation set.
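Top-1 accuracy counts an image as correct only when its highest-scoring class is the true label. The sketch below assumes zero-shot CLIP evaluation, where each row of `scores` holds one image's similarity to every class prompt. The function name and inputs are illustrative, not the benchmark's official evaluation code.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of images whose highest-scoring class equals the true label.

    Assumption: scores[i][k] is image i's score for class k (for CLIP, e.g. the
    image-text similarity with class k's prompt). On ties, max() keeps the
    lowest class index.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one image")
    correct = 0
    for row, y in zip(scores, labels):
        pred = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        correct += int(pred == y)
    return correct / len(labels)


# Usage: the same two images, before and after a misleading text overlay.
clean = [[0.9, 0.1], [0.2, 0.8]]
attacked = [[0.3, 0.7], [0.2, 0.8]]  # the first image now favors the overlaid class
print(top1_accuracy(clean, [0, 1]))     # 1.0
print(top1_accuracy(attacked, [0, 1]))  # 0.5
```

Comparing the two accuracies on the same images isolates the effect of the overlaid text, which is the gap this evaluation set is designed to expose.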

License

Unknown

Modalities

Images

Languages

English
