Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

The conventional recipe for maximizing model accuracy is to (1) train multiple models with various hyperparameters and (2) pick the individual model which performs best on a held-out validation set, discarding the remainder. In this paper, we revisit the second step of this procedure in the context of fine-tuning large pre-trained models, where fine-tuned models often appear to lie in a single low error basin. We show that averaging the weights of multiple models fine-tuned with different hyperparameter configurations often improves accuracy and robustness. Unlike a conventional ensemble, we may average many models without incurring any additional inference or memory costs -- we call the results "model soups." When fine-tuning large pre-trained models such as CLIP, ALIGN, and a ViT-G pre-trained on JFT, our soup recipe provides significant improvements over the best model in a hyperparameter sweep on ImageNet. The resulting ViT-G model, which attains 90.94% top-1 accuracy on ImageNet, achieved a new state of the art. Furthermore, we show that the model soup approach extends to multiple image classification and natural language processing tasks, improves out-of-distribution performance, and improves zero-shot performance on new downstream tasks. Finally, we analytically relate the performance similarity of weight-averaging and logit-ensembling to flatness of the loss and confidence of the predictions, and validate this relation empirically. Code is available at https://github.com/mlfoundations/model-soups.
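
The core recipe is simple: fine-tune the same pre-trained model several times with different hyperparameters, then average the resulting weights element-wise. Below is a minimal uniform-soup sketch in PyTorch, assuming all checkpoints were fine-tuned from the same pre-trained model and so share identical architectures and parameter names; the function and file names are illustrative, not taken from the paper's repository.

```python
import torch

def uniform_soup(checkpoint_paths):
    """Element-wise average of the weights of several fine-tuned checkpoints."""
    soup = None
    for path in checkpoint_paths:
        state = torch.load(path, map_location="cpu")  # one fine-tuned state dict
        if soup is None:
            soup = {k: v.clone().float() for k, v in state.items()}
        else:
            for k in soup:
                soup[k] += state[k].float()
    # Divide the accumulated sums by the number of ingredients.
    return {k: v / len(checkpoint_paths) for k, v in soup.items()}

# Hypothetical usage: load the soup into a model with the same architecture.
# model.load_state_dict(uniform_soup(["ft_lr_1e-5.pt", "ft_lr_3e-5.pt"]))
```

Because the soup is a single set of weights, inference costs exactly one forward pass, unlike a logit ensemble of the same checkpoints.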


Results from the Paper


Ranked #1 on Image Classification on ImageNet V2 (using extra training data)

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Image Classification | ImageNet | Model soups (ViT-G/14) | Top-1 Accuracy | 90.94% | #5 |
| Image Classification | ImageNet | Model soups (ViT-G/14) | Number of params | 1843M | #962 |
| Image Classification | ImageNet | Model soups (BASIC-L) | Top-1 Accuracy | 90.98% | #4 |
| Image Classification | ImageNet | Model soups (BASIC-L) | Number of params | 2440M | #969 |
| Domain Generalization | ImageNet-A | Model soups (BASIC-L) | Top-1 Accuracy | 94.17% | #1 |
| Domain Generalization | ImageNet-A | Model soups (ViT-G/14) | Top-1 Accuracy | 92.67% | #2 |
| Unsupervised Domain Adaptation | ImageNet-R | Model soups (ViT-G/14) | Top-1 Error | 4.54% | #1 |
| Domain Generalization | ImageNet-R | Model soups (BASIC-L) | Top-1 Error Rate | 3.90% | #1 |
| Domain Generalization | ImageNet-R | Model soups (ViT-G/14) | Top-1 Error Rate | 4.54% | #2 |
| Image Classification | ImageNet ReaL | Model soups (ViT-G/14) | Accuracy | 91.20% | #2 |
| Image Classification | ImageNet ReaL | Model soups (ViT-G/14) | Number of params | 1843M | #55 |
| Image Classification | ImageNet ReaL | Model soups (BASIC-L) | Accuracy | 91.03% | #7 |
| Image Classification | ImageNet ReaL | Model soups (BASIC-L) | Number of params | 2440M | #56 |
| Image Classification | ImageNet ReaL | Baseline (ViT-G/14) | Accuracy | 91.78% | #1 |
| Domain Generalization | ImageNet-Sketch | Model soups (ViT-G/14) | Top-1 Accuracy | 74.24% | #2 |
| Domain Generalization | ImageNet-Sketch | Model soups (BASIC-L) | Top-1 Accuracy | 77.18% | #1 |
| Image Classification | ImageNet V2 | Model soups (ViT-G/14) | Top-1 Accuracy | 84.22% | #3 |
| Image Classification | ImageNet V2 | Model soups (BASIC-L) | Top-1 Accuracy | 84.63% | #1 |
| Out-of-Distribution Generalization | ImageNet-W | Uniform Soup (ViT-B/32) | IN-W Gap | -7.9 | #1 |
| Out-of-Distribution Generalization | ImageNet-W | Uniform Soup (ViT-B/32) | Carton Gap | +24 | #1 |
| Out-of-Distribution Generalization | ImageNet-W | Greedy Soup (ViT-B/32) | IN-W Gap | -6.5 | #1 |
| Out-of-Distribution Generalization | ImageNet-W | Greedy Soup (ViT-B/32) | Carton Gap | +16 | #1 |
| Image Classification | ObjectNet | Baseline (ViT-G/14) | Top-1 Accuracy | 79.03% | #5 |
| Image Classification | ObjectNet | Model soups (ViT-G/14) | Top-1 Accuracy | 78.52% | #6 |
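
The ImageNet-W rows distinguish the paper's two recipes: the uniform soup averages every fine-tuned model, while the greedy soup adds models one at a time, ordered by held-out validation accuracy, keeping each only if it does not hurt the validation accuracy of the running average. A sketch of the greedy procedure follows, assuming the same PyTorch checkpoint format as above and a user-supplied `evaluate(model, state_dict)` placeholder that returns held-out validation accuracy.

```python
import torch

def greedy_soup(checkpoint_paths, model, evaluate):
    states = [torch.load(p, map_location="cpu") for p in checkpoint_paths]
    # Consider ingredients in order of decreasing individual validation accuracy.
    states.sort(key=lambda s: evaluate(model, s), reverse=True)

    def average(ingredients):
        return {k: sum(s[k].float() for s in ingredients) / len(ingredients)
                for k in ingredients[0]}

    soup, best_acc = [states[0]], evaluate(model, states[0])
    for state in states[1:]:
        candidate_acc = evaluate(model, average(soup + [state]))
        if candidate_acc >= best_acc:  # keep the ingredient only if it helps
            soup.append(state)
            best_acc = candidate_acc
    return average(soup)
```

By construction the greedy soup's held-out accuracy is at least that of the best individual checkpoint, since that checkpoint is the first ingredient and later ones are only kept when they do not hurt.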
