Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Self-Supervised Image Classification	ImageNet	iBOT-vMF (ViT-B/16)	Top 1 Accuracy	80.3%	#24
Self-Supervised Image Classification	ImageNet	iBOT-vMF (ViT-B/16)	Number of Params	85M	#38
Self-Supervised Image Classification	ImageNet	DINO-vMF (ViT-B/16)	Top 1 Accuracy	78.8%	#40
Self-Supervised Image Classification	ImageNet	DINO-vMF (ViT-B/16)	Number of Params	85M	#38
Self-Supervised Image Classification	ImageNet	DINO-vMF (ViT-S/16)	Top 1 Accuracy	77.0%	#53
Self-Supervised Image Classification	ImageNet	DINO-vMF (ViT-S/16)	Number of Params	21M	#77

DINO as a von Mises-Fisher mixture model

ICLR 2023 · Hariprasath Govindarajan, Per Sidén, Jacob Roll, Fredrik Lindsten

Self-distillation methods using Siamese networks are popular for self-supervised pre-training. DINO is one such method, based on a cross-entropy loss between $K$-dimensional probability vectors obtained by applying a softmax function to the dot product between representations and learnt prototypes. Given that the learned representations are $L^2$-normalized, we show that DINO and its derivatives, such as iBOT, can be interpreted as a mixture model of von Mises-Fisher components. Under this interpretation, DINO assumes equal precision for all components when the prototypes are also $L^2$-normalized. Using this insight, we propose DINO-vMF, which adds appropriate normalization constants when computing the cluster assignment probabilities. Unlike DINO, DINO-vMF is also stable for the larger ViT-Base model with unnormalized prototypes. We show that the added flexibility of the mixture model yields better image representations. The DINO-vMF pre-trained model consistently performs better than DINO on a range of downstream tasks. We obtain similar improvements for iBOT-vMF over iBOT, showing that our proposed modification is also relevant for other methods derived from DINO.
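A minimal pure-Python sketch of the mixture-model reading described in the abstract. It is an illustration, not the authors' code, and it assumes that each prototype $w_k$ defines a vMF component with mean direction $\mu_k = w_k / \lVert w_k \rVert$ and concentration $\kappa_k = \lVert w_k \rVert / \tau$, with equal mixture weights. Under that assumption $P(k \mid y) \propto C_p(\kappa_k) \exp(w_k^\top y / \tau)$, where $C_p(\kappa) = \kappa^{p/2-1} / \big((2\pi)^{p/2} I_{p/2-1}(\kappa)\big)$ is the vMF normalization constant on the unit sphere in $\mathbb{R}^p$. Plain DINO drops the $C_p(\kappa_k)$ factor, which changes nothing when all prototypes share the same norm. The function names, the temperature `tau`, and the power-series evaluation of the Bessel function $I_\nu$ are hypothetical choices. DINO's centering, separate teacher/student temperatures, and multi-crop loss are omitted.

```python
import math
from typing import List, Sequence


def log_bessel_iv(nu: float, kappa: float, max_terms: int = 10_000) -> float:
    """log I_nu(kappa) for kappa > 0, summing the power series in log space.

    I_nu(k) = sum_m (k/2)^(2m+nu) / (m! * Gamma(m+nu+1)). The terms are unimodal
    in m, so we stop once they fall far below the largest term seen so far.
    """
    half_log = math.log(kappa / 2.0)
    log_terms: List[float] = []
    peak = -math.inf
    for m in range(max_terms):
        log_t = (2 * m + nu) * half_log - math.lgamma(m + 1) - math.lgamma(m + nu + 1)
        log_terms.append(log_t)
        peak = max(peak, log_t)
        if log_t < peak - 50.0:  # remaining terms are negligible
            break
    # log-sum-exp over the collected terms
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def log_vmf_normalizer(p: int, kappa: float) -> float:
    """log C_p(kappa) for a von Mises-Fisher density on the unit sphere in R^p."""
    nu = p / 2.0 - 1.0
    return nu * math.log(kappa) - (p / 2.0) * math.log(2.0 * math.pi) - log_bessel_iv(nu, kappa)


def softmax(logits: Sequence[float]) -> List[float]:
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def assignment_probs(y: Sequence[float], prototypes: Sequence[Sequence[float]],
                     tau: float, vmf_normalize: bool) -> List[float]:
    """Cluster-assignment probabilities for a single representation y.

    vmf_normalize=False: DINO-style softmax of w_k^T y / tau.
    vmf_normalize=True:  also adds log C_p(kappa_k), assuming kappa_k = ||w_k|| / tau.
    """
    p = len(y)
    y_norm = math.sqrt(sum(v * v for v in y))
    y_unit = [v / y_norm for v in y]  # representations are L2-normalized
    logits = []
    for w in prototypes:
        # w_k^T y / tau equals kappa_k * mu_k^T y under the assumed mapping
        logit = sum(wi * yi for wi, yi in zip(w, y_unit)) / tau
        if vmf_normalize:
            kappa = math.sqrt(sum(wi * wi for wi in w)) / tau
            logit += log_vmf_normalizer(p, kappa)
        logits.append(logit)
    return softmax(logits)


if __name__ == "__main__":
    y = [0.3, 0.5, 0.2]
    # Prototypes with different norms, so kappa_k differs across components.
    protos = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print("DINO-style:    ", assignment_probs(y, protos, tau=0.1, vmf_normalize=False))
    print("vMF-normalized:", assignment_probs(y, protos, tau=0.1, vmf_normalize=True))
```

With equal-norm prototypes the two settings give identical probabilities. In the demo, the larger-norm prototype gets most of the mass under the plain softmax but loses it once $\log C_p(\kappa_k)$ is added, because $C_p(\kappa)$ decreases as $\kappa$ grows.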

Code

No code implementations yet.

Tasks

Self-Supervised Image Classification

Self-Supervised Learning

Transfer Learning

Datasets

CIFAR-10

ImageNet

CIFAR-100

Oxford 102 Flower

DTD

Food-101

Caltech-101

FGVC-Aircraft

DAVIS 2017

Oxford5k

Oxford-IIIT Pets

SUN397

Paris6k

Results from the Paper

Ranked #24 on Self-Supervised Image Classification on ImageNet

Methods

No methods listed for this paper.
