Information Maximization Clustering via Multi-View Self-Labelling

12 Mar 2021  ·  Foivos Ntelemis, Yaochu Jin, Spencer A. Thomas

Image clustering is a particularly challenging computer vision task, which aims to generate annotations without human supervision. Recent advances focus on the use of self-supervised learning strategies in image clustering, by first learning valuable semantics and then clustering the image representations. These multiple-phase algorithms, however, increase the computational time, and their final performance depends on the first stage. By extending the self-supervised approach, we propose a novel single-phase clustering method that simultaneously learns meaningful representations and assigns the corresponding annotations. This is achieved by integrating a discrete representation into the self-supervised paradigm through a classifier net. Specifically, the proposed clustering objective employs mutual information and maximizes the dependency between the integrated discrete representation and a discrete probability distribution. The discrete probability distribution is derived through the self-supervised process by comparing the learnt latent representation with a set of trainable prototypes. To enhance the learning performance of the classifier, we jointly apply the mutual information across multi-crop views. Our empirical results show that the proposed framework outperforms state-of-the-art techniques, with an average accuracy of 89.1% and 49.0% on the CIFAR-10 and CIFAR-100/20 datasets, respectively. Finally, the proposed method also demonstrates attractive robustness to parameter settings, making it readily applicable to other datasets.
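As a rough illustration of the mutual-information term mentioned in the abstract, the sketch below estimates the mutual information between two discrete distributions from a batch of soft assignments. It is a minimal sketch under stated assumptions, not the authors' objective: the joint distribution is estimated by averaging per-sample outer products, and the names `mutual_information`, `p_rows` and `q_rows` are hypothetical. Here `p_rows` stands in for classifier outputs and `q_rows` for prototype-derived assignments.

```python
import math


def mutual_information(p_rows, q_rows, eps=1e-12):
    """Estimate I(P; Q) in nats from paired soft assignments.

    p_rows, q_rows: lists of probability vectors, one pair per sample.
    Assumption: the joint is the batch average of the outer products p q^T.
    """
    n = len(p_rows)
    k_p, k_q = len(p_rows[0]), len(q_rows[0])

    # Batch estimate of the joint distribution over (cluster a, prototype b).
    joint = [[0.0] * k_q for _ in range(k_p)]
    for p, q in zip(p_rows, q_rows):
        for a in range(k_p):
            for b in range(k_q):
                joint[a][b] += p[a] * q[b] / n

    # Marginals obtained by summing the joint over rows and columns.
    marg_p = [sum(row) for row in joint]
    marg_q = [sum(joint[a][b] for a in range(k_p)) for b in range(k_q)]

    # Sum of joint * log(joint / product of marginals), skipping empty cells.
    mi = 0.0
    for a in range(k_p):
        for b in range(k_q):
            if joint[a][b] > eps:
                mi += joint[a][b] * math.log(joint[a][b] / (marg_p[a] * marg_q[b]))
    return mi


# Two views that always agree share log(2) nats over two balanced clusters.
agree = [[1.0, 0.0], [0.0, 1.0]]
# Uninformative assignments carry no information about the other view.
uniform = [[0.5, 0.5], [0.5, 0.5]]
mi_agree = mutual_information(agree, agree)
mi_uniform = mutual_information(uniform, agree)
```

In a training loop this quantity would be maximized (its negative used as a loss), typically with a differentiable tensor library rather than plain Python lists.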

Results from the Paper


Task              Dataset        Model             Metric Name  Metric Value  Global Rank
Image Clustering  CIFAR-10       IMC-SwAV (Best)   Accuracy     0.897         # 5
                                                   NMI          0.818         # 6
                                                   Train set    Train         # 1
                                                   ARI          0.8           # 5
                                                   Backbone     ResNet-18     # 1
Image Clustering  CIFAR-10       IMC-SwAV (Avg+-)  Accuracy     0.891         # 6
                                                   NMI          0.811         # 7
                                                   Train set    Train         # 1
                                                   ARI          0.79          # 6
                                                   Backbone     ResNet-18     # 1
Image Clustering  CIFAR-100      IMC-SwAV (Avg+-)  Accuracy     0.49          # 8
                                                   NMI          0.503         # 6
                                                   ARI          0.337         # 7
Image Clustering  CIFAR-100      IMC-SwAV (Best)   Accuracy     0.519         # 6
                                                   NMI          0.527         # 5
                                                   Train Set    Train         # 1
                                                   ARI          0.361         # 5
Image Clustering  STL-10         IMC-SwAV (Avg+-)  Accuracy     0.831         # 9
                                                   NMI          0.729         # 7
                                                   Train Split  Train         # 1
                                                   ARI          0.685         # 6
                                                   Backbone     ResNet-18     # 1
Image Clustering  STL-10         IMC-SwAV (Best)   Accuracy     0.853         # 6
                                                   NMI          0.747         # 5
                                                   Train Split  Train         # 1
                                                   ARI          0.716         # 4
                                                   Backbone     ResNet-18     # 1
Image Clustering  Tiny-ImageNet  IMC-SwAV (Best)   Accuracy     0.282         # 2
                                                   NMI          0.526         # 1
                                                   ARI          0.146         # 2
Image Clustering  Tiny-ImageNet  IMC-SwAV (Avg+-)  Accuracy     0.279         # 3
                                                   NMI          0.485         # 2
                                                   ARI          0.143         # 3
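
For readers unfamiliar with the metrics reported here, the following sketch shows one common way to compute clustering accuracy and NMI from ground-truth labels and predicted cluster ids. It is an illustrative assumption, not the evaluation code behind these numbers: accuracy uses a brute-force search over one-to-one cluster-to-class mappings (practical only for a handful of clusters; larger problems usually use the Hungarian algorithm), and NMI is normalized by the arithmetic mean of the two entropies, which may differ from the normalization used for the leaderboard. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def clustering_accuracy(y_true, y_pred):
    """Best accuracy over one-to-one mappings from clusters to classes.

    Brute force; assumes the number of clusters does not exceed
    the number of classes and that both are small.
    """
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    counts = Counter(zip(y_pred, y_true))  # (cluster, class) co-occurrences
    best = 0
    for perm in permutations(classes, len(clusters)):
        hits = sum(counts[(c, t)] for c, t in zip(clusters, perm))
        best = max(best, hits)
    return best / len(y_true)


def entropy(labels):
    """Shannon entropy (nats) of a label sequence."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def nmi(y_true, y_pred):
    """Mutual information normalized by the mean of the two entropies."""
    n = len(y_true)
    ct, cp = Counter(y_true), Counter(y_pred)
    joint = Counter(zip(y_true, y_pred))
    mi = sum(c / n * math.log(c * n / (ct[t] * cp[p]))
             for (t, p), c in joint.items())
    denom = (entropy(y_true) + entropy(y_pred)) / 2
    return mi / denom if denom > 0 else 1.0


y_true = [0, 0, 1, 1, 2, 2]
relabelled = [1, 1, 0, 0, 2, 2]   # same partition, different cluster ids
one_error = [1, 1, 0, 0, 2, 0]    # one sample placed in the wrong cluster
acc_perfect = clustering_accuracy(y_true, relabelled)
acc_one_error = clustering_accuracy(y_true, one_error)
nmi_perfect = nmi(y_true, relabelled)
```

Both metrics are invariant to how clusters are numbered, which is why a relabelled but otherwise identical partition scores as a perfect match.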
