Deep Clustering for Unsupervised Learning of Visual Features

Clustering is a class of unsupervised learning methods that has been extensively applied and studied in computer vision. Little work has been done to adapt it to the end-to-end training of visual features on large-scale datasets. In this work, we present DeepCluster, a clustering method that jointly learns the parameters of a neural network and the cluster assignments of the resulting features. DeepCluster iteratively groups the features with a standard clustering algorithm, k-means, and uses the subsequent assignments as supervision to update the weights of the network. We apply DeepCluster to the unsupervised training of convolutional neural networks on large datasets like ImageNet and YFCC100M. The resulting model outperforms the current state of the art by a significant margin on all the standard benchmarks.
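The alternation described in the abstract (cluster the current features with k-means, then treat the assignments as labels for a supervised update) can be illustrated with a toy sketch. This is not the authors' implementation: the "network" here is a single linear layer, the data are small 2-D points, and the names `kmeans`, `features`, `softmax`, and `deep_cluster` are hypothetical helpers written for this example. Re-creating the classification head each round is a choice made in this sketch because cluster indices are arbitrary from one round to the next.

```python
import math
import random


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each non-empty centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def features(W, x):
    """Toy stand-in for the network: one linear layer, f = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def deep_cluster(data, k, rounds=5, lr=0.1, d_feat=2, seed=0):
    """Alternate between k-means on features and SGD on the pseudo-labels."""
    rng = random.Random(seed)
    d_in = len(data[0])
    W = [[rng.gauss(0, 0.5) for _ in range(d_in)] for _ in range(d_feat)]
    labels = []
    for _ in range(rounds):
        # Step 1: cluster the current features to obtain pseudo-labels.
        labels = kmeans([features(W, x) for x in data], k, seed=seed)
        # Step 2: fresh linear head (assumption of this sketch), then one
        # pass of SGD on the cross-entropy between predictions and pseudo-labels.
        V = [[rng.gauss(0, 0.1) for _ in range(d_feat)] for _ in range(k)]
        for x, y in zip(data, labels):
            f = features(W, x)
            p = softmax(features(V, f))
            g = [p[c] - (1.0 if c == y else 0.0) for c in range(k)]  # dL/dlogits
            gf = [sum(g[c] * V[c][j] for c in range(k)) for j in range(d_feat)]
            for c in range(k):
                for j in range(d_feat):
                    V[c][j] -= lr * g[c] * f[j]
            for j in range(d_feat):
                for i in range(d_in):
                    W[j][i] -= lr * gf[j] * x[i]
    return W, labels
```

Calling `deep_cluster(data, k=2)` on a list of 2-D points returns the learned weights and the final pseudo-labels. The sketch does nothing to prevent degenerate outcomes such as empty clusters or all points collapsing into one cluster, which a practical system would have to address.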

ECCV 2018
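| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Image Clustering | CIFAR-100 | DeeperCluster | Accuracy | 0.189 | #20 |
| Image Clustering | CIFAR-100 | DeeperCluster | Train Set | Train+Test | #1 |
| Unsupervised Semantic Segmentation | Cityscapes test | MDC | mIoU | 7.1 | #12 |
| Unsupervised Semantic Segmentation | Cityscapes test | MDC | Accuracy | 40.7 | #12 |
| Unsupervised Semantic Segmentation | ImageNet-S-50 | MDC (Supervised pretrain) | mIoU (val) | 14.6 | #5 |
| Unsupervised Semantic Segmentation | ImageNet-S-50 | MDC (Supervised pretrain) | mIoU (test) | 14.3 | #5 |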

Results from Other Papers

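| Task | Dataset | Model | Metric Name | Metric Value | Rank |
|---|---|---|---|---|---|
| Image Clustering | CIFAR-10 | DeepCluster | Accuracy | 0.374 | #25 |
| Image Clustering | CIFAR-10 | DeepCluster | NMI | - | #27 |
| Image Clustering | CIFAR-10 | DeepCluster | Train set | Train+Test | #1 |
| Image Clustering | CIFAR-10 | DeepCluster | ARI | - | #27 |
| Image Clustering | CIFAR-10 | DeepCluster | Backbone | ResNet-34 | #1 |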