Mitigating Embedding and Class Assignment Mismatch in Unsupervised Image Classification
Unsupervised image classification is a challenging computer vision task. Deep learning-based algorithms have achieved strong results, and the latest approach optimizes unified losses for the embedding and class assignment processes. Because these processes inherently have different goals, optimizing them jointly may lead to a suboptimal solution. To address this limitation, we propose a novel two-stage algorithm in which an embedding module for pretraining precedes a refining module that performs embedding and class assignment concurrently. In unsupervised tasks, our model outperforms the state of the art (SOTA) on multiple datasets, reaching 81.0% accuracy on CIFAR-10 (an increase of 19.3 percentage points), 35.3% on CIFAR-100-20 (9.6 pp), and 66.5% on STL-10 (6.9 pp).
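The gains above are absolute differences in percentage points (pp), not relative percentages. The short Python sketch below recovers the previous best accuracies implied by the stated figures and expresses each gain as a relative increase. It assumes each gain is measured against the single best prior result on that dataset; the abstract implies this but does not spell it out, and the helper names are illustrative only.

```python
# Figures stated in the abstract: (dataset, new accuracy in %, gain in percentage points).
REPORTED = [
    ("CIFAR-10", 81.0, 19.3),
    ("CIFAR-100-20", 35.3, 9.6),
    ("STL-10", 66.5, 6.9),
]


def implied_baseline(new_acc: float, gain_pp: float) -> float:
    """Previous best accuracy (%) implied by an absolute gain in percentage points.

    Assumption: the gain is measured against the single best prior result.
    """
    return round(new_acc - gain_pp, 1)


def relative_gain(new_acc: float, gain_pp: float) -> float:
    """The same gain expressed as a relative increase (%) over the implied baseline."""
    baseline = new_acc - gain_pp
    return round(100.0 * gain_pp / baseline, 1)


for name, acc, pp in REPORTED:
    print(f"{name}: baseline {implied_baseline(acc, pp)}%, "
          f"+{pp} pp = +{relative_gain(acc, pp)}% relative")
```

For CIFAR-10 this gives an implied baseline of 61.7%, so the 19.3 pp gain corresponds to a relative increase of about 31.3%.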
Task | Dataset | Model | Metric Name | Metric Value | Global Rank | Uses Extra Training Data | Benchmark |
---|---|---|---|---|---|---|---|
Unsupervised Image Classification | CIFAR-10 | TSUC | Accuracy | 81.0 | #7 | | |
Image Clustering | CIFAR-10 | TSUC | Accuracy | 0.81 | #21 | | |
Image Clustering | CIFAR-10 | TSUC | NMI | - | #29 | | |
Image Clustering | CIFAR-10 | TSUC | Train set | Train | #1 | | |
Image Clustering | CIFAR-10 | TSUC | ARI | - | #29 | | |
Image Clustering | CIFAR-10 | TSUC | Backbone | ResNet-18 | #1 | | |
Image Clustering | CIFAR-100 | TSUC | Accuracy | 0.353 | #19 | | |
Unsupervised Image Classification | CIFAR-20 | TSUC | Accuracy | 35.3 | #9 | | |
Unsupervised Image Classification | STL-10 | TSUC | Accuracy | 66.50 | #7 | | |
Image Clustering | STL-10 | TSUC | Accuracy | 0.665 | #19 | | |
Image Clustering | STL-10 | TSUC | Backbone | ResNet-18 | #1 | | |
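In the unsupervised classification and clustering literature, the Accuracy metric is usually computed by finding the best one-to-one mapping between predicted cluster ids and ground-truth class ids, because cluster ids carry no inherent label. The source does not state TSUC's exact evaluation protocol, so the sketch below is only an illustration of that common metric. The function name `clustering_accuracy` is hypothetical, and it brute-forces all k! mappings for clarity; `scipy.optimize.linear_sum_assignment` solves the same assignment problem in polynomial time and is the usual choice in practice.

```python
from itertools import permutations
from typing import Sequence


def clustering_accuracy(y_true: Sequence[int], y_pred: Sequence[int], k: int) -> float:
    """Fraction of samples labeled correctly under the best one-to-one mapping
    from predicted cluster ids to class ids (both assumed to lie in 0..k-1)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    # counts[c][t] = number of samples placed in cluster c whose true class is t.
    counts = [[0] * k for _ in range(k)]
    for t, c in zip(y_true, y_pred):
        counts[c][t] += 1
    # Try every mapping cluster c -> class perm[c] and keep the one that
    # matches the most samples. Cost grows as k!, so this suits small k only.
    best = max(
        sum(counts[c][perm[c]] for c in range(k))
        for perm in permutations(range(k))
    )
    return best / len(y_true)


# Clusters 1, 0, 2 correspond to classes 0, 1, 2; one sample is misassigned.
print(clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 0], k=3))
```

The example prints 5/6 (about 0.833): relabeling clusters does not affect the score, while the one sample of class 2 placed in cluster 0 counts as an error. NMI and ARI, the other clustering metrics in the table, are label-permutation invariant by construction and need no such matching step.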