Twin Contrastive Learning for Online Clustering

21 Oct 2022 · Yunfan Li, Mouxing Yang, Dezhong Peng, Taihao Li, Jiantao Huang, Xi Peng

This paper proposes to perform online clustering by conducting twin contrastive learning (TCL) at the instance and cluster level. Specifically, we find that when the data is projected into a feature space whose dimensionality equals the target cluster number, the rows and columns of its feature matrix correspond to the instance and cluster representations, respectively. Based on this observation, for a given dataset, the proposed TCL first constructs positive and negative pairs through data augmentations. Thereafter, in the row and column space of the feature matrix, instance- and cluster-level contrastive learning are respectively conducted by pulling together positive pairs while pushing apart the negatives. To alleviate the influence of intrinsic false-negative pairs and rectify cluster assignments, we adopt a confidence-based criterion to select pseudo-labels for boosting both the instance- and cluster-level contrastive learning. As a result, the clustering performance is further improved. Besides the elegant idea of twin contrastive learning, another advantage of TCL is that it can independently predict the cluster assignment for each instance, thus effortlessly fitting online scenarios. Extensive experiments on six widely used image and text benchmarks demonstrate the effectiveness of TCL. The code will be released on GitHub.
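The row/column idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the NT-Xent (InfoNCE) loss form, the temperature, the confidence threshold, and all function names (`nt_xent`, `twin_contrastive_loss`, `confident_pseudo_labels`) are assumptions chosen for illustration. The sketch also omits any regularization terms and does not show how the selected pseudo-labels would feed back into the two losses.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def nt_xent(a, b, temperature=0.5):
    """NT-Xent loss (an assumed loss form) between paired vectors a[i] <-> b[i].

    a, b: arrays of shape (n, d). Row i of `a` and row i of `b` form the
    positive pair; every other row in the batch acts as a negative.
    """
    n = a.shape[0]
    z = np.concatenate([a, b], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                    # exclude self-pairs
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), positives].mean()


def twin_contrastive_loss(p1, p2, temperature=0.5):
    """p1, p2: (n, k) soft cluster assignments of two augmented views."""
    instance_loss = nt_xent(p1, p2, temperature)      # rows: instances
    cluster_loss = nt_xent(p1.T, p2.T, temperature)   # columns: clusters
    return instance_loss + cluster_loss


def confident_pseudo_labels(p, threshold=0.9):
    """Keep only predictions whose top probability reaches the threshold."""
    labels = p.argmax(axis=1)
    keep = np.flatnonzero(p.max(axis=1) >= threshold)
    return keep, labels[keep]


# Toy data standing in for network outputs on two augmentations of a batch.
rng = np.random.default_rng(0)
n, k = 8, 3  # batch size and target cluster number (hypothetical values)
logits = rng.normal(size=(n, k))
p1 = softmax(logits + 0.1 * rng.normal(size=(n, k)))
p2 = softmax(logits + 0.1 * rng.normal(size=(n, k)))

loss = twin_contrastive_loss(p1, p2)
online_assignment = p1.argmax(axis=1)  # each row is predicted independently
keep, pseudo = confident_pseudo_labels(p1, threshold=0.6)
```

Because the cluster assignment of an instance is just the argmax of its own row, a new sample can be assigned without revisiting the rest of the dataset, which is the property the abstract refers to as fitting online scenarios.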

Results from the Paper

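| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Short Text Clustering | Biomedical | TCL | Acc | 49.8 | # 1 |
| Short Text Clustering | Biomedical | TCL | NMI | 42.9 | # 1 |
| Image Clustering | CIFAR-10 | TCL | Accuracy | 0.887 | # 7 |
| Image Clustering | CIFAR-10 | TCL | NMI | 0.819 | # 5 |
| Image Clustering | CIFAR-10 | TCL | Train set | Train | # 1 |
| Image Clustering | CIFAR-10 | TCL | ARI | 0.780 | # 7 |
| Image Clustering | CIFAR-10 | TCL | Backbone | ResNet-34 | # 1 |
| Image Clustering | CIFAR-100 | TCL | Accuracy | 0.531 | # 5 |
| Image Clustering | CIFAR-100 | TCL | NMI | 0.529 | # 4 |
| Image Clustering | CIFAR-100 | TCL | Train Set | Train | # 1 |
| Image Clustering | CIFAR-100 | TCL | ARI | 0.357 | # 6 |
| Image Clustering | ImageNet-10 | TCL | Accuracy | 0.895 | # 8 |
| Image Clustering | ImageNet-10 | TCL | NMI | 0.875 | # 8 |
| Image Clustering | ImageNet-10 | TCL | ARI | 0.837 | # 8 |
| Image Clustering | Imagenet-dog-15 | TCL | Accuracy | 0.644 | # 7 |
| Image Clustering | Imagenet-dog-15 | TCL | NMI | 0.623 | # 7 |
| Image Clustering | Imagenet-dog-15 | TCL | ARI | 0.516 | # 7 |
| Short Text Clustering | Stackoverflow | TCL | Acc | 88.2 | # 1 |
| Short Text Clustering | Stackoverflow | TCL | NMI | 0.786 | # 1 |
| Image Clustering | STL-10 | TCL | Accuracy | 0.868 | # 4 |
| Image Clustering | STL-10 | TCL | NMI | 0.799 | # 3 |
| Image Clustering | STL-10 | TCL | Train Split | Train | # 1 |
| Image Clustering | STL-10 | TCL | ARI | 0.757 | # 3 |
| Image Clustering | STL-10 | TCL | Backbone | ResNet-34 | # 1 |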