Learning Representations by Contrasting Clusters While Bootstrapping Instances

1 Jan 2021 · Junsoo Lee, Hojoon Lee, Inkyu Shin, Jaekyoung Bae, In So Kweon, Jaegul Choo ·

Learning visual representations using large-scale unlabelled images is a holy grail for most of computer vision tasks. Recent contrastive learning methods have focused on encouraging the learned visual representations to be linearly separable among the individual items regardless of their semantic similarity; however, it could lead to a sub-optimal solution if a given downstream task is related to non-discriminative ones such as cluster analysis and information retrieval. In this work, we propose an advanced approach to consider the instance semantics in an unsupervised environment by both i) Contrasting batch-wise Cluster assignment features and ii) Bootstrapping an INstance representations without considering negatives simultaneously, referred to as C2BIN. Specifically, instances in a mini-batch are appropriately assigned to distinct clusters, each of which aims to capture apparent similarity among instances. Moreover, we introduce a pyramidal multi-heads technique, showing positive effects on the representations by capturing multi-scale semantics. Empirically, our method achieves comparable or better performance than both representation learning and clustering baselines on various benchmark datasets: CIFAR-10, CIFAR-100, and STL-10.

PDF Abstract