Language-Guided Image Clustering

29 Sep 2021 · Niv Cohen, Yedid Hoshen

Image clustering methods have rapidly improved at discovering object categories. However, unsupervised clustering methods struggle with other image attributes, e.g. age or activity. The reason is that most recent clustering methods learn deep features that are designed to be sensitive to object category but less so to other image attributes. We propose to overcome this limitation by introducing the new setting of language-guided image clustering. In this setting, the model is provided with an exhaustive list of phrases describing all possible values of a specific attribute, together with a shared image-language embedding (e.g. CLIP). Our method then computes the subset of K attribute phrases that form the best clustering of the images. Unlike standard clustering methods, our method can cluster according to image attributes other than object category. We evaluate our method on attribute clustering tasks and demonstrate that it significantly outperforms methods that do not use language guidance.
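
To make the setting concrete, the following is a minimal sketch of choosing K phrases from a candidate list and assigning each image to its closest chosen phrase. It is not the authors' algorithm, which the abstract does not specify. The greedy objective (maximize the summed best cosine similarity per image), the function names, and the hand-made 3-D vectors standing in for CLIP embeddings are all assumptions made for illustration.

```python
from math import sqrt

def normalize(v):
    """Scale a vector to unit length."""
    n = sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine_matrix(images, phrases):
    """sim[i][j] = cosine similarity between image i and phrase j."""
    imgs = [normalize(v) for v in images]
    phs = [normalize(v) for v in phrases]
    return [[sum(a * b for a, b in zip(i, p)) for p in phs] for i in imgs]

def select_phrases(sim, k):
    """Greedily pick k phrase indices (assumed objective, not from the paper):
    maximize the sum over images of the best similarity to a picked phrase."""
    n_phrases = len(sim[0])
    selected = []
    best = [float("-inf")] * len(sim)  # best similarity so far, per image
    for _ in range(k):
        def total_if_added(j):
            return sum(max(b, row[j]) for b, row in zip(best, sim))
        candidates = [j for j in range(n_phrases) if j not in selected]
        j_star = max(candidates, key=total_if_added)
        selected.append(j_star)
        best = [max(b, row[j_star]) for b, row in zip(best, sim)]
    return selected

def assign(sim, selected):
    """Label each image with the index of its most similar selected phrase."""
    return [max(selected, key=lambda j: row[j]) for row in sim]

# Toy stand-ins for embeddings in a shared image-language space (hypothetical values).
phrase_names = ["child", "adult", "senior"]
phrase_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_vecs = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.1, 0.9],
    [0.1, 0.2, 0.8],
]

sim = cosine_matrix(image_vecs, phrase_vecs)
selected = select_phrases(sim, k=2)
labels = assign(sim, selected)
print(sorted(phrase_names[j] for j in selected), [phrase_names[j] for j in labels])
```

On this toy data the two selected phrases are "child" and "senior", and the unused phrase "adult" is dropped. In practice the vectors would come from an image-language model such as CLIP, and a greedy pass is only one of several ways to search for a good subset of K phrases.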
