Dataset Summarization by K Principal Concepts

8 Apr 2021 · Niv Cohen, Yedid Hoshen

We propose the new task of K principal concept identification for dataset summarization. The objective is to find a set of K concepts that best explain the variation within the dataset. Concepts are high-level, human-interpretable terms such as "tiger", "kayaking" or "happy". The K concepts are selected from a (potentially long) input list of candidates, which we denote the concept-bank. The concept-bank may be taken from a generic dictionary or constructed from task-specific prior knowledge. An image-language embedding method (e.g., CLIP) is used to map the images and the concept-bank into a shared feature space. To select the K concepts that best explain the data, we formulate our problem as a K-uncapacitated facility location problem. An efficient optimization technique is used to scale the local search algorithm to very large concept-banks. The output of our method is a set of K principal concepts that summarize the dataset. Our approach provides a more explicit summary than selecting K representative images, which are often ambiguous. As a further application of our method, the K principal concepts can be used to classify the dataset into K groups. Extensive experiments demonstrate the efficacy of our approach.
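
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes cosine distance as the assignment cost, uses a naive single-swap local search rather than the efficient optimization the abstract mentions, and replaces CLIP embeddings with hand-made 2-D toy vectors. All function names, concept vectors, and image vectors below are hypothetical.

```python
import math
from itertools import product


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def total_cost(dist, selected):
    """Sum over images of the distance to the closest selected concept."""
    return sum(min(row[j] for j in selected) for row in dist)


def select_concepts(dist, k):
    """Choose k concept indices by single-swap local search (naive version)."""
    n_concepts = len(dist[0])
    selected = list(range(k))  # arbitrary starting solution
    best = total_cost(dist, selected)
    improved = True
    while improved:
        improved = False
        # Try replacing each selected concept with each unselected one.
        for pos, cand in product(range(k), range(n_concepts)):
            if cand in selected:
                continue
            trial = selected[:pos] + [cand] + selected[pos + 1:]
            cost = total_cost(dist, trial)
            if cost < best - 1e-12:  # accept strictly improving swaps only
                selected, best, improved = trial, cost, True
    return sorted(selected), best


def assign(dist, selected):
    """Label each image with the index of its nearest selected concept."""
    return [min(selected, key=lambda j: row[j]) for row in dist]


# Toy stand-ins for embeddings in a shared feature space (assumed values).
concept_names = ["happy", "furniture", "tiger", "kayaking"]
concept_vecs = [[1.0, 1.0], [-1.0, 0.2], [1.0, 0.05], [0.05, 1.0]]
image_vecs = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.0],
              [0.0, 1.0], [0.1, 0.9], [0.05, 1.0]]

# dist[i][j] is the cost of explaining image i with concept j.
dist = [[cosine_distance(x, c) for c in concept_vecs] for x in image_vecs]
selected, cost = select_concepts(dist, k=2)
principal = [concept_names[j] for j in selected]
labels = [concept_names[j] for j in assign(dist, selected)]
print(principal)
print(labels)
```

In this toy setup the two selected concepts double as class labels, which mirrors the abstract's remark that the K principal concepts can be used to classify the dataset into K groups. Each pass of the swap search costs on the order of K times the concept-bank size times the number of images, which is why a more efficient procedure is needed for very large concept-banks.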

Results from the Paper


Ranked #4 on Image Clustering on ImageNet-100 (using extra training data)

Task              Dataset       Model              Metric     Value       Global Rank
Image Clustering  CIFAR-10      Single-Noun Prior  Accuracy   0.853       #12
                                                   NMI        0.731       #14
                                                   Train set  Train+Test  #1
                                                   ARI        0.702       #15
                                                   Backbone   ViT-B-32    #1
Image Clustering  ImageNet-100  Single-Noun Prior  NMI        0.805       #4
                                                   Accuracy   0.731       #4
                                                   ARI        0.628       #4
Image Clustering  ImageNet-200  Single-Noun Prior  NMI        0.749       #5
                                                   Accuracy   0.598       #3
                                                   ARI        0.486       #4
Image Clustering  ImageNet-50   Single-Noun Prior  NMI        0.847       #4
                                                   Accuracy   0.827       #3
                                                   ARI        0.744       #3
