Explainable k-Means and k-Medians Clustering

Clustering is a popular unsupervised learning method for geometric data. Unfortunately, many clustering algorithms depend on global properties of the data, so there is no simple explanation for why a point is assigned to a given cluster. To improve interpretability, we consider using a small threshold tree to partition a dataset into clusters, so that every cluster assignment can be explained by a small number of single-feature threshold comparisons. We study this problem from a theoretical viewpoint, measuring output quality by the k-means and k-medians objectives. In terms of negative results, we show that popular top-down decision tree algorithms may lead to clusterings with arbitrarily large cost, and we prove that any explainable clustering must in general incur an Ω(log k) approximation factor compared to the optimal clustering. On the upper bound side, we design efficient algorithms that produce explainable clusters using a tree with k leaves. For two means/medians, we show that, surprisingly, a single threshold cut suffices to achieve a constant-factor approximation, nearly matching our lower bounds. For general k ≥ 2, our algorithm is an O(k) approximation to the optimal k-medians and an O(k²) approximation to the optimal k-means. Prior to our work, no algorithms were known with provable guarantees independent of the dimensionality and input size.
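The k = 2 case is simple to illustrate: since the explainable clustering is induced by a single axis-aligned threshold cut, one can search over all (feature, threshold) pairs and keep the cut with the lowest cost. The sketch below does this for the 2-means objective by brute force; it is an illustration of the threshold-cut idea, not the paper's exact procedure, and the function name and exhaustive search are our own choices.

```python
# Minimal sketch (assumption: small dataset, brute-force search is acceptable).
# For k = 2, try every axis-aligned cut (feature j, threshold t) and keep the
# one minimizing the 2-means cost when each side is assigned its own mean.
import numpy as np

def best_threshold_cut(X):
    """Return (feature, threshold, cost) of the cheapest single-feature cut."""
    n, d = X.shape
    best = (None, None, np.inf)
    for j in range(d):
        # Cut between consecutive distinct values so both sides are nonempty.
        for t in np.unique(X[:, j])[:-1]:
            left, right = X[X[:, j] <= t], X[X[:, j] > t]
            cost = (((left - left.mean(axis=0)) ** 2).sum()
                    + ((right - right.mean(axis=0)) ** 2).sum())
            if cost < best[2]:
                best = (j, t, cost)
    return best

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two well-separated Gaussian blobs in 3 dimensions.
    X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(5, 1, (50, 3))])
    j, t, cost = best_threshold_cut(X)
    print(f"cut on feature {j} at threshold {t:.2f}, cost {cost:.2f}")
```

The resulting cluster assignment is explainable by construction: a point's cluster is determined entirely by whether its value in one feature is above or below a single threshold.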

ICML 2020