A review of systematic selection of clustering algorithms and their evaluation

24 Jun 2021  ·  Marc Wegmann, Domenique Zipperling, Jonas Hillenbrand, Jürgen Fleischer ·

Data analysis plays an indispensable role for value creation in industry. Cluster analysis in this context is able to explore given datasets with little or no prior knowledge and to identify unknown patterns. As (big) data complexity increases in the dimensions volume, variety, and velocity, this becomes even more important. Many tools for cluster analysis have been developed from early on and the variety of different clustering algorithms is huge. As the selection of the right clustering procedure is crucial to the results of the data analysis, users are in need for support on their journey of extracting knowledge from raw data. Thus, the objective of this paper lies in the identification of a systematic selection logic for clustering algorithms and corresponding validation concepts. The goal is to enable potential users to choose an algorithm that fits best to their needs and the properties of their underlying data clustering problem. Moreover, users are supported in selecting the right validation concepts to make sense of the clustering results. Based on a comprehensive literature review, this paper provides assessment criteria for clustering method evaluation and validation concept selection. The criteria are applied to several common algorithms and the selection process of an algorithm is supported by the introduction of pseudocode-based routines that consider the underlying data structure.

PDF Abstract
No code implementations yet. Submit your code now

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here