Clustering Binary Data by Application of Combinatorial Optimization Heuristics

6 Jan 2020 · Javier Trejos-Zelaya, Luis Eduardo Amaya-Briceño, Alejandra Jiménez-Romero, Alex Murillo-Fernández, Eduardo Piza-Volio, Mario Villalobos-Arias

We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters. Five new methods are introduced, based on combinatorial optimization metaheuristics of two kinds: neighborhood-based (simulated annealing, threshold accepting and tabu search) and population-based (a genetic algorithm and ant colony optimization). The methods are implemented, and the parameters of the heuristics are calibrated to ensure good results. Using a set of 16 data tables generated by a quasi-Monte Carlo experiment, the methods are compared, for one of the aggregation criteria with L1 dissimilarity, against hierarchical clustering and a version of k-means, partitioning around medoids (PAM). Simulated annealing performs very well, especially compared to the classical methods.
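To make the neighborhood-based idea concrete, here is a minimal sketch of simulated annealing applied to partitioning binary data. It is not the authors' implementation: the abstract does not state the aggregation criteria, so the criterion below (sum of L1 distances from each object to its cluster's majority-vote binary center), the single-object move, the geometric cooling schedule, and every name and parameter value (`l1_criterion`, `anneal`, `t0`, `cooling`, `steps_per_temp`, `t_min`) are assumptions chosen for illustration.

```python
import math
import random


def l1_criterion(data, labels, k):
    """Assumed criterion: sum of L1 distances to each cluster's binary center."""
    p = len(data[0])
    total = 0
    for c in range(k):
        members = [x for x, lab in zip(data, labels) if lab == c]
        if not members:
            continue
        for j in range(p):
            ones = sum(x[j] for x in members)
            # The majority value minimizes the L1 sum on variable j,
            # so the cost on that variable is the size of the minority.
            total += min(ones, len(members) - ones)
    return total


def anneal(data, k, t0=2.0, cooling=0.95, steps_per_temp=50, t_min=1e-3, seed=0):
    """Simulated annealing over partitions into k >= 2 clusters (illustrative)."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in data]
    cost = l1_criterion(data, labels, k)
    best_labels, best_cost = labels[:], cost
    t = t0
    while t > t_min:
        for _ in range(steps_per_temp):
            # Neighbor: move one random object to a different cluster.
            i = rng.randrange(len(data))
            old = labels[i]
            labels[i] = rng.choice([c for c in range(k) if c != old])
            new_cost = l1_criterion(data, labels, k)
            delta = new_cost - cost
            # Metropolis rule: always accept improvements, and accept
            # worsening moves with probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best_labels, best_cost = labels[:], cost
            else:
                labels[i] = old  # reject: undo the move
        t *= cooling  # geometric cooling (an assumed schedule)
    return best_labels, best_cost


if __name__ == "__main__":
    toy = [
        (1, 1, 1, 0, 0, 0), (1, 1, 0, 0, 0, 0), (1, 1, 1, 0, 0, 0),
        (0, 0, 0, 1, 1, 1), (0, 0, 1, 1, 1, 1), (0, 0, 0, 1, 1, 1),
    ]
    print(anneal(toy, k=2))
```

The sketch recomputes the criterion from scratch after every move, which keeps the code short but is wasteful; a practical version would update the cost incrementally. The parameter values are placeholders, whereas the abstract stresses that the heuristics' parameters were calibrated.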
