In this paper we present a method for the unsupervised clustering of
high-dimensional binary data, with a special focus on electronic healthcare
records. We present a robust and efficient heuristic to face this problem using
We present the reasons why this approach is preferable
for tasks such as clustering patient records, to more commonly used
distance-based methods. We run the algorithm on two datasets of healthcare
records, obtaining clinically meaningful results.