We propose a novel method for clustering data which is grounded in
information-theoretic principles and requires no parametric assumptions.
Previous attempts to use information theory to define clusters in an
assumption-free way are based on maximizing mutual information between data and
cluster labels. We demonstrate that this intuition suffers from a fundamental
conceptual flaw that causes clustering performance to deteriorate as the amount
of data increases. Instead, we return to the axiomatic foundations of
information theory to define a meaningful clustering measure based on the
notion of consistency under coarse-graining for finite data.