A Communication Efficient Federated Kernel $k$-Means
This paper develops a federated kernel $k$-means algorithm. The algorithm addresses two challenges: 1) how to solve the kernel $k$-means problem in a distributed manner under federated settings; 2) how to keep the algorithm communication-efficient. To tackle the first challenge, a distributed stochastic proximal gradient descent (DSPGD) algorithm is developed to compute an approximate solution to the kernel $k$-means problem. To tackle the second, a communication efficient mechanism (CEM) is designed to reduce the communication cost. In addition, the federated kernel $k$-means provides two levels of privacy preservation. Theoretical analysis shows that: 1) DSPGD with CEM converges at an $O(1/T)$ rate, where $T$ is the number of iterations; 2) the communication cost of DSPGD with CEM is independent of the number of data samples; 3) the clustering loss of the federated kernel $k$-means can approach that of centralized kernel $k$-means. Experimental results show that the federated kernel $k$-means achieves the highest clustering quality while reducing communication cost by more than $60\%$.
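To make the underlying objective concrete, below is a minimal, centralized sketch of the kernel $k$-means problem that the paper federates. This is standard Lloyd-style kernel $k$-means operating only on the Gram matrix, not the authors' DSPGD algorithm; the function name, the RBF kernel in the usage example, and all parameters are illustrative assumptions.

```python
import numpy as np

def kernel_kmeans(K, k, n_iter=50, seed=0):
    """Lloyd-style kernel k-means using only the Gram matrix K (n x n).

    Distances in feature space are computed via the kernel trick:
    ||phi(x_i) - mu_c||^2 = K_ii - (2/m) * sum_{j in c} K_ij
                                 + (1/m^2) * sum_{j,l in c} K_jl,
    where m is the size of cluster c.
    """
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=n)      # random initial assignment
    for _ in range(n_iter):
        # distance of every point to every (non-empty) cluster centroid
        dist = np.full((n, k), np.inf)
        for c in range(k):
            idx = labels == c
            m = idx.sum()
            if m == 0:
                continue
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, idx].sum(axis=1) / m
                          + K[np.ix_(idx, idx)].sum() / m**2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):  # converged
            break
        labels = new_labels
    return labels

# Usage: two well-separated Gaussian blobs under an RBF kernel.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
               rng.normal(3.0, 0.1, (20, 2))])
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq)                              # RBF Gram matrix
labels = kernel_kmeans(K, k=2)
```

A federated version must compute these kernel-based distances without pooling the raw data on a server, which is where the communication cost that DSPGD and CEM target arises.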