Distributed Matrix Factorization using Asynchronous Communication

29 May 2017  ·  Tom Vander Aa, Imen Chakroun, Tom Haber ·

Matrix factorization is a widely used machine-learning technique, particularly in areas such as recommender systems. Despite its high prediction accuracy and its ability to avoid over-fitting the data, the Bayesian Probabilistic Matrix Factorization (BPMF) algorithm has not been widely applied to large-scale data because of its prohibitive computational cost. In this paper, we propose a distributed, high-performance parallel implementation of BPMF using Gibbs sampling on shared- and distributed-memory architectures. We show that, by combining efficient load balancing through work stealing on a single node with asynchronous communication in the distributed version, we outperform state-of-the-art implementations.
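To make the core idea concrete, the following is a minimal, illustrative sketch of one Gibbs-sampling sweep for the user factors in a matrix factorization model. It is not the paper's implementation: for brevity, the observation precision `alpha` and prior precision `beta` are held fixed, whereas BPMF places Normal-Wishart hyperpriors on them and samples them as well. All names and parameters here are assumptions made for the example.

```python
import numpy as np

def sample_factors(R, mask, V, alpha=2.0, beta=2.0, rng=None):
    """One Gibbs sweep over user factors U, conditioned on item factors V.

    R    : (n_users x n_items) ratings matrix
    mask : boolean matrix marking the observed entries of R
    V    : (n_items x k) current item factors
    alpha: observation precision (fixed here; BPMF samples it)
    beta : prior precision on factors (fixed here; BPMF samples it)
    """
    if rng is None:
        rng = np.random.default_rng(0)
    n_users, k = R.shape[0], V.shape[1]
    U = np.empty((n_users, k))
    for i in range(n_users):
        Vi = V[mask[i]]                                  # items rated by user i
        prec = beta * np.eye(k) + alpha * Vi.T @ Vi      # posterior precision
        cov = np.linalg.inv(prec)
        mean = alpha * cov @ Vi.T @ R[i, mask[i]]        # posterior mean
        U[i] = rng.multivariate_normal(mean, cov)        # draw from conditional
    return U
```

By symmetry, the item factors are resampled with the same routine applied to the transposed problem (`sample_factors(R.T, mask.T, U)`), and predictions are averaged over many such sweeps. The per-user draws are independent given `V`, which is what makes the algorithm amenable to the node-level parallelism and work stealing the paper describes.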

Categories
Distributed, Parallel, and Cluster Computing