A Divide and Conquer Strategy for High Dimensional Bayesian Factor Models

9 Dec 2016 · Gautam Sabnis, Debdeep Pati, Barbara Engelhardt, Natesh Pillai

We propose a distributed computing framework, based on a divide and conquer strategy and hierarchical modeling, to accelerate posterior inference for high-dimensional Bayesian factor models. Our approach distributes the task of high-dimensional covariance matrix estimation to multiple cores, solves each subproblem separately via a latent factor model, and then combines these estimates to produce a global estimate of the covariance matrix. Existing divide and conquer methods focus exclusively on dividing the total number of observations $n$ into subsamples while keeping the dimension $p$ fixed. Our approach is novel in this regard: it includes all $n$ samples in each subproblem and instead splits the dimension $p$ into smaller subsets, one per subproblem. The subproblems themselves can be challenging to solve when $p$ is large because of the dependencies across dimensions. To circumvent this issue, we specify a novel hierarchical structure on the latent factors that allows for flexible dependencies across dimensions while still maintaining computational efficiency. Our approach is readily parallelizable and is shown to be several orders of magnitude more computationally efficient than fitting a full factor model. We report the performance of our method on synthetic examples and a genomics application.
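The split-by-dimension idea can be illustrated with a minimal sketch. This is not the authors' algorithm: the per-block fit below is a plain PCA-based factor estimate standing in for Bayesian posterior inference, and the cross-block covariance is recovered from the empirical covariance of the estimated factor scores rather than from the hierarchical prior on latent factors described in the abstract. All function names, the contiguous partition of columns, and the choice of `k` factors per block are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def split_dimensions(p, n_blocks):
    """Partition column indices 0..p-1 into n_blocks contiguous subsets."""
    return np.array_split(np.arange(p), n_blocks)


def fit_block_factor_model(Y_block, k):
    """PCA-based factor fit for one block (a stand-in, not posterior inference).

    Returns loadings (p_b x k), residual variances (p_b,), and
    factor scores (n x k) scaled so that scores.T @ scores / n = I.
    """
    Yc = Y_block - Y_block.mean(axis=0)
    n = Yc.shape[0]
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    scores = np.sqrt(n) * U[:, :k]
    loadings = (Vt[:k].T * s[:k]) / np.sqrt(n)
    resid = Yc - scores @ loadings.T
    return loadings, resid.var(axis=0), scores


def combine_blocks(blocks, fits, p):
    """Assemble a global covariance estimate from the per-block fits."""
    n = fits[0][2].shape[0]
    Sigma = np.zeros((p, p))
    for idx_i, (L_i, psi_i, F_i) in zip(blocks, fits):
        for idx_j, (L_j, _, F_j) in zip(blocks, fits):
            # Covariance between the factor scores of block i and block j.
            C = F_i.T @ F_j / n
            Sigma[np.ix_(idx_i, idx_j)] = L_i @ C @ L_j.T
        # Add block-specific residual variances on the diagonal.
        Sigma[idx_i, idx_i] += psi_i
    return Sigma


def divide_and_conquer_covariance(Y, n_blocks, k):
    """Every block sees all n rows of Y; only the p columns are divided."""
    p = Y.shape[1]
    blocks = split_dimensions(p, n_blocks)
    # Block fits are independent, so they can be dispatched in parallel.
    with ThreadPoolExecutor() as pool:
        fits = list(pool.map(lambda idx: fit_block_factor_model(Y[:, idx], k), blocks))
    return combine_blocks(blocks, fits, p)
```

Calling `divide_and_conquer_covariance(Y, n_blocks=3, k=2)` on an `n x p` array returns a symmetric `p x p` estimate. Each SVD costs on the order of the block width rather than the full `p`, which is where the savings from splitting the dimension would come from in this simplified setting.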
