Distributed TensorFlow with MPI

7 Mar 2016 · Abhinav Vishnu, Charles Siegel, Jeffrey Daily ·

Machine Learning and Data Mining (MLDM) algorithms are becoming increasingly important in analyzing large volume of data generated by simulations, experiments and mobile devices. With increasing data volume, distributed memory systems (such as tightly connected supercomputers or cloud computing systems) are becoming important in designing in-memory and massively parallel MLDM algorithms. Yet, the majority of open source MLDM software is limited to sequential execution with a few supporting multi-core/many-core execution. In this paper, we extend recently proposed Google TensorFlow for execution on large scale clusters using Message Passing Interface (MPI). Our approach requires minimal changes to the TensorFlow runtime -- making the proposed implementation generic and readily usable to increasingly large users of TensorFlow. We evaluate our implementation using an InfiniBand cluster and several well knowndatasets. Our evaluation indicates the efficiency of our proposed implementation.

PDF Abstract

Code

Add Remove Mark official

No code implementations yet. Submit your code now

Datasets

Add Datasets introduced or used in this paper

Edit Social Preview

Distributed TensorFlow with MPI

Code Edit Add Remove Mark official

Categories

Datasets Edit

Code

Add Remove Mark official

Datasets