CROSSBOW: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers

8 Jan 2019 · Alexandros Koliousis, Pijika Watcharapichat, Matthias Weidlich, Luo Mai, Paolo Costa, Peter Pietzuch

Deep learning models are trained on servers with many GPUs, and training must scale with the number of GPUs. Systems such as TensorFlow and Caffe2 train models with parallel synchronous stochastic gradient descent: they process a batch of training data at a time, partitioned across GPUs, and average the resulting partial gradients to obtain an updated global model...
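
The synchronous scheme the abstract describes can be illustrated in a few lines. The sketch below is illustrative only, not CROSSBOW's approach or TensorFlow/Caffe2 code: it assumes a hypothetical linear model with a mean-squared-error loss, and a plain Python loop stands in for the concurrent per-GPU gradient computations.

```python
# Minimal sketch of parallel synchronous SGD: a global batch is partitioned
# across workers (standing in for GPUs), each worker computes a partial
# gradient on its shard, and the averaged gradient updates one global model.
# The model, loss, and data are illustrative assumptions, not from the paper.
import numpy as np

def partial_gradient(w, x_shard, y_shard):
    """Gradient of mean-squared error for a linear model on one shard."""
    preds = x_shard @ w
    return 2.0 * x_shard.T @ (preds - y_shard) / len(y_shard)

def synchronous_sgd_step(w, x_batch, y_batch, num_gpus, lr):
    # Partition the batch across workers, as a multi-GPU system would.
    x_shards = np.array_split(x_batch, num_gpus)
    y_shards = np.array_split(y_batch, num_gpus)
    # Each worker computes a partial gradient on its shard (in a real
    # system these run concurrently, one per GPU).
    grads = [partial_gradient(w, xs, ys)
             for xs, ys in zip(x_shards, y_shards)]
    # Synchronisation point: average the partial gradients and apply
    # one update to the single global model shared by all workers.
    return w - lr * np.mean(grads, axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = x @ true_w + 0.01 * rng.normal(size=256)

w = np.zeros(4)
for _ in range(200):
    w = synchronous_sgd_step(w, x, y, num_gpus=4, lr=0.05)
print(np.round(w, 2))  # converges close to true_w
```

Because the partial gradients are averaged, one synchronous step is mathematically equivalent to a single-GPU SGD step over the whole batch, which is roughly why such systems grow the batch size as GPUs are added: each GPU needs a large enough shard to stay busy.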
