no code implementations • 15 Mar 2018 • Jeff Daily, Abhinav Vishnu, Charles Siegel, Thomas Warfel, Vinay Amatya
In this paper, we present GossipGraD - a gossip communication protocol based Stochastic Gradient Descent (SGD) algorithm for scaling Deep Learning (DL) algorithms on large-scale systems.
no code implementations • 3 Oct 2016 • Charles Siegel, Jeff Daily, Abhinav Vishnu
We present novel techniques to accelerate the convergence of Deep Learning algorithms by conducting low overhead removal of redundant neurons -- apoptosis of neurons -- which do not contribute to model learning, during the training phase itself.