1 code implementation • NeurIPS 2023 • Niv Giladi, Shahar Gottlieb, Moran Shkolnik, Asaf Karnieli, Ron Banner, Elad Hoffer, Kfir Yehuda Levy, Daniel Soudry
Thus, these methods are limited by the delays caused by straggling workers.
no code implementations • 6 Feb 2022 • Nurit Spingarn Eliezer, Ron Banner, Elad Hoffer, Hilla Ben-Yaakov, Tomer Michaeli
Power consumption is a major obstacle in the deployment of deep neural networks (DNNs) on end devices.
no code implementations • 19 Dec 2021 • Brian Chmiel, Ron Banner, Elad Hoffer, Hilla Ben Yaacov, Daniel Soudry
Reducing the computational footprint of the entire training process requires the quantization of the neural gradients, i.e., the loss gradients with respect to the outputs of intermediate neural layers.
no code implementations • 29 Sep 2021 • Nurit Spingarn, Elad Hoffer, Ron Banner, Hilla Ben Yaacov, Tomer Michaeli
Power consumption is a major obstacle in the deployment of deep neural networks (DNNs) on end devices.
no code implementations • 29 Sep 2021 • Brian Chmiel, Ron Banner, Elad Hoffer, Hilla Ben Yaacov, Daniel Soudry
Based on this, we suggest a logarithmic unbiased quantization (LUQ) method to quantize both the forward and backward phase to 4-bit, achieving state-of-the-art results in 4-bit training.
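The core mechanism behind such unbiased logarithmic quantization can be sketched in a few lines: round the exponent stochastically so the quantizer is correct in expectation. The NumPy snippet below illustrates only that idea, not the paper's exact 4-bit LUQ scheme (which also clips and scales the exponent range); the function name `log_unbiased_quantize` is ours.

```python
import numpy as np

def log_unbiased_quantize(x, rng=np.random.default_rng(0)):
    """Round |x| to a power of two with stochastic rounding of the exponent,
    chosen so that E[q(x)] = x. Illustrative core of logarithmic unbiased
    quantization; a real 4-bit scheme also restricts the exponent range."""
    sign, mag = np.sign(x), np.abs(x)
    out = np.zeros_like(x, dtype=float)
    nz = mag > 0
    exp = np.log2(mag[nz])
    lo = np.floor(exp)                      # nearest power-of-two exponent below
    p_up = 2.0 ** (exp - lo) - 1.0          # P(round up), makes the quantizer unbiased
    rounded = lo + (rng.random(lo.shape) < p_up)
    out[nz] = sign[nz] * 2.0 ** rounded
    return out

g = np.random.randn(5)
print(g, log_unbiased_quantize(g))
```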
2 code implementations • 1 Jan 2021 • Elad Hoffer, Berry Weinstein, Itay Hubara, Tal Ben-Nun, Torsten Hoefler, Daniel Soudry
Although trained on images of a specific size, it is well established that CNNs can be used to evaluate a wide range of image sizes at test time, by adjusting the size of intermediate feature maps.
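As a concrete illustration of why this works, a convolutional backbone followed by global (adaptive) pooling accepts any input resolution: only the spatial size of the intermediate feature maps changes, while the channel dimension seen by the classifier stays fixed. The toy PyTorch model below is a minimal sketch of this property, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Minimal sketch: conv layers followed by global average pooling can evaluate
# any input size, because only the spatial extent of the feature maps changes.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),          # collapses whatever spatial size remains
    nn.Flatten(),
    nn.Linear(128, 1000),
)

for size in (128, 224, 320):          # same weights evaluated at several test sizes
    x = torch.randn(1, 3, size, size)
    print(size, model(x).shape)       # -> torch.Size([1, 1000]) for every size
```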
1 code implementation • 1 Oct 2020 • Chen Zeno, Itay Golan, Elad Hoffer, Daniel Soudry
The optimal Bayesian solution for this requires an intractable online Bayes update to the weights posterior.
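For reference, the update referred to here is the standard recursive application of Bayes' rule over the incoming data; it is intractable for deep networks because the normalizing integral over the weights has no closed form.

```latex
% Recursive posterior over weights \theta after observing data chunks D_1,\dots,D_t:
p(\theta \mid D_{1:t}) \;\propto\; p(D_t \mid \theta)\, p(\theta \mid D_{1:t-1})
```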
no code implementations • ICLR 2021 • Brian Chmiel, Liad Ben-Uri, Moran Shkolnik, Elad Hoffer, Ron Banner, Daniel Soudry
While training can mostly be accelerated by reducing the time needed to propagate neural gradients back throughout the model, most previous works focus on the quantization/pruning of weights and activations.
no code implementations • CVPR 2020 • Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry
Large-batch SGD is important for scaling training of deep neural networks.
no code implementations • CVPR 2020 • Matan Haroush, Itay Hubara, Elad Hoffer, Daniel Soudry
Then, we demonstrate how these samples can be used to calibrate and fine-tune quantized models without using any real data in the process.
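A generic sketch of the calibration step only (the paper's contribution, how the samples are synthesized, is not shown): run the generated batches through the float model, record per-layer activation ranges, and turn them into quantization scales. The `model_activations` callable and the symmetric int8 scale rule below are illustrative assumptions.

```python
import numpy as np

def calibrate_scales(model_activations, synthetic_batches, num_bits=8):
    """Run (synthetic) samples through the float model, record the max absolute
    value each layer produces, and derive a symmetric quantization scale per layer.
    `model_activations(x)` is assumed to return {layer_name: activation array}."""
    max_abs = {}
    for x in synthetic_batches:
        for name, act in model_activations(x).items():
            max_abs[name] = max(max_abs.get(name, 0.0), float(np.abs(act).max()))
    qmax = 2 ** (num_bits - 1) - 1
    return {name: m / qmax for name, m in max_abs.items()}   # per-layer scale
```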
1 code implementation • ICLR 2020 • Niv Giladi, Mor Shpigel Nacson, Elad Hoffer, Daniel Soudry
However, asynchronous training has its pitfalls, mainly a degradation in generalization, even after convergence of the algorithm.
2 code implementations • 12 Aug 2019 • Elad Hoffer, Berry Weinstein, Itay Hubara, Tal Ben-Nun, Torsten Hoefler, Daniel Soudry
Although trained on images of a specific size, it is well established that CNNs can be used to evaluate a wide range of image sizes at test time, by adjusting the size of intermediate feature maps.
1 code implementation • ICLR 2019 • Ron Banner, Yury Nahshan, Elad Hoffer, Daniel Soudry
We analyze the trade-off between quantization noise and clipping distortion in low precision networks.
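A small numerical sketch of that trade-off, under our own assumptions (Laplace-distributed values and a symmetric uniform quantizer): a small clipping threshold adds clipping distortion, a large one widens the quantization bins, and the total distortion is minimized in between.

```python
import numpy as np

def clipped_quant_mse(x, clip, num_bits=4):
    """Empirical MSE of a symmetric uniform quantizer with clipping threshold `clip`."""
    levels = 2 ** num_bits - 1
    step = 2 * clip / levels
    xc = np.clip(x, -clip, clip)
    xq = np.round(xc / step) * step
    return np.mean((x - xq) ** 2)

x = np.random.laplace(scale=1.0, size=100_000)   # heavy-tailed, like real activations
clips = np.linspace(0.5, 8.0, 50)
best = min(clips, key=lambda c: clipped_quant_mse(x, c))
print("clip value minimizing total distortion:", round(best, 2))
```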
1 code implementation • 27 Jan 2019 • Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry
We analyze the effect of batch augmentation on gradient variance and show that it empirically improves convergence for a wide variety of deep neural networks and datasets.
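Batch augmentation itself is simple to sketch: each sample in a batch is replicated several times, each copy passing through a different random augmentation, so a single optimization step sees multiple views per example. The helper below is an illustrative PyTorch sketch with names of our choosing.

```python
import torch

def batch_augment(batch, transform, m=4):
    """Replicate every sample in `batch` m times, each copy passed through a
    different random augmentation, yielding a larger effective batch with
    lower gradient variance. `transform` is any stochastic per-sample
    tensor-to-tensor augmentation (e.g. a torchvision transform)."""
    views = [torch.stack([transform(img) for img in batch]) for _ in range(m)]
    return torch.cat(views, dim=0)                 # shape: (m * B, ...)
```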
2 code implementations • 2 Oct 2018 • Ron Banner, Yury Nahshan, Elad Hoffer, Daniel Soudry
Convolutional neural networks require significant memory bandwidth and storage for intermediate computations, in addition to substantial computing resources.
3 code implementations • NeurIPS 2018 • Ron Banner, Itay Hubara, Elad Hoffer, Daniel Soudry
Armed with this knowledge, we quantize the model parameters, activations, and layer gradients to 8-bit, keeping only the final step in the computation of the weight gradients at higher precision.
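A toy NumPy sketch of that split, with an illustrative symmetric per-tensor quantizer of our own: activations, weights, and the neural gradient are fake-quantized to 8-bit, while the weight-gradient product and the update remain in full precision.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor fake-quantization to 8 bits (illustrative)."""
    scale = np.abs(x).max() / 127.0 + 1e-12
    return np.clip(np.round(x / scale), -127, 127) * scale

# Toy linear-layer step: forward/backward tensors are kept in 8-bit, but the
# weight-gradient product and the weight update stay in full precision,
# mirroring the "final step at higher precision" described above.
W = np.random.randn(64, 32).astype(np.float32)
x = quantize_int8(np.random.randn(8, 32))           # quantized activations
y = x @ quantize_int8(W).T                          # quantized weights in the forward pass
g_y = quantize_int8(np.random.randn(*y.shape))      # quantized neural gradient
g_W = g_y.T @ x                                     # weight gradient in full precision
W -= 0.01 * g_W
```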
2 code implementations • 27 Mar 2018 • Chen Zeno, Itay Golan, Elad Hoffer, Daniel Soudry
However, research for scenarios in which task boundaries are unknown during training has been lacking.
4 code implementations • NeurIPS 2018 • Elad Hoffer, Ron Banner, Itay Golan, Daniel Soudry
Over the past few years, Batch-Normalization has been commonly used in deep networks, allowing faster training and high performance for a wide variety of applications.
no code implementations • 14 Feb 2018 • Elad Hoffer, Shai Fine, Daniel Soudry
Deep convolutional networks have been the state-of-the-art approach for a wide variety of tasks over the last few years.
5 code implementations • ICLR 2018 • Elad Hoffer, Itay Hubara, Daniel Soudry
Neural networks are commonly used as models for classification for a wide variety of tasks.
2 code implementations • ICLR 2018 • Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro
We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets.
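The phenomenon studied there, the implicit bias of gradient descent toward the max-margin direction, is easy to observe numerically. The snippet below is a toy demonstration of our own, not the paper's analysis: the norm of w keeps growing, while the printed direction w/||w|| stabilizes.

```python
import numpy as np

# Gradient descent on unregularized logistic loss over linearly separable data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+2, 0.5, (50, 2)), rng.normal(-2, 0.5, (50, 2))])
y = np.hstack([np.ones(50), -np.ones(50)])

w = np.zeros(2)
for step in range(1, 200_001):
    margins = y * (X @ w)
    # gradient of mean log(1 + exp(-y w.x)); margins capped to avoid overflow in exp
    grad = -(X * (y / (1 + np.exp(np.minimum(margins, 50))))[:, None]).mean(axis=0)
    w -= 0.1 * grad
    if step in (100, 10_000, 200_000):
        print(step, np.linalg.norm(w), w / np.linalg.norm(w))
```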
1 code implementation • NeurIPS 2017 • Elad Hoffer, Itay Hubara, Daniel Soudry
Following this hypothesis, we conducted experiments showing empirically that the "generalization gap" stems from the relatively small number of updates rather than from the batch size, and can be completely eliminated by adapting the training regime used.
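One common way to read "adapting the training regime" in practice, sketched under our own assumptions (a square-root learning-rate scaling and proportionally more epochs, so the total number of parameter updates is preserved when the batch grows):

```python
def adapt_regime(base_lr, base_epochs, base_batch, new_batch):
    """Illustrative regime adaptation: scale the learning rate (here by sqrt(k),
    one common choice) and train for more epochs so the total number of updates
    stays roughly the same when the batch size grows by a factor k."""
    k = new_batch / base_batch
    return base_lr * k ** 0.5, int(base_epochs * k)

print(adapt_regime(base_lr=0.1, base_epochs=90, base_batch=256, new_batch=4096))
```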
1 code implementation • ICLR 2018 • Daniel Soudry, Elad Hoffer
We prove that, with high probability in the limit of $N\rightarrow\infty$ datapoints, the volume of differentiable regions of the empirical loss containing sub-optimal differentiable local minima is exponentially vanishing in comparison with the same volume of global minima, given standard normal input of dimension $d_{0}=\tilde{\Omega}\left(\sqrt{N}\right)$ and a more realistic number of hidden units $d_{1}=\tilde{\Omega}\left(N/d_{0}\right)$.
no code implementations • 21 Nov 2016 • Elad Hoffer, Itay Hubara, Nir Ailon
Convolutional networks have established themselves over the last few years as the best-performing models for various visual tasks.
1 code implementation • 4 Nov 2016 • Elad Hoffer, Nir Ailon
Deep networks are successfully used as classification models yielding state-of-the-art results when trained on a large number of labeled samples.
no code implementations • 2 Oct 2016 • Elad Hoffer, Itay Hubara, Nir Ailon
Convolutional networks have established themselves over the last few years as the best-performing models for various visual tasks.
3 code implementations • 20 Dec 2014 • Elad Hoffer, Nir Ailon
Deep learning has proven itself as a successful set of models for learning useful semantic representations of data.