1 code implementation • NeurIPS 2023 • Niv Giladi, Shahar Gottlieb, Moran Shkolnik, Asaf Karnieli, Ron Banner, Elad Hoffer, Kfir Yehuda Levy, Daniel Soudry
Thus, these methods are limited by the delays caused by straggling workers.
1 code implementation • 21 Mar 2022 • Brian Chmiel, Itay Hubara, Ron Banner, Daniel Soudry
We show that while minimization of the MSE works fine for pruning the activations, it catastrophically fails for the neural gradients.
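A minimal NumPy sketch of the contrast described here: deterministic magnitude pruning (which minimizes the MSE of the pruned tensor) versus a stochastic, unbiased alternative of the kind one would want for neural gradients. The function names and the keep-probability rule are illustrative, not the paper's exact scheme.

```python
import numpy as np

def mse_prune(x, sparsity=0.5):
    """Deterministic magnitude pruning: keep the largest-magnitude entries.
    This minimizes the mean squared error of the pruned tensor."""
    k = max(int(x.size * (1.0 - sparsity)), 1)
    thresh = np.sort(np.abs(x).ravel())[-k]
    return np.where(np.abs(x) >= thresh, x, 0.0)

def unbiased_stochastic_prune(x):
    """Stochastic pruning: each entry is kept with a probability proportional
    to its magnitude and rescaled by that probability, so the pruned tensor
    equals the original in expectation (unbiased)."""
    keep_prob = np.clip(np.abs(x) / (np.abs(x).max() + 1e-12), 1e-6, 1.0)
    mask = np.random.rand(*x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)
```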
no code implementations • 6 Feb 2022 • Nurit Spingarn Eliezer, Ron Banner, Elad Hoffer, Hilla Ben-Yaakov, Tomer Michaeli
Power consumption is a major obstacle in the deployment of deep neural networks (DNNs) on end devices.
2 code implementations • 30 Jan 2022 • Maxim Fishman, Chaim Baskin, Evgenii Zheltonozhskii, Almog David, Ron Banner, Avi Mendelson
Graph neural networks (GNNs) have become a powerful tool for processing graph-structured data but still face challenges in effectively aggregating and propagating information between layers, which limits their performance.
no code implementations • 19 Dec 2021 • Brian Chmiel, Ron Banner, Elad Hoffer, Hilla Ben Yaacov, Daniel Soudry
Based on this, we suggest a logarithmic unbiased quantization (LUQ) method to quantize both the forward and backward phases to 4-bit, achieving state-of-the-art results in 4-bit training without overhead.
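A toy NumPy sketch of the core idea, assuming its essence is stochastic rounding of each magnitude to one of its two neighbouring powers of two, with probabilities chosen to keep the result unbiased; the actual LUQ 4-bit format, range handling and packing are not reproduced here.

```python
import numpy as np

def log_unbiased_quantize(x):
    """Stochastically round each magnitude to a neighbouring power of two
    so that the quantized tensor is unbiased in expectation."""
    sign = np.sign(x)
    mag = np.abs(x)
    out = np.zeros_like(x, dtype=float)
    nz = mag > 0
    lo = 2.0 ** np.floor(np.log2(mag[nz]))   # power of two just below |x|
    hi = 2.0 * lo                            # power of two just above |x|
    p_up = (mag[nz] - lo) / (hi - lo)        # P(round up), so E[q] = |x|
    rounded = np.where(np.random.rand(p_up.size) < p_up, hi, lo)
    out[nz] = sign[nz] * rounded
    return out
```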
no code implementations • 29 Sep 2021 • Nurit Spingarn, Elad Hoffer, Ron Banner, Hilla Ben Yaacov, Tomer Michaeli
Power consumption is a major obstacle in the deployment of deep neural networks (DNNs) on end devices.
no code implementations • 29 Sep 2021 • Brian Chmiel, Ron Banner, Elad Hoffer, Hilla Ben Yaacov, Daniel Soudry
Based on this, we suggest a logarithmic unbiased quantization (LUQ) method to quantize both the forward and backward phases to 4-bit, achieving state-of-the-art results in 4-bit training.
1 code implementation • NeurIPS 2021 • Itay Hubara, Brian Chmiel, Moshe Island, Ron Banner, Seffi Naor, Daniel Soudry
Finally, to solve the problem of switching between different structure constraints, we suggest a method to convert a pre-trained model with unstructured sparsity to an N:M fine-grained block sparsity model with little to no training.
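As a concrete illustration of the N:M constraint itself (not of the paper's conversion procedure), the sketch below projects a dense weight matrix onto 2:4 sparsity by keeping, in every group of four consecutive weights, the two entries of largest magnitude.

```python
import numpy as np

def to_n_m_sparsity(w, n=2, m=4):
    """Keep only the n largest-magnitude entries in every group of m
    consecutive weights; zero out the rest."""
    w = np.asarray(w, dtype=float)
    flat = w.reshape(-1, m)                              # assumes size divisible by m
    idx = np.argsort(np.abs(flat), axis=1)[:, :m - n]    # indices of the m-n smallest
    pruned = flat.copy()
    np.put_along_axis(pruned, idx, 0.0, axis=1)
    return pruned.reshape(w.shape)

# Example: 2:4 sparsity on a small weight matrix
w = np.random.randn(4, 8)
w_sparse = to_n_m_sparsity(w, n=2, m=4)
```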
1 code implementation • ICLR 2021 • Nurit Spingarn-Eliezer, Ron Banner, Tomer Michaeli
However, all existing techniques rely on an optimization procedure to expose those directions, and offer no control over the degree of allowed interaction between different transformations.
no code implementations • ICLR 2021 • Brian Chmiel, Liad Ben-Uri, Moran Shkolnik, Elad Hoffer, Ron Banner, Daniel Soudry
Although training could mostly be accelerated by reducing the time needed to propagate neural gradients back through the model, most previous works focus on quantizing or pruning the weights and activations instead.
1 code implementation • 14 Jun 2020 • Itay Hubara, Yury Nahshan, Yair Hanani, Ron Banner, Daniel Soudry
Instead, these methods only use the calibration set to determine the activations' dynamic ranges.
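A minimal sketch of what such calibration typically looks like, assuming a simple percentile rule over activations collected from the calibration set; the specific statistic (percentile vs. an analytic clipping rule) is an assumption, not necessarily the paper's choice.

```python
import numpy as np

def calibrate_activation_range(activations, percentile=99.9):
    """Collect activations from the calibration set and pick a clipping
    range from their statistics."""
    flat = np.concatenate([a.ravel() for a in activations])
    lo = np.percentile(flat, 100 - percentile)
    hi = np.percentile(flat, percentile)
    return lo, hi

def quantize_uniform(x, lo, hi, n_bits=8):
    """Uniform quantization of x into the calibrated range [lo, hi]."""
    levels = 2 ** n_bits - 1
    scale = (hi - lo) / levels
    q = np.round((np.clip(x, lo, hi) - lo) / scale)
    return q * scale + lo
```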
1 code implementation • NeurIPS 2020 • Moran Shkolnik, Brian Chmiel, Ron Banner, Gil Shomron, Yury Nahshan, Alex Bronstein, Uri Weiser
Neural network quantization methods often involve simulating the quantization process during training, making the trained model highly dependent on the target bit-width and on the precise way the quantization is performed.
1 code implementation • NeurIPS 2019 • Ron Banner, Yury Nahshan, Daniel Soudry
Convolutional neural networks require significant memory bandwidth and storage for intermediate computations, in addition to substantial computing resources.
2 code implementations • 17 Nov 2019 • Yury Nahshan, Brian Chmiel, Chaim Baskin, Evgenii Zheltonozhskii, Ron Banner, Alex M. Bronstein, Avi Mendelson
We show that with more aggressive quantization, the loss landscape becomes highly non-separable with steep curvature, making the selection of quantization parameters more challenging.
1 code implementation • 25 Sep 2019 • Chaim Baskin, Brian Chmiel, Evgenii Zheltonozhskii, Ron Banner, Alex M. Bronstein, Avi Mendelson
Our method trains the model to achieve low-entropy feature maps, which enables efficient compression at inference time using classical transform coding methods.
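A small, hedged illustration of why low entropy helps: estimating the empirical entropy of a quantized feature map gives a proxy for the bits per element an ideal entropy coder would need under an i.i.d. symbol model. The paper's actual transform-coding pipeline is not reproduced here.

```python
import numpy as np

def feature_map_entropy(fm, n_bits=8):
    """Empirical entropy (bits per element) of a quantized feature map,
    a rough proxy for how compressible it is with classical entropy coding."""
    scale = fm.max() - fm.min() + 1e-12
    q = np.round((fm - fm.min()) / scale * (2 ** n_bits - 1)).astype(int)
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())
```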
1 code implementation • ECCV 2020 • Gil Shomron, Ron Banner, Moran Shkolnik, Uri Weiser
Convolutional neural networks (CNNs) deliver state-of-the-art results for various tasks, at the price of high computational demands.
1 code implementation • 26 May 2019 • Brian Chmiel, Chaim Baskin, Ron Banner, Evgenii Zheltonozhskii, Yevgeny Yermolin, Alex Karbachevsky, Alex M. Bronstein, Avi Mendelson
We analyze the performance of our approach on a variety of CNN architectures and demonstrate that an FPGA implementation of ResNet-18 with our approach reduces the memory energy footprint by around 40% compared to a quantized network, with negligible impact on accuracy.
1 code implementation • ICLR 2019 • Ron Banner, Yury Nahshan, Elad Hoffer, Daniel Soudry
We analyze the trade-off between quantization noise and clipping distortion in low precision networks.
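A numeric sketch of this trade-off, assuming simple symmetric uniform quantization and a grid search over clipping values; the paper derives clipping analytically, so the grid search here is purely illustrative.

```python
import numpy as np

def quantize_with_clip(x, clip, n_bits=4):
    """Symmetric uniform quantization after clipping to [-clip, clip]."""
    levels = 2 ** (n_bits - 1) - 1
    scale = clip / levels
    return np.round(np.clip(x, -clip, clip) / scale) * scale

def best_clip(x, n_bits=4, n_grid=100):
    """Sweep candidate clipping values and return the one minimizing MSE,
    i.e. the best balance between clipping distortion (small clip) and
    quantization noise (large clip)."""
    candidates = np.linspace(0.1, 1.0, n_grid) * np.abs(x).max()
    mses = [np.mean((x - quantize_with_clip(x, c, n_bits)) ** 2) for c in candidates]
    return candidates[int(np.argmin(mses))]

# Heavy-tailed values benefit from clipping well below the maximum value
x = np.random.laplace(size=100_000)
print(best_clip(x, n_bits=4), np.abs(x).max())
```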
2 code implementations • 2 Oct 2018 • Ron Banner, Yury Nahshan, Elad Hoffer, Daniel Soudry
Convolutional neural networks require significant memory bandwidth and storage for intermediate computations, in addition to substantial computing resources.
3 code implementations • NeurIPS 2018 • Ron Banner, Itay Hubara, Elad Hoffer, Daniel Soudry
Armed with this knowledge, we quantize the model parameters, activations and layer gradients to 8-bit, keeping only the final step in the computation of the weight gradients at higher precision.
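A toy sketch of that precision assignment, using NumPy stand-ins for one linear layer: weights, activations and the propagated layer gradient are quantized to 8 bits, while the weight-gradient product itself stays in float32. The quantizer, shapes and random tensors are illustrative only, not the paper's training scheme.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor 8-bit quantization (sketch)."""
    scale = np.abs(x).max() / 127.0 + 1e-12
    return np.round(x / scale).clip(-127, 127) * scale

# One linear layer, forward and backward, with 8-bit tensors everywhere
# except the weight-gradient computation, which is kept in float32.
x   = quantize_int8(np.random.randn(32, 256))   # activations (random stand-in)
w   = quantize_int8(np.random.randn(256, 128))  # weights (random stand-in)
y   = x @ w                                     # forward pass

g_y = quantize_int8(np.random.randn(32, 128))   # incoming layer gradient (stand-in)
g_x = g_y @ w.T                                 # backward pass: propagated gradient
g_w = x.T @ g_y                                 # weight gradient, left at higher precision
```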
4 code implementations • NeurIPS 2018 • Elad Hoffer, Ron Banner, Itay Golan, Daniel Soudry
Over the past few years, Batch-Normalization has been commonly used in deep networks, allowing faster training and high performance for a wide variety of applications.