1 code implementation • 4 Oct 2024 • Moran Shkolnik, Maxim Fishman, Brian Chmiel, Hilla Ben-Yaacov, Ron Banner, Kfir Yehuda Levy
Accelerating both $e^x$ and $\sum(e^x)$ yields a combined 36.9% speedup of the softmax operation.
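For context, a numerically stable softmax decomposes into exactly those two kernels; a minimal NumPy sketch of the decomposition (illustrative only, not the paper's hardware implementation):

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    # The two dominant costs are the elementwise exponent e^x and the
    # reduction sum(e^x) -- the two operations accelerated above.
    e = np.exp(x - x.max())  # elementwise e^x (shifted for stability)
    return e / e.sum()       # reduction sum(e^x), then normalization

print(softmax(np.array([1.0, 2.0, 3.0])))
```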
no code implementations • 19 Sep 2024 • Maxim Fishman, Brian Chmiel, Ron Banner, Daniel Soudry
We train, for the first time, large language models using FP8 precision on datasets up to 2 trillion tokens -- a 20-fold increase over previous limits.
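For readers who want to experiment, recent PyTorch builds expose an FP8 dtype; a minimal sketch of a simulated FP8 (E4M3) round trip (only the casting step -- the paper's training recipe additionally handles scaling and stability at trillion-token scale):

```python
import torch  # requires a PyTorch version with float8 support (>= 2.1)

def fp8_roundtrip(x: torch.Tensor) -> torch.Tensor:
    # Cast down to FP8 (E4M3) and back up to simulate the precision loss.
    # Real FP8 training also applies per-tensor scaling to fit the
    # narrow dynamic range; that is omitted here.
    return x.to(torch.float8_e4m3fn).to(x.dtype)

x = torch.randn(4)
print(x)
print(fp8_roundtrip(x))
```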
1 code implementation • 5 Apr 2022 • Tal Rozen, Moshe Kimhi, Brian Chmiel, Avi Mendelson, Chaim Baskin
The proposed method consists of a training scheme that we call Weight Distribution Mimicking (WDM), which efficiently trains the binary weights to imitate the full-precision network's weight distribution.
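The WDM loss itself is defined in the paper; as a rough illustration of the idea of distribution mimicking, one could penalize a mismatch between simple statistics of the full-precision weights and their scaled binary counterpart (a hypothetical stand-in, not the paper's exact formulation):

```python
import torch

def mimic_penalty(w_fp: torch.Tensor) -> torch.Tensor:
    # Scaled sign binarization, as in classic binary networks.
    alpha = w_fp.abs().mean()
    w_bin = alpha * w_fp.sign()
    # Hypothetical distribution-mimicking term: match the first two
    # moments of the binary weights to the full-precision ones.
    return (w_fp.mean() - w_bin.mean()) ** 2 + (w_fp.std() - w_bin.std()) ** 2
```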
no code implementations • 21 Mar 2022 • Brian Chmiel, Itay Hubara, Ron Banner, Daniel Soudry
We show that while MSE minimization works well for pruning weights and activations, it fails catastrophically for the neural gradients.
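As a concrete reference point, MSE-optimal pruning at a fixed sparsity reduces to keeping the largest-magnitude entries; a short NumPy sketch of that generic baseline (not the paper's gradient-specific method):

```python
import numpy as np

def prune_mse_optimal(x: np.ndarray, sparsity: float) -> np.ndarray:
    # Keeping the k largest-magnitude entries minimizes ||x - prune(x)||^2
    # at the given sparsity -- adequate for weights and activations, but
    # (per the paper) the wrong criterion for neural gradients.
    k = int(round(x.size * (1 - sparsity)))
    if k == 0:
        return np.zeros_like(x)
    thresh = np.sort(np.abs(x), axis=None)[-k]
    return np.where(np.abs(x) >= thresh, x, 0.0)
```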
no code implementations • 19 Dec 2021 • Brian Chmiel, Ron Banner, Elad Hoffer, Hilla Ben Yaacov, Daniel Soudry
Reducing the computational footprint of the entire training process requires the quantization of the neural gradients, i.e., the loss gradients with respect to the outputs of intermediate neural layers.
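To make the term concrete, these are the tensors PyTorch exposes as gradients of intermediate activations; a minimal sketch with a toy two-layer model (an illustration of the definition only):

```python
import torch

x = torch.randn(8, 16)
layer1 = torch.nn.Linear(16, 32)
layer2 = torch.nn.Linear(32, 4)

h = layer1(x)
h.retain_grad()                 # keep the gradient of this intermediate output
loss = layer2(h.relu()).sum()
loss.backward()
print(h.grad.shape)             # the "neural gradient" at layer1's output
```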
no code implementations • 29 Sep 2021 • Brian Chmiel, Ron Banner, Elad Hoffer, Hilla Ben Yaacov, Daniel Soudry
Based on this, we suggest a logarithmic unbiased quantization (LUQ) method to quantize both the forward and backward phase to 4-bit, achieving state-of-the-art results in 4-bit training.
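A minimal sketch of the core rounding rule behind LUQ: round each magnitude stochastically to one of its two neighboring powers of two, with probabilities chosen so the quantizer is unbiased in expectation (simplified; the full method also handles underflow and the forward phase):

```python
import numpy as np

def log_unbiased_round(x: np.ndarray) -> np.ndarray:
    # Round |x| to 2^floor(log2 |x|) or 2^ceil(log2 |x|) stochastically,
    # so that E[q(x)] = x (unbiased logarithmic quantization).
    sign, mag = np.sign(x), np.abs(x)
    lo = 2.0 ** np.floor(np.log2(np.maximum(mag, 1e-30)))
    hi = 2.0 * lo
    p_up = (mag - lo) / (hi - lo)              # P(round up) keeps E[q] = |x|
    q = np.where(np.random.rand(*x.shape) < p_up, hi, lo)
    return sign * np.where(mag > 0, q, 0.0)
```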
1 code implementation • NeurIPS 2021 • Itay Hubara, Brian Chmiel, Moshe Island, Ron Banner, Seffi Naor, Daniel Soudry
Finally, to solve the problem of switching between different structure constraints, we suggest a method to convert a pre-trained model with unstructured sparsity to an N:M fine-grained block sparsity model with little to no training.
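The masking step of that conversion is easy to state: within every group of M consecutive weights, keep the N largest magnitudes. A NumPy sketch for the common 2:4 pattern (only the masking; the paper also covers recovering accuracy afterwards):

```python
import numpy as np

def nm_prune(w: np.ndarray, n: int = 2, m: int = 4) -> np.ndarray:
    # Assumes w.size is divisible by m. Zero all but the n
    # largest-magnitude entries in each group of m (N:M sparsity).
    groups = w.reshape(-1, m)
    drop = np.argsort(np.abs(groups), axis=1)[:, : m - n]  # smallest m-n
    mask = np.ones_like(groups)
    np.put_along_axis(mask, drop, 0.0, axis=1)
    return (groups * mask).reshape(w.shape)

w = np.random.randn(2, 8)
print(nm_prune(w))   # each run of 4 weights keeps its 2 largest magnitudes
```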
no code implementations • ICLR 2021 • Brian Chmiel, Liad Ben-Uri, Moran Shkolnik, Elad Hoffer, Ron Banner, Daniel Soudry
While training is mostly accelerated by reducing the time needed to propagate neural gradients back through the model, most previous works focus on quantizing or pruning the weights and activations.
no code implementations • 4 Mar 2020 • Evgenii Zheltonozhskii, Chaim Baskin, Yaniv Nemcovsky, Brian Chmiel, Avi Mendelson, Alex M. Bronstein
Even though deep learning achieves unmatched performance on various tasks, neural networks remain vulnerable to small adversarial perturbations of the input that lead to significant performance degradation.
1 code implementation • NeurIPS 2020 • Moran Shkolnik, Brian Chmiel, Ron Banner, Gil Shomron, Yury Nahshan, Alex Bronstein, Uri Weiser
Neural network quantization methods often involve simulating the quantization process during training, making the trained model highly dependent on the target bit-width and the precise way quantization is performed.
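"Simulating the quantization process during training" typically means fake quantization with a straight-through estimator; a minimal PyTorch sketch of that generic pattern (background for the sentence above, not this paper's contribution):

```python
import torch

def fake_quant(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    # Quantize in the forward pass; let gradients pass straight
    # through in the backward pass (straight-through estimator).
    qmax = 2 ** (bits - 1) - 1
    scale = x.detach().abs().max() / qmax
    q = (x / scale).round().clamp(-qmax, qmax) * scale
    return x + (q - x).detach()
```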
2 code implementations • 17 Nov 2019 • Yury Nahshan, Brian Chmiel, Chaim Baskin, Evgenii Zheltonozhskii, Ron Banner, Alex M. Bronstein, Avi Mendelson
We show that with more aggressive quantization, the loss landscape becomes highly non-separable with steep curvature, making the selection of quantization parameters more challenging.
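A rough sketch of what "selecting quantization parameters" means in practice: sweep a clipping value and keep the one that minimizes the measured loss on calibration data (a simplified one-dimensional search; the paper's actual optimization is joint over layers and more careful):

```python
import numpy as np

def quantize(x: np.ndarray, clip: float, bits: int = 4) -> np.ndarray:
    qmax = 2 ** (bits - 1) - 1
    scale = clip / qmax
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def best_clip(x, loss_fn, candidates):
    # loss_fn is a hypothetical callback measuring the task loss of the
    # network with x quantized; at low bit-widths this landscape is
    # rugged, which is exactly the difficulty the paper analyzes.
    return min(candidates, key=lambda c: loss_fn(quantize(x, c)))
```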
2 code implementations • 17 Nov 2019 • Yaniv Nemcovsky, Evgenii Zheltonozhskii, Chaim Baskin, Brian Chmiel, Maxim Fishman, Alex M. Bronstein, Avi Mendelson
In this work, we study the application of randomized smoothing as a way to improve performance on unperturbed data as well as to increase robustness to adversarial attacks.
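A minimal sketch of randomized smoothing at prediction time: classify many Gaussian-noised copies of the input and take a majority vote (the certification machinery and the paper's specific ensembling are omitted):

```python
import torch

def smoothed_predict(model, x: torch.Tensor, sigma: float = 0.25, n: int = 100) -> int:
    # Majority vote of the base classifier over n noisy copies of x.
    noisy = x.unsqueeze(0) + sigma * torch.randn(n, *x.shape)
    votes = model(noisy).argmax(dim=1)
    return votes.mode().values.item()
```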
1 code implementation • 25 Sep 2019 • Chaim Baskin, Brian Chmiel, Evgenii Zheltonozhskii, Ron Banner, Alex M. Bronstein, Avi Mendelson
Our method trains the model to achieve low-entropy feature maps, which enables efficient compression at inference time using classical transform coding methods.
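To see why low entropy helps, the achievable compressed size of a quantized feature map is roughly its empirical entropy in bits per element; a small NumPy sketch estimating it (an illustration of the objective, not the paper's transform-coding pipeline):

```python
import numpy as np

def entropy_bits_per_element(fmap: np.ndarray, levels: int = 16) -> float:
    # Quantize the feature map to a few levels and estimate bits/element
    # from the empirical symbol distribution; an entropy coder such as
    # Huffman or arithmetic coding approaches this bound.
    edges = np.linspace(fmap.min(), fmap.max(), levels)
    q = np.digitize(fmap, edges)
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())
```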
1 code implementation • 26 May 2019 • Brian Chmiel, Chaim Baskin, Ron Banner, Evgenii Zheltonozhskii, Yevgeny Yermolin, Alex Karbachevsky, Alex M. Bronstein, Avi Mendelson
We analyze the performance of our approach on a variety of CNN architectures and demonstrate that an FPGA implementation of ResNet-18 with our approach results in a reduction of around 40% in the memory energy footprint, compared to a quantized network, with negligible impact on accuracy.
2 code implementations • 22 Apr 2019 • Yochai Zur, Chaim Baskin, Evgenii Zheltonozhskii, Brian Chmiel, Itay Evron, Alex M. Bronstein, Avi Mendelson
While mainstream deep learning methods train the neural network's weights while keeping the network architecture fixed, the emerging neural architecture search (NAS) techniques make the latter amenable to training as well.