1 code implementation • 22 Nov 2023 • Yury Nahshan, Joseph Kampeas, Emir Haleva
Transformer models have achieved remarkable results in a wide range of applications.
1 code implementation • 3 Mar 2023 • Joseph Kampeas, Yury Nahshan, Hanoch Kremer, Gil Lederman, Shira Zaloshinski, Zheng Li, Emir Haleva
Post-training Neural Network (NN) model compression is an attractive approach for deploying large, memory-consuming models on devices with limited memory resources.
1 code implementation • 14 Jun 2020 • Itay Hubara, Yury Nahshan, Yair Hanani, Ron Banner, Daniel Soudry
Post-training quantization methods use the calibration set only to set the activations' dynamic ranges.
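To make the role of the calibration set concrete, here is a minimal NumPy sketch of that idea: activations recorded on a small calibration set fix a per-layer dynamic range, which is then used for uniform quantization. The percentile clipping, the function names, and the symmetric scheme are illustrative assumptions, not the procedure from the paper.

```python
import numpy as np

def calibrate_activation_range(activations, percentile=99.9):
    """Derive a symmetric dynamic range for one layer's activations from a
    small calibration set; percentile clipping is an illustrative choice."""
    flat = np.abs(np.concatenate([a.ravel() for a in activations]))
    return np.percentile(flat, percentile)

def fake_quantize_activations(x, max_val, n_bits=8):
    """Uniform symmetric quantization over [-max_val, max_val], returned in
    dequantized form so the rest of the network keeps running in float."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = max_val / qmax
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

# A toy calibration set standing in for activations recorded from one layer.
calibration_batches = [np.random.randn(32, 64) for _ in range(8)]
max_val = calibrate_activation_range(calibration_batches)
x_quantized = fake_quantize_activations(np.random.randn(32, 64), max_val, n_bits=4)
```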
1 code implementation • NeurIPS 2020 • Moran Shkolnik, Brian Chmiel, Ron Banner, Gil Shomron, Yury Nahshan, Alex Bronstein, Uri Weiser
Neural network quantization methods often involve simulating the quantization process during training, making the trained model highly dependent on the target bit-width and the precise way quantization is performed.
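As background for that dependence, here is a minimal sketch of the simulated ("fake") quantization typically inserted into the forward pass during quantization-aware training; the function name and the symmetric per-tensor scheme are assumptions made for illustration.

```python
import numpy as np

def simulate_quantization(w, n_bits=4):
    """Quantize-dequantize ("fake quantization"): the forward pass sees the
    rounding error of an n_bits grid while the values stay in floating point."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

w = np.random.randn(256, 256).astype(np.float32)
w_4bit = simulate_quantization(w, n_bits=4)  # what the model is trained against
w_2bit = simulate_quantization(w, n_bits=2)  # evaluating here exposes the bit-width dependence
```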
1 code implementation • NeurIPS 2019 • Ron Banner, Yury Nahshan, Daniel Soudry
Convolutional neural networks require significant memory bandwidth and storage for intermediate computations, in addition to substantial computing resources.
2 code implementations • 17 Nov 2019 • Yury Nahshan, Brian Chmiel, Chaim Baskin, Evgenii Zheltonozhskii, Ron Banner, Alex M. Bronstein, Avi Mendelson
We show that with more aggressive quantization, the loss landscape becomes highly non-separable with steep curvature, making the selection of quantization parameters more challenging.
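A toy illustration of why the quantization parameters interact: the sketch below evaluates a small model's output error over a grid of per-layer step sizes. The two-layer linear model and the MSE objective are stand-ins chosen for brevity, not the paper's setup.

```python
import numpy as np

def quantize(x, step):
    """Uniform quantization with a given step size."""
    return np.round(x / step) * step

def joint_loss(s1, s2, w1, w2, x):
    """Output MSE of a tiny two-layer linear model when both layers are
    quantized; because the layers interact, the loss over (s1, s2) is not
    a sum of independent one-dimensional terms."""
    y_ref = x @ w1 @ w2
    y_q = x @ quantize(w1, s1) @ quantize(w2, s2)
    return np.mean((y_ref - y_q) ** 2)

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 16))
w1 = rng.standard_normal((16, 16))
w2 = rng.standard_normal((16, 4))
steps = np.linspace(0.05, 1.0, 20)
landscape = np.array([[joint_loss(s1, s2, w1, w2, x) for s2 in steps] for s1 in steps])
# With coarse steps (i.e. low bit-widths), the joint minimum of `landscape`
# generally differs from what a layer-by-layer, one-dimensional search finds.
```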
1 code implementation • ICLR 2019 • Ron Banner, Yury Nahshan, Elad Hoffer, Daniel Soudry
We analyze the trade-off between quantization noise and clipping distortion in low precision networks.
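As a numerical stand-in for that analysis, the sketch below sweeps the clipping value on synthetic Laplace-distributed data and picks the one minimizing total error: a larger clip reduces clipping distortion but widens the quantization step and hence the rounding noise. The brute-force search and the Laplace model are assumptions for this sketch, not the paper's derivation.

```python
import numpy as np

def clip_and_quantize_mse(x, clip, n_bits=4):
    """Empirical MSE of clipping x to [-clip, clip] and quantizing it uniformly;
    the two error sources pull in opposite directions, so the total error has
    an interior minimum over the clipping value."""
    qmax = 2 ** (n_bits - 1) - 1
    step = clip / qmax
    x_q = np.round(np.clip(x, -clip, clip) / step) * step
    return np.mean((x - x_q) ** 2)

x = np.random.laplace(scale=1.0, size=100_000)   # heavy-tailed, tensor-like values
clips = np.linspace(0.5, 8.0, 60)
best_clip = clips[np.argmin([clip_and_quantize_mse(x, c) for c in clips])]
```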
2 code implementations • 2 Oct 2018 • Ron Banner, Yury Nahshan, Elad Hoffer, Daniel Soudry
Convolutional neural networks require significant memory bandwidth and storage for intermediate computations, in addition to substantial computing resources.