Search Results for author: Yury Nahshan

Found 8 papers, 8 papers with code

Linear Log-Normal Attention with Unbiased Concentration

1 code implementation • 22 Nov 2023 • Yury Nahshan, Joseph Kampeas, Emir Haleva

Transformer models have achieved remarkable results in a wide range of applications.

Rotation Invariant Quantization for Model Compression

1 code implementation • 3 Mar 2023 • Joseph Kampeas, Yury Nahshan, Hanoch Kremer, Gil Lederman, Shira Zaloshinski, Zheng Li, Emir Haleva

Post-training Neural Network (NN) model compression is an attractive approach for deploying large, memory-consuming models on devices with limited memory resources.

Model Compression • Quantization

Robust Quantization: One Model to Rule Them All

1 code implementation • NeurIPS 2020 • Moran Shkolnik, Brian Chmiel, Ron Banner, Gil Shomron, Yury Nahshan, Alex Bronstein, Uri Weiser

Neural network quantization methods often involve simulating the quantization process during training, making the trained model highly dependent on the target bit-width and on the precise way quantization is performed (a simulated-quantization sketch follows this entry).

Quantization
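The "simulated quantization" the abstract refers to is commonly implemented as fake quantization with a straight-through estimator. The sketch below is a generic, hypothetical illustration of that idea, not the paper's code; the function name, 4-bit setting, and per-tensor max scaling are assumptions chosen for brevity.

import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Symmetric uniform fake quantization with a straight-through estimator."""
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8 bits
    scale = x.abs().max() / qmax            # per-tensor scale (illustrative choice)
    q = torch.clamp(torch.round(x / scale), -qmax, qmax) * scale
    return x + (q - x).detach()             # forward: quantized values; backward: identity

w = torch.randn(64, 64, requires_grad=True)
out = fake_quantize(w, num_bits=4).pow(2).sum()
out.backward()                              # gradients still reach w despite the rounding

Because the rounding is baked into training this way, the resulting weights are tuned to one specific bit-width and quantizer, which is the dependence the paper highlights.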

Post training 4-bit quantization of convolutional networks for rapid-deployment

1 code implementation • NeurIPS 2019 • Ron Banner, Yury Nahshan, Daniel Soudry

Convolutional neural networks require significant memory bandwidth and storage for intermediate computations, apart from substantial computing resources.

Quantization

Loss Aware Post-training Quantization

2 code implementations • 17 Nov 2019 • Yury Nahshan, Brian Chmiel, Chaim Baskin, Evgenii Zheltonozhskii, Ron Banner, Alex M. Bronstein, Avi Mendelson

We show that with more aggressive quantization, the loss landscape becomes highly non-separable with steep curvature, making the selection of quantization parameters more challenging (a simplified parameter-search sketch follows this entry).

Quantization
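As a rough illustration of loss-aware selection of quantization parameters, the hypothetical sketch below sweeps a clipping threshold for one weight tensor and keeps the value that minimizes a loss measured against full-precision outputs. The paper's actual method optimizes the parameters more carefully (and jointly across layers), so treat this purely as a conceptual example; all names and values here are assumptions.

import torch

def quantize(x: torch.Tensor, clip: float, num_bits: int = 4) -> torch.Tensor:
    """Uniform symmetric quantization of x with a given clipping value."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = clip / qmax
    return torch.clamp(torch.round(x / scale), -qmax, qmax) * scale

def choose_clip(weight, loss_fn, candidates):
    """Return the candidate clipping value whose quantized weights give the lowest loss."""
    losses = [loss_fn(quantize(weight, c)) for c in candidates]
    return candidates[int(torch.stack(losses).argmin())]

w = torch.randn(256, 256)
x = torch.randn(32, 256)
ref = x @ w                                        # full-precision reference output
loss_fn = lambda wq: torch.mean((x @ wq - ref) ** 2)
clips = [f * w.abs().max().item() for f in (0.3, 0.5, 0.7, 0.9, 1.0)]
best_clip = choose_clip(w, loss_fn, clips)

At low bit-widths the loss measured this way varies sharply and jointly across layers' parameters, which is why a naive per-parameter sweep like this one becomes unreliable.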

ACIQ: Analytical Clipping for Integer Quantization of neural networks

1 code implementation • ICLR 2019 • Ron Banner, Yury Nahshan, Elad Hoffer, Daniel Soudry

We analyze the trade-off between quantization noise and clipping distortion in low precision networks (a numerical sketch of this trade-off follows this entry).

Quantization
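The trade-off can be made concrete numerically: widening the clipping range reduces clipping distortion but enlarges the quantization step and hence the rounding noise. The sketch below estimates the resulting mean-squared error by simulation for a Laplace-distributed tensor and picks the clipping value that minimizes it; ACIQ instead derives the optimal clipping analytically, and the distribution, bit-width, and grid used here are illustrative assumptions rather than the paper's setup.

import numpy as np

def quantization_mse(x: np.ndarray, alpha: float, num_bits: int = 4) -> float:
    """MSE of uniformly quantizing x on [-alpha, alpha]: clipping + rounding error."""
    levels = 2 ** num_bits
    step = 2 * alpha / levels                       # uniform grid over the clipped range
    xc = np.clip(x, -alpha, alpha)                  # clipping distortion for |x| > alpha
    xq = np.round(xc / step) * step                 # rounding (quantization) noise
    return float(np.mean((xq - x) ** 2))

rng = np.random.default_rng(0)
x = rng.laplace(scale=1.0, size=1_000_000)          # activations are often Laplace-like
alphas = np.linspace(0.5, 8.0, 40)
mses = [quantization_mse(x, a) for a in alphas]
best_alpha = alphas[int(np.argmin(mses))]           # minimizer of the noise/clipping trade-off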

Post-training 4-bit quantization of convolution networks for rapid-deployment

2 code implementations • 2 Oct 2018 • Ron Banner, Yury Nahshan, Elad Hoffer, Daniel Soudry

Convolutional neural networks require significant memory bandwidth and storage for intermediate computations, apart from substantial computing resources.

Quantization
