Search Results for author: Brian Chmiel

Found 15 papers, 9 papers with code

EXAQ: Exponent Aware Quantization For LLMs Acceleration

1 code implementation 4 Oct 2024 Moran Shkolnik, Maxim Fishman, Brian Chmiel, Hilla Ben-Yaacov, Ron Banner, Kfir Yehuda Levy

The combination of accelerating both $e^x$ and $\sum(e^x)$ results in a 36.9% acceleration in the softmax operation.

Quantization, Question Answering
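
The EXAQ abstract above refers to accelerating the $e^x$ and $\sum(e^x)$ stages of softmax. As loose context, the sketch below splits softmax into those two stages and feeds the exponent stage low-precision inputs; the 4-bit uniform quantizer and NumPy implementation are illustrative assumptions, not the EXAQ scheme itself.

```python
import numpy as np

def quantize_uniform(x, num_bits=4):
    """Uniformly quantize x to 2**num_bits levels over its own range (illustrative)."""
    lo, hi = x.min(), x.max()
    scale = max((hi - lo) / (2 ** num_bits - 1), 1e-8)
    return np.round((x - lo) / scale) * scale + lo

def softmax_split(logits, num_bits=4):
    """Softmax split into its e^x and sum(e^x) stages, with the exp inputs quantized."""
    z = logits - logits.max(axis=-1, keepdims=True)   # usual max-subtraction for stability
    zq = quantize_uniform(z, num_bits)                # low-precision inputs to the e^x stage
    e = np.exp(zq)                                    # e^x stage
    return e / e.sum(axis=-1, keepdims=True)          # sum(e^x) and normalization stage

logits = np.random.randn(2, 8)
print(softmax_split(logits))
```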

Scaling FP8 training to trillion-token LLMs

no code implementations 19 Sep 2024 Maxim Fishman, Brian Chmiel, Ron Banner, Daniel Soudry

We train, for the first time, large language models using FP8 precision on datasets up to 2 trillion tokens -- a 20-fold increase over previous limits.

Quantization

Bimodal Distributed Binarized Neural Networks

1 code implementation 5 Apr 2022 Tal Rozen, Moshe Kimhi, Brian Chmiel, Avi Mendelson, Chaim Baskin

The proposed method consists of a training scheme that we call Weight Distribution Mimicking (WDM), which efficiently imitates the full-precision network's weight distribution in its binary counterpart.

Binarization, Quantization
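
As generic context for the binarization entry above, the sketch below shows standard sign binarization with a per-channel scale, plus a toy penalty that pulls the latent full-precision weights toward a bimodal (±alpha) shape. It only illustrates the general idea of matching latent weights to their binary counterpart; it is not the paper's Weight Distribution Mimicking loss.

```python
import numpy as np

def binarize(w):
    """Sign binarization with a per-output-channel scale (standard BNN practice)."""
    alpha = np.abs(w).mean(axis=1, keepdims=True)    # one scale per output channel
    return alpha * np.sign(w), alpha

def bimodal_penalty(w, alpha):
    """Toy regularizer pulling latent weights toward +/-alpha (illustrative, not WDM)."""
    return np.mean((np.abs(w) - alpha) ** 2)

w = np.random.randn(16, 64)          # latent full-precision weights
w_bin, alpha = binarize(w)
print("penalty:", bimodal_penalty(w, alpha))
```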

Minimum Variance Unbiased N:M Sparsity for the Neural Gradients

no code implementations 21 Mar 2022 Brian Chmiel, Itay Hubara, Ron Banner, Daniel Soudry

We show that while minimization of the MSE works fine for pruning the weights and activations, it catastrophically fails for the neural gradients.
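
To make the MSE-versus-bias distinction above concrete, the sketch below compares deterministic magnitude pruning of pairs (which minimizes per-pair MSE but is biased) with a simple stochastic 1:2 scheme that keeps one entry per pair with probability proportional to its magnitude and rescales it so the expectation equals the original value. This is a simplified 1:2 illustration, not the paper's minimum-variance unbiased N:M estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

def prune_1of2_top1(g):
    """Keep the larger-magnitude entry of each pair (minimizes MSE per pair, but biased)."""
    pairs = g.reshape(-1, 2).copy()
    drop = np.argmin(np.abs(pairs), axis=1)
    pairs[np.arange(len(pairs)), drop] = 0.0
    return pairs.reshape(-1)

def prune_1of2_unbiased(g):
    """Keep one entry per pair, sampled proportionally to |g| and rescaled by 1/p,
    so that E[kept value] = p * (g / p) = g for every coordinate (unbiased)."""
    pairs = g.reshape(-1, 2)
    p = np.abs(pairs) / np.abs(pairs).sum(axis=1, keepdims=True)
    keep_col0 = rng.random(len(pairs)) < p[:, 0]
    col = np.where(keep_col0, 0, 1)              # column kept in each pair
    rows = np.arange(len(pairs))
    out = np.zeros_like(pairs)
    out[rows, col] = pairs[rows, col] / p[rows, col]
    return out.reshape(-1)

g = rng.normal(size=1000)
unbiased_avg = np.mean([prune_1of2_unbiased(g) for _ in range(2000)], axis=0)
print("mean abs bias, stochastic:", np.abs(unbiased_avg - g).mean())   # shrinks with more draws
print("mean abs bias, top-1:     ", np.abs(prune_1of2_top1(g) - g).mean())
```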

Accurate Neural Training with 4-bit Matrix Multiplications at Standard Formats

no code implementations 19 Dec 2021 Brian Chmiel, Ron Banner, Elad Hoffer, Hilla Ben Yaacov, Daniel Soudry

Reducing the computational footprint of the entire training process requires the quantization of the neural gradients, i.e., the loss gradients with respect to the outputs of intermediate neural layers.

Quantization
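
Gradient quantization at standard integer formats is often paired with stochastic rounding so the quantizer is unbiased in expectation; the sketch below shows that generic recipe for a symmetric INT4 grid. The per-tensor scale and format details are assumptions for illustration, not the paper's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_int4_stochastic(x):
    """Symmetric INT4 quantize-dequantize with stochastic rounding (unbiased for the given scale)."""
    qmax = 7                                             # symmetric signed 4-bit grid
    scale = np.abs(x).max() / qmax + 1e-12
    y = x / scale
    low = np.floor(y)
    q = low + (rng.random(x.shape) < (y - low))          # round up with prob = fractional part
    return np.clip(q, -8, 7) * scale

g = rng.normal(size=(256, 256))
avg = np.mean([quantize_int4_stochastic(g) for _ in range(200)], axis=0)
print("mean abs deviation from g:", np.abs(avg - g).mean())   # shrinks as more draws are averaged
```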

Logarithmic Unbiased Quantization: Practical 4-bit Training in Deep Learning

no code implementations 29 Sep 2021 Brian Chmiel, Ron Banner, Elad Hoffer, Hilla Ben Yaacov, Daniel Soudry

Based on this, we suggest a logarithmic unbiased quantization (LUQ) method to quantize both the forward and backward phase to 4-bit, achieving state-of-the-art results in 4-bit training.

Deep Learning, Quantization
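
The unbiasedness idea behind logarithmic quantization can be illustrated by rounding each magnitude stochastically between neighbouring powers of two so that its expectation is preserved. The sketch below shows that idea in isolation; the handling of underflow and the exact 4-bit exponent range used by LUQ are not modeled here.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_quantize_unbiased(x):
    """Round |x| stochastically to a neighbouring power of two, keeping the sign.

    For v in [2^e, 2^(e+1)), rounding up with probability (v - 2^e) / 2^e gives
    E[output] = 2^e * (1 + (v - 2^e) / 2^e) = v, i.e. the quantizer is unbiased.
    """
    sign = np.sign(x)
    v = np.abs(x) + 1e-30                            # avoid log2(0)
    low = 2.0 ** np.floor(np.log2(v))
    p_up = (v - low) / low                           # in [0, 1)
    q = low * (1.0 + (rng.random(x.shape) < p_up))   # either 2^e or 2^(e+1)
    return sign * q

x = rng.normal(size=100000)
xq_mean = np.mean([log_quantize_unbiased(x) for _ in range(200)], axis=0)
print("mean abs bias:", np.abs(xq_mean - x).mean())
```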

Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks

1 code implementation NeurIPS 2021 Itay Hubara, Brian Chmiel, Moshe Island, Ron Banner, Seffi Naor, Daniel Soudry

Finally, to solve the problem of switching between different structure constraints, we suggest a method to convert a pre-trained model with unstructured sparsity to an N:M fine-grained block sparsity model with little to no training.
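
N:M fine-grained sparsity means that every contiguous block of M weights contains at most N nonzeros. A minimal way to impose a 2:4 pattern on a dense tensor is to keep the two largest magnitudes in each block of four, as sketched below; this shows only the target structure, not the paper's transposable-mask search or its conversion procedure.

```python
import numpy as np

def nm_prune(w, n=2, m=4):
    """Zero out all but the n largest-magnitude entries in each block of m along the last axis."""
    assert w.shape[-1] % m == 0
    blocks = w.reshape(-1, m)
    drop = np.argsort(np.abs(blocks), axis=1)[:, : m - n]   # (m - n) smallest per block
    mask = np.ones_like(blocks)
    np.put_along_axis(mask, drop, 0.0, axis=1)
    return (blocks * mask).reshape(w.shape)

w = np.random.randn(8, 16)
w_sparse = nm_prune(w)
print((w_sparse.reshape(-1, 4) != 0).sum(axis=1))   # every block of 4 keeps exactly 2 nonzeros
```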

Neural gradients are near-lognormal: improved quantized and sparse training

no code implementations ICLR 2021 Brian Chmiel, Liad Ben-Uri, Moran Shkolnik, Elad Hoffer, Ron Banner, Daniel Soudry

While training can mostly be accelerated by reducing the time needed to propagate neural gradients back throughout the model, most previous works focus on the quantization/pruning of weights and activations.

Neural Network Compression, Quantization

Colored Noise Injection for Training Adversarially Robust Neural Networks

no code implementations 4 Mar 2020 Evgenii Zheltonozhskii, Chaim Baskin, Yaniv Nemcovsky, Brian Chmiel, Avi Mendelson, Alex M. Bronstein

Even though deep learning has shown unmatched performance on various tasks, neural networks have been shown to be vulnerable to small adversarial perturbations of the input that lead to significant performance degradation.

Robust Quantization: One Model to Rule Them All

1 code implementation NeurIPS 2020 Moran Shkolnik, Brian Chmiel, Ron Banner, Gil Shomron, Yury Nahshan, Alex Bronstein, Uri Weiser

Neural network quantization methods often involve simulating the quantization process during training, making the trained model highly dependent on the target bit-width and the precise way quantization is performed.

Quantization
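
"Simulating the quantization process during training" usually means inserting a quantize-dequantize (fake-quantization) step whose output is tied to a fixed target bit-width, which is why the resulting model becomes sensitive to that choice. The sketch below shows the generic fake-quantization step at a few bit-widths; it is standard practice, not the paper's robustness method.

```python
import numpy as np

def fake_quantize(x, num_bits):
    """Quantize-dequantize x on a symmetric signed integer grid of the given bit-width."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(x).max() / qmax + 1e-12
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

x = np.random.randn(4, 4)
for bits in (8, 4, 2):
    err = np.abs(fake_quantize(x, bits) - x).mean()
    print(f"{bits}-bit mean abs error: {err:.4f}")   # the simulated error grows as bits shrink
```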

Loss Aware Post-training Quantization

2 code implementations 17 Nov 2019 Yury Nahshan, Brian Chmiel, Chaim Baskin, Evgenii Zheltonozhskii, Ron Banner, Alex M. Bronstein, Avi Mendelson

We show that with more aggressive quantization, the loss landscape becomes highly non-separable with steep curvature, making the selection of quantization parameters more challenging.

Quantization

Smoothed Inference for Adversarially-Trained Models

2 code implementations 17 Nov 2019 Yaniv Nemcovsky, Evgenii Zheltonozhskii, Chaim Baskin, Brian Chmiel, Maxim Fishman, Alex M. Bronstein, Avi Mendelson

In this work, we study the application of randomized smoothing as a way to improve performance on unperturbed data as well as to increase robustness to adversarial attacks.

Adversarial Defense
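
Randomized smoothing at inference time averages a classifier's predictions over Gaussian-perturbed copies of each input. The sketch below shows that averaging step with a placeholder linear "model"; the noise level, sample count, and model are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def smoothed_predict(model, x, sigma=0.25, n_samples=64):
    """Average the model's class probabilities over Gaussian-perturbed copies of x."""
    noisy = x[None, :] + sigma * rng.normal(size=(n_samples, x.size))
    logits = model(noisy)                                  # (n_samples, n_classes)
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.mean(axis=0)                              # smoothed class distribution

# Placeholder "model": a random linear classifier standing in for a trained network.
W = rng.normal(size=(32, 10))
model = lambda batch: batch @ W

x = rng.normal(size=32)
print(smoothed_predict(model, x).round(3))
```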

CAT: Compression-Aware Training for bandwidth reduction

1 code implementation 25 Sep 2019 Chaim Baskin, Brian Chmiel, Evgenii Zheltonozhskii, Ron Banner, Alex M. Bronstein, Avi Mendelson

Our method trains the model to achieve low-entropy feature maps, which enables efficient compression at inference time using classical transform coding methods.

Quantization
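
The first-order empirical entropy of a quantized feature map approximates the bits per element an entropy coder would need, which is why training toward low-entropy feature maps makes them cheaper to compress. The sketch below computes that histogram-based estimate on a toy activation; the quantization grid is an assumption for illustration.

```python
import numpy as np

def empirical_entropy_bits(x_quantized):
    """Empirical first-order entropy (bits per element) of a discrete-valued tensor."""
    _, counts = np.unique(x_quantized, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

act = np.random.randn(1, 64, 16, 16)                # toy feature map
act_q = np.round(np.clip(act, -4, 4) / (8 / 255))   # crude fixed quantization grid
print("entropy:", empirical_entropy_bits(act_q), "bits/element for an ideal entropy coder")
```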

Feature Map Transform Coding for Energy-Efficient CNN Inference

1 code implementation 26 May 2019 Brian Chmiel, Chaim Baskin, Ron Banner, Evgenii Zheltonozhskii, Yevgeny Yermolin, Alex Karbachevsky, Alex M. Bronstein, Avi Mendelson

We analyze the performance of our approach on a variety of CNN architectures and demonstrate that an FPGA implementation of ResNet-18 with our approach results in a reduction of around 40% in the memory energy footprint, compared to a quantized network, with negligible impact on accuracy.

Video Compression

Towards Learning of Filter-Level Heterogeneous Compression of Convolutional Neural Networks

2 code implementations 22 Apr 2019 Yochai Zur, Chaim Baskin, Evgenii Zheltonozhskii, Brian Chmiel, Itay Evron, Alex M. Bronstein, Avi Mendelson

While mainstream deep learning methods train the neural network's weights while keeping the network architecture fixed, the emerging neural architecture search (NAS) techniques make the latter also amenable to training.

Network Pruning, Neural Architecture Search, +1
