Search Results for author: Tijmen Blankevoort

Found 33 papers, 13 papers with code

SpinQuant: LLM quantization with learned rotations

no code implementations 26 May 2024 Zechun Liu, Changsheng Zhao, Igor Fedorov, Bilge Soran, Dhruv Choudhary, Raghuraman Krishnamoorthi, Vikas Chandra, Yuandong Tian, Tijmen Blankevoort

In this work, we identify a collection of applicable rotation parameterizations that lead to identical outputs in full-precision Transformer architectures, and find that some random rotations lead to much better quantization than others, with a difference of up to 13 points in downstream zero-shot reasoning performance.


Bitune: Bidirectional Instruction-Tuning

no code implementations 23 May 2024 Dawid J. Kopiczko, Tijmen Blankevoort, Yuki M. Asano

We introduce Bitune, a method that improves instruction-tuning of pretrained decoder-only large language models, leading to consistent gains on downstream tasks.


InterroGate: Learning to Share, Specialize, and Prune Representations for Multi-task Learning

no code implementations 26 Feb 2024 Babak Ehteshami Bejnordi, Gaurav Kumar, Amelie Royer, Christos Louizos, Tijmen Blankevoort, Mohsen Ghafoorian

In this work, we propose InterroGate, a novel multi-task learning (MTL) architecture designed to mitigate task interference while optimizing inference computational efficiency.

Computational Efficiency, Multi-Task Learning

Think Big, Generate Quick: LLM-to-SLM for Fast Autoregressive Decoding

no code implementations 26 Feb 2024 Benjamin Bergner, Andrii Skliar, Amelie Royer, Tijmen Blankevoort, Yuki Asano, Babak Ehteshami Bejnordi

We investigate the combination of encoder-decoder LLMs with both encoder-decoder and decoder-only SLMs from different model families and only require fine-tuning of the SLM.

Decoder, Instruction Following +2

GPTVQ: The Blessing of Dimensionality for LLM Quantization

no code implementations 23 Feb 2024 Mart van Baalen, Andrey Kuzmin, Markus Nagel, Peter Couperus, Cedric Bastoul, Eric Mahurin, Tijmen Blankevoort, Paul Whatmough

In this work we show that the size versus accuracy trade-off of neural network quantization can be significantly improved by increasing the quantization dimensionality.


The LLM Surgeon

1 code implementation 28 Dec 2023 Tycho F. A. van der Ouderaa, Markus Nagel, Mart van Baalen, Yuki M. Asano, Tijmen Blankevoort

Experimentally, our method can prune rows and columns from a range of OPT models and Llamav2-7B by 20%-30%, with a negligible loss in performance, and achieve state-of-the-art results in unstructured and semi-structured pruning of large language models.

VeRA: Vector-based Random Matrix Adaptation

no code implementations 17 Oct 2023 Dawid J. Kopiczko, Tijmen Blankevoort, Yuki M. Asano

Low-rank adaptation (LoRA) is a popular method that reduces the number of trainable parameters when finetuning large language models, but still faces acute storage challenges when scaling to even larger models or deploying numerous per-user or per-task adapted models.

Image Classification, Instruction Following

Efficient Neural PDE-Solvers using Quantization Aware Training

no code implementations 14 Aug 2023 Winfried van den Dool, Tijmen Blankevoort, Max Welling, Yuki M. Asano

In the past years, the application of neural networks as an alternative to classical numerical methods to solve Partial Differential Equations has emerged as a potential paradigm shift in this century-old mathematical field.


QBitOpt: Fast and Accurate Bitwidth Reallocation during Training

no code implementations 10 Jul 2023 Jorn Peters, Marios Fournarakis, Markus Nagel, Mart van Baalen, Tijmen Blankevoort

By combining fast-to-compute sensitivities with efficient solvers during QAT, QBitOpt can produce mixed-precision networks with high task performance guaranteed to satisfy strict resource constraints.


MSViT: Dynamic Mixed-Scale Tokenization for Vision Transformers

1 code implementation 5 Jul 2023 Jakob Drachmann Havtorn, Amelie Royer, Tijmen Blankevoort, Babak Ehteshami Bejnordi

The input tokens to Vision Transformers carry little semantic meaning as they are defined as regular equal-sized patches of the input image, regardless of its content.

Revisiting Single-gated Mixtures of Experts

no code implementations 11 Apr 2023 Amelie Royer, Ilia Karmanov, Andrii Skliar, Babak Ehteshami Bejnordi, Tijmen Blankevoort

Mixture of Experts (MoE) models are rising in popularity as a means to train extremely large-scale models while still allowing for a reasonable computational cost at inference time.

FP8 versus INT8 for efficient deep learning inference

no code implementations 31 Mar 2023 Mart van Baalen, Andrey Kuzmin, Suparna S Nair, Yuwei Ren, Eric Mahurin, Chirag Patel, Sundar Subramanian, Sanghyuk Lee, Markus Nagel, Joseph Soriaga, Tijmen Blankevoort

We theoretically show the difference between the INT and FP formats for neural networks and present a plethora of post-training quantization and quantization-aware-training results to show how this theory translates to practice.


A Practical Mixed Precision Algorithm for Post-Training Quantization

no code implementations 10 Feb 2023 Nilesh Prasad Pandey, Markus Nagel, Mart van Baalen, Yin Huang, Chirag Patel, Tijmen Blankevoort

We experimentally validate our proposed method on several computer vision tasks, natural language processing tasks and many different networks, and show that we can find mixed precision networks that provide a better trade-off between accuracy and efficiency than their homogeneous bit-width equivalents.


FP8 Quantization: The Power of the Exponent

1 code implementation 19 Aug 2022 Andrey Kuzmin, Mart van Baalen, Yuwei Ren, Markus Nagel, Jorn Peters, Tijmen Blankevoort

We detail the choices that can be made for the FP8 format, including the important choice of the number of bits for the mantissa and exponent, and show analytically in which settings these choices give better performance.


Simple and Efficient Architectures for Semantic Segmentation

1 code implementation 16 Jun 2022 Dushyant Mehta, Andrii Skliar, Haitam Ben Yahia, Shubhankar Borse, Fatih Porikli, Amirhossein Habibian, Tijmen Blankevoort

Though the state-of-the-art architectures for semantic segmentation, such as HRNet, demonstrate impressive accuracy, the complexity arising from their salient design choices hinders a range of model acceleration tools, and they further make use of operations that are inefficient on current hardware.

Decoder, Image Classification +2

Overcoming Oscillations in Quantization-Aware Training

1 code implementation 21 Mar 2022 Markus Nagel, Marios Fournarakis, Yelysei Bondarenko, Tijmen Blankevoort

These effects are particularly pronounced in low-bit ($\leq$ 4-bits) quantization of efficient networks with depth-wise separable layers, such as MobileNets and EfficientNets.


Cyclical Pruning for Sparse Neural Networks

no code implementations 2 Feb 2022 Suraj Srinivas, Andrey Kuzmin, Markus Nagel, Mart van Baalen, Andrii Skliar, Tijmen Blankevoort

Current methods for pruning neural network weights iteratively apply magnitude-based pruning on the model weights and re-train the resulting model to recover lost accuracy.

Understanding and Overcoming the Challenges of Efficient Transformer Quantization

1 code implementation EMNLP 2021 Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort

Finally, we show that transformer weights and embeddings can be quantized to ultra-low bit-widths, leading to significant memory savings with a minimum accuracy loss.


A White Paper on Neural Network Quantization

no code implementations 15 Jun 2021 Markus Nagel, Marios Fournarakis, Rana Ali Amjad, Yelysei Bondarenko, Mart van Baalen, Tijmen Blankevoort

Neural network quantization is one of the most effective ways of achieving these savings, but the additional noise it induces can lead to accuracy degradation.


Distilling Optimal Neural Networks: Rapid Search in Diverse Spaces

no code implementations ICCV 2021 Bert Moons, Parham Noorzad, Andrii Skliar, Giovanni Mariani, Dushyant Mehta, Chris Lott, Tijmen Blankevoort

Second, a rapid evolutionary search finds a set of Pareto-optimal architectures for any scenario using the accuracy predictor and on-device measurements.

Knowledge Distillation, Model Compression +1

Bayesian Bits: Unifying Quantization and Pruning

1 code implementation NeurIPS 2020 Mart van Baalen, Christos Louizos, Markus Nagel, Rana Ali Amjad, Ying Wang, Tijmen Blankevoort, Max Welling

We introduce Bayesian Bits, a practical method for joint mixed-precision quantization and pruning through gradient-based optimization.


Up or Down? Adaptive Rounding for Post-Training Quantization

no code implementations ICML 2020 Markus Nagel, Rana Ali Amjad, Mart van Baalen, Christos Louizos, Tijmen Blankevoort

In this paper, we propose AdaRound, a better weight-rounding mechanism for post-training quantization that adapts to the data and the task loss.


LSQ+: Improving low-bit quantization through learnable offsets and better initialization

4 code implementations 20 Apr 2020 Yash Bhalgat, Jinwon Lee, Markus Nagel, Tijmen Blankevoort, Nojun Kwak

To solve this problem, we propose LSQ+, a natural extension of LSQ, wherein we introduce a general asymmetric quantization scheme with trainable scale and offset parameters that can learn to accommodate the negative activations.

Image Classification, Quantization

Conditional Channel Gated Networks for Task-Aware Continual Learning

1 code implementation CVPR 2020 Davide Abati, Jakub Tomczak, Tijmen Blankevoort, Simone Calderara, Rita Cucchiara, Babak Ehteshami Bejnordi

Therefore, we additionally introduce a task classifier that predicts the task label of each example, to deal with settings in which a task oracle is not available.

Continual Learning

Learned Threshold Pruning

no code implementations 28 Feb 2020 Kambiz Azarian, Yash Bhalgat, Jinwon Lee, Tijmen Blankevoort

This is in contrast to other methods that search for per-layer thresholds via a computationally intensive iterative pruning and fine-tuning process.

Gradient $\ell_1$ Regularization for Quantization Robustness

no code implementations ICLR 2020 Milad Alizadeh, Arash Behboodi, Mart van Baalen, Christos Louizos, Tijmen Blankevoort, Max Welling

We analyze the effect of quantizing weights and activations of neural networks on their loss and derive a simple regularization scheme that improves robustness against post-training quantization.


Taxonomy and Evaluation of Structured Compression of Convolutional Neural Networks

no code implementations 20 Dec 2019 Andrey Kuzmin, Markus Nagel, Saurabh Pitre, Sandeep Pendyam, Tijmen Blankevoort, Max Welling

The success of deep neural networks in many real-world applications is leading to new challenges in building more efficient architectures.

Neural Network Compression

Data-Free Quantization Through Weight Equalization and Bias Correction

5 code implementations ICCV 2019 Markus Nagel, Mart van Baalen, Tijmen Blankevoort, Max Welling

This improves quantization accuracy, and can be applied to many common computer vision architectures with a straightforward API call.

Data Free Quantization, object-detection +2

Relaxed Quantization for Discretized Neural Networks

1 code implementation ICLR 2019 Christos Louizos, Matthias Reisser, Tijmen Blankevoort, Efstratios Gavves, Max Welling

Neural network quantization has become an important research area due to its great impact on deployment of large models on resource constrained devices.

General Classification, Quantization
