Search Results for author: Denis Kuznedelev

Found 10 papers, 5 papers with code

Extreme Compression of Large Language Models via Additive Quantization

1 code implementation11 Jan 2024 Vage Egiazarian, Andrei Panferov, Denis Kuznedelev, Elias Frantar, Artem Babenko, Dan Alistarh

The emergence of accurate open large language models (LLMs) has led to a race towards quantization techniques for such models enabling execution on end-user devices.


Sparse Fine-tuning for Inference Acceleration of Large Language Models

2 code implementations10 Oct 2023 Eldar Kurtic, Denis Kuznedelev, Elias Frantar, Michael Goin, Dan Alistarh

While the standard approach is to leverage sparsity for computational reduction, we observe that in the case of memory-bound LLMs sparsity can also be leveraged for reducing memory bandwidth.

Quantization Text Generation +1

Accurate Neural Network Pruning Requires Rethinking Sparse Optimization

no code implementations3 Aug 2023 Denis Kuznedelev, Eldar Kurtic, Eugenia Iofinova, Elias Frantar, Alexandra Peste, Dan Alistarh

Obtaining versions of deep neural networks that are both highly-accurate and highly-sparse is one of the main challenges in the area of model compression, and several high-performance pruning techniques have been investigated by the community.

Model Compression Network Pruning +1

Vision Models Can Be Efficiently Specialized via Few-Shot Task-Aware Compression

no code implementations25 Mar 2023 Denis Kuznedelev, Soroush Tabesh, Kimia Noorbakhsh, Elias Frantar, Sara Beery, Eldar Kurtic, Dan Alistarh

To address this, we ask: can we quickly compress large generalist models into accurate and efficient specialists?

A critical look at the evaluation of GNNs under heterophily: Are we really making progress?

2 code implementations22 Feb 2023 Oleg Platonov, Denis Kuznedelev, Michael Diskin, Artem Babenko, Liudmila Prokhorenkova

Graphs without this property are called heterophilous, and it is typically assumed that specialized methods are required to achieve strong performance on such graphs.

Graph Representation Learning Node Classification

CAP: Correlation-Aware Pruning for Highly-Accurate Sparse Vision Models

no code implementations NeurIPS 2023 Denis Kuznedelev, Eldar Kurtic, Elias Frantar, Dan Alistarh

To further showcase CAP's accuracy and scalability, we use it to show for the first time that extremely-accurate large vision models, trained via self-supervised techniques, can also be pruned to moderate sparsities, with negligible accuracy loss.

Image Classification Quantization

Cannot find the paper you are looking for? You can Submit a new open access paper.