Search Results for author: Saleh Ashkboos

Found 9 papers, 5 papers with code

SliceGPT: Compress Large Language Models by Deleting Rows and Columns

1 code implementation • 26 Jan 2024 • Saleh Ashkboos, Maximilian L. Croci, Marcelo Gennari do Nascimento, Torsten Hoefler, James Hensman

Large language models have become the cornerstone of natural language processing, but their use comes with substantial costs in terms of compute and memory resources.
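
As a rough illustration of the title's idea (not of SliceGPT's actual procedure), the sketch below deletes part of the hidden dimension shared by two adjacent PyTorch linear layers by slicing rows out of the first weight matrix and the matching columns out of the second. The layer sizes, the choice of which indices to keep, and the fc1/fc2 names are placeholders; how SliceGPT decides what to delete while preserving accuracy is the subject of the paper.

import torch

# Toy block: remove 25% of the hidden dimension shared by two linear layers.
hidden, keep = 1024, 768
fc1 = torch.nn.Linear(512, hidden)
fc2 = torch.nn.Linear(hidden, 512)

idx = torch.arange(keep)  # placeholder choice of which hidden units to keep
fc1_small = torch.nn.Linear(512, keep)
fc2_small = torch.nn.Linear(keep, 512)
with torch.no_grad():
    fc1_small.weight.copy_(fc1.weight[idx, :])   # delete rows of the first weight matrix
    fc1_small.bias.copy_(fc1.bias[idx])
    fc2_small.weight.copy_(fc2.weight[:, idx])   # delete matching columns of the second
    fc2_small.bias.copy_(fc2.bias)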

QUIK: Towards End-to-End 4-Bit Inference on Generative Large Language Models

1 code implementation • 13 Oct 2023 • Saleh Ashkboos, Ilia Markov, Elias Frantar, Tingxuan Zhong, Xincheng Wang, Jie Ren, Torsten Hoefler, Dan Alistarh

We show, for the first time, that the majority of inference computations for large generative models such as LLaMA, OPT, and Falcon can be performed with both weights and activations being cast to 4 bits, in a way that leads to practical speedups, while at the same time maintaining good accuracy.

Computational Efficiency • Quantization
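
The NumPy sketch below is only meant to make "both weights and activations being cast to 4 bits" concrete: symmetric 4-bit quantization of a toy activation matrix and weight matrix, an integer matrix multiply, and dequantization with the two sets of scales. It is not QUIK itself, which among other things handles outlier features separately and relies on dedicated GPU kernels for the reported speedups.

import numpy as np

def quantize_sym_4bit(t, axis):
    """Symmetric 4-bit quantization: integer codes in [-8, 7] plus per-row scales."""
    scale = np.maximum(np.abs(t).max(axis=axis, keepdims=True), 1e-8) / 7.0
    q = np.clip(np.round(t / scale), -8, 7).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64)).astype(np.float32)    # toy activations (tokens x features)
W = rng.standard_normal((32, 64)).astype(np.float32)   # toy weights (out_features x features)

qx, sx = quantize_sym_4bit(x, axis=1)   # per-token activation scales
qW, sW = quantize_sym_4bit(W, axis=1)   # per-output-channel weight scales

# Integer matmul accumulated in int32, then dequantized with both sets of scales.
y_int = qx.astype(np.int32) @ qW.astype(np.int32).T
y_hat = y_int.astype(np.float32) * (sx @ sW.T)

print("max abs error vs. float matmul:", np.abs(y_hat - x @ W.T).max())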

STen: Productive and Efficient Sparsity in PyTorch

no code implementations • 15 Apr 2023 • Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Saleh Ashkboos, Torsten Hoefler

As deep learning models grow, sparsity is becoming an increasingly critical component of deep neural networks, enabling improved performance and reduced storage.

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

11 code implementations • 31 Oct 2022 • Elias Frantar, Saleh Ashkboos, Torsten Hoefler, Dan Alistarh

In this paper, we address this challenge and propose GPTQ, a new one-shot weight quantization method based on approximate second-order information that is both highly accurate and highly efficient.

Language Modelling • Model Compression • +1
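
To make "one-shot weight quantization based on approximate second-order information" concrete, here is a heavily simplified NumPy sketch of the column-by-column update at the core of GPTQ-style methods: quantize one column of the layer's weight matrix, then spread the resulting error over the not-yet-quantized columns using the inverse Hessian computed from calibration inputs. The function names, the round-to-nearest 4-bit grid, and the dampening constant are simplifications of mine; the released GPTQ implementation adds blocked updates, a Cholesky-based inverse, and more careful grid selection.

import numpy as np

def round_to_grid(w, scale):
    """Round-to-nearest onto a symmetric 4-bit grid (a stand-in for the real quantizer)."""
    return np.clip(np.round(w / scale), -8, 7) * scale

def gptq_like_quantize(W, X, damp=0.01):
    """Column-by-column quantization with second-order error compensation.

    W: (out_features, in_features) weights of one linear layer.
    X: (n_samples, in_features) calibration inputs for that layer.
    """
    W = W.astype(np.float64).copy()
    d = W.shape[1]
    H = X.T @ X                                   # proportional to the layer Hessian
    H += damp * np.mean(np.diag(H)) * np.eye(d)   # dampening for numerical stability
    Hinv = np.linalg.inv(H)

    scale = np.maximum(np.abs(W).max(axis=1, keepdims=True), 1e-8) / 7.0
    Q = np.zeros_like(W)
    for j in range(d):
        Q[:, j] = round_to_grid(W[:, j], scale[:, 0])
        err = (W[:, j] - Q[:, j]) / Hinv[j, j]
        # Compensate: push the error onto the columns that are not quantized yet.
        W[:, j + 1:] -= np.outer(err, Hinv[j, j + 1:])
    return Q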

ENS-10: A Dataset For Post-Processing Ensemble Weather Forecasts

1 code implementation • 29 Jun 2022 • Saleh Ashkboos, Langwen Huang, Nikoli Dryden, Tal Ben-Nun, Peter Dueben, Lukas Gianinazzi, Luca Kummer, Torsten Hoefler

We propose the ENS-10 prediction correction task for improving the forecast quality at a 48-hour lead time through ensemble post-processing.

Weather Forecasting
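
As a toy illustration of "prediction correction through ensemble post-processing" (not of ENS-10's actual variables, lead times, metrics, or baselines), the NumPy sketch below fits a simple linear correction from the raw ensemble mean and spread to the verifying observation on a training split and applies it to a held-out split. All data here is synthetic.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup: m ensemble members forecasting one scalar field at n grid points.
m, n = 10, 5000
truth = rng.standard_normal(n)
ensemble = truth + 0.5 + 0.8 * rng.standard_normal((m, n))   # biased, noisy members

ens_mean = ensemble.mean(axis=0)
ens_std = ensemble.std(axis=0)

# Simplest post-processing baseline: a linear correction of the ensemble mean,
# fitted on one half of the grid points and evaluated on the other half.
train, test = slice(0, n // 2), slice(n // 2, n)
A = np.stack([np.ones(n), ens_mean, ens_std], axis=1)
coef, *_ = np.linalg.lstsq(A[train], truth[train], rcond=None)
corrected = A[test] @ coef

print("raw ensemble-mean RMSE:", np.sqrt(np.mean((ens_mean[test] - truth[test]) ** 2)))
print("corrected RMSE:        ", np.sqrt(np.mean((corrected - truth[test]) ** 2)))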

Motif Prediction with Graph Neural Networks

no code implementations • 26 May 2021 • Maciej Besta, Raphael Grob, Cesare Miglioli, Nicola Bernold, Grzegorz Kwasniewski, Gabriel Gjini, Raghavendra Kanakagiri, Saleh Ashkboos, Lukas Gianinazzi, Nikoli Dryden, Torsten Hoefler

We also successfully apply our architecture for predicting more arbitrary clusters and communities, illustrating its potential for graph mining beyond motif analysis.

Graph Mining • Link Prediction

New Bounds For Distributed Mean Estimation and Variance Reduction

no code implementations • ICLR 2021 • Peter Davies, Vijaykrishna Gurunathan, Niusha Moshrefi, Saleh Ashkboos, Dan Alistarh

We provide a method of quantization which allows distributed mean estimation to be performed with solution quality dependent only on the distance between inputs, not on input norm, and show an analogous result for distributed variance reduction.

Distributed Optimization • Quantization
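
The toy NumPy experiment below only illustrates the distinction the abstract draws between error that scales with input norm and error that scales with the distance between inputs: quantizing the raw vectors makes the error track their (possibly large) range, while quantizing offsets from a shared reference makes it track how far apart the inputs are. The shared reference and the stochastic uniform quantizer are illustrative stand-ins, not the paper's scheme or its bounds.

import numpy as np

rng = np.random.default_rng(0)

def stochastic_quantize(v, levels=16):
    """Unbiased stochastic rounding onto `levels` points spanning [v.min(), v.max()];
    the expected error grows with the range (and hence the scale) of v."""
    lo, hi = v.min(), v.max()
    if hi == lo:
        return v.copy()
    step = (hi - lo) / (levels - 1)
    scaled = (v - lo) / step
    floor = np.floor(scaled)
    q = floor + (rng.random(v.shape) < scaled - floor)
    return lo + q * step

# Two nodes hold inputs that are close to each other but far from the origin.
y = 1000.0 * rng.standard_normal(512)   # shared reference known to both nodes
x1 = y + rng.standard_normal(512)
x2 = y + rng.standard_normal(512)
true_mean = (x1 + x2) / 2

# Naive: quantize the raw inputs -> error scales with their large norm.
naive = (stochastic_quantize(x1) + stochastic_quantize(x2)) / 2

# Reference-based: quantize only the small offsets from the shared reference.
ref_based = y + (stochastic_quantize(x1 - y) + stochastic_quantize(x2 - y)) / 2

print("error, raw inputs:      ", np.linalg.norm(naive - true_mean))
print("error, offsets from ref:", np.linalg.norm(ref_based - true_mean))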
