SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot

2 Jan 2023 · Elias Frantar, Dan Alistarh

We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. We can execute SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach 60% unstructured sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt.
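The 2:4 and 4:8 patterns mentioned above constrain each group of 4 (respectively 8) consecutive weights to contain at most 2 (respectively 4) nonzeros; the 2:4 layout is the one accelerated by NVIDIA's sparse tensor cores. As a minimal sketch of what such a mask looks like (assuming PyTorch; note this uses plain per-group magnitude selection purely for illustration, whereas SparseGPT itself selects and updates weights via an approximate second-order procedure):

```python
# Minimal 2:4 mask sketch, assuming PyTorch. This is NOT the SparseGPT
# algorithm, only an illustration of the 2:4 semi-structured pattern:
# in every group of 4 consecutive weights, at most 2 may be nonzero.
import torch

def mask_2_to_4(weight: torch.Tensor) -> torch.Tensor:
    """Keep the 2 largest-magnitude weights in each group of 4 along the input dim."""
    rows, cols = weight.shape
    assert cols % 4 == 0, "input dimension must be divisible by the group size 4"
    groups = weight.abs().reshape(rows, cols // 4, 4)
    keep = groups.topk(2, dim=-1).indices           # 2 survivors per group of 4
    mask = torch.zeros_like(groups, dtype=torch.bool)
    mask.scatter_(-1, keep, True)
    return mask.reshape(rows, cols)

w = torch.randn(8, 16)
sparse_w = w * mask_2_to_4(w)                       # exactly 50% zeros, in groups
assert (sparse_w != 0).reshape(8, -1, 4).sum(-1).max() <= 2
```

Unstructured 50% sparsity zeroes the same fraction of weights but with no grouping constraint, which gives the pruner more freedom and explains why the 50% rows in the table below tend to score slightly better than the 2:4 rows.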


Results from the Paper


Ranked #1 on Language Modelling on WikiText-2 (using extra training data).

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Common Sense Reasoning | ARC (Challenge) | SparseGPT (175B, 4:8 Sparsity) | Accuracy | 39.85 | #32 |
| Common Sense Reasoning | ARC (Challenge) | OPT-175B | Accuracy | 43.94 | #30 |
| Common Sense Reasoning | ARC (Challenge) | SparseGPT (175B, 50% Sparsity) | Accuracy | 41.3 | #31 |
| Common Sense Reasoning | ARC (Challenge) | OPT-175B (50% Sparsity) | Accuracy | 25.6 | #36 |
| Common Sense Reasoning | ARC (Challenge) | SparseGPT (175B, 2:4 Sparsity) | Accuracy | 38.99 | #33 |
| Common Sense Reasoning | ARC (Easy) | SparseGPT (175B, 4:8 Sparsity) | Accuracy | 68.35 | #26 |
| Common Sense Reasoning | ARC (Easy) | OPT-175B | Accuracy | 71.04 | #20 |
| Common Sense Reasoning | ARC (Easy) | OPT-175B (50% Sparsity) | Accuracy | 28.03 | #32 |
| Common Sense Reasoning | ARC (Easy) | SparseGPT (175B, 50% Sparsity) | Accuracy | 69.65 | #23 |
| Common Sense Reasoning | ARC (Easy) | SparseGPT (175B, 2:4 Sparsity) | Accuracy | 67.08 | #28 |
| Language Modelling | LAMBADA | OPT-175B | Accuracy | 75.59 | #19 |
| Language Modelling | LAMBADA | SparseGPT (175B, 2:4 Sparsity) | Accuracy | 79.47 | #13 |
| Language Modelling | LAMBADA | SparseGPT (175B, 4:8 Sparsity) | Accuracy | 78.77 | #14 |
| Language Modelling | LAMBADA | SparseGPT (175B, 50% Sparsity) | Accuracy | 76.51 | #17 |
| Language Modelling | LAMBADA | OPT-175B (50% Sparsity) | Accuracy | 0.02 | #33 |
| Question Answering | PIQA | OPT-175B | Accuracy | 81.07 | #12 |
| Question Answering | PIQA | SparseGPT (175B, 50% Sparsity) | Accuracy | 80.63 | #14 |
| Question Answering | PIQA | SparseGPT (175B, 4:8 Sparsity) | Accuracy | 79.54 | #19 |
| Question Answering | PIQA | SparseGPT (175B, 2:4 Sparsity) | Accuracy | 79.54 | #19 |
| Question Answering | PIQA | OPT-175B (50% Sparsity) | Accuracy | 54.73 | #35 |
| Question Answering | StoryCloze | OPT-175B | Accuracy | 79.82 | #9 |
| Question Answering | StoryCloze | SparseGPT (175B, 2:4 Sparsity) | Accuracy | 76.19 | #16 |
| Question Answering | StoryCloze | SparseGPT (175B, 50% Sparsity) | Accuracy | 78.87 | #10 |
| Question Answering | StoryCloze | OPT-175B (50% Sparsity) | Accuracy | 47.10 | #18 |
| Question Answering | StoryCloze | SparseGPT (175B, 4:8 Sparsity) | Accuracy | 77.02 | #13 |
| Language Modelling | WikiText-2 | SparseGPT (175B, 2:4 Sparsity) | Test perplexity | 8.73 | #4 |
| Language Modelling | WikiText-2 | OPT-175B | Test perplexity | 8.34 | #2 |
| Language Modelling | WikiText-2 | OPT-175B (50% Sparsity) | Test perplexity | 234.77 | #38 |
| Language Modelling | WikiText-2 | SparseGPT (175B, 50% Sparsity) | Test perplexity | 8.21 | #1 |
| Language Modelling | WikiText-2 | SparseGPT (175B, 4:8 Sparsity) | Test perplexity | 8.45 | #3 |
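The WikiText-2 entries are test perplexities (lower is better), i.e. the exponential of the average per-token negative log-likelihood of a causal language model. As a minimal sketch of the metric itself (assuming PyTorch and per-position logits; the paper's exact tokenization, batching, and striding are not reproduced here):

```python
# Perplexity = exp(mean negative log-likelihood per token).
# Assumptions: `logits` has shape (seq_len, vocab_size), `tokens` has shape
# (seq_len,); a full evaluation would stride over the entire test set.
import torch
import torch.nn.functional as F

def perplexity(logits: torch.Tensor, tokens: torch.Tensor) -> float:
    # Standard causal-LM setup: the logits at position t predict token t+1.
    nll = F.cross_entropy(logits[:-1], tokens[1:], reduction="mean")
    return float(torch.exp(nll))
```

Under this metric, the gap between dense magnitude pruning (234.77) and SparseGPT (8.21 at the same 50% sparsity) in the table above is the paper's central comparison.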
