Variance Pruning: Pruning Language Models via Temporal Neuron Variance

29 Sep 2021  ·  Berry Weinstein, Yonatan Belinkov

As language models become larger, different pruning methods have been proposed to reduce model size. However, the sparsity patterns formed by common pruning regimes do not fully exploit the properties of the modern hardware on which these models are trained and deployed. Most unstructured, and even structured, pruning regimes require additional hardware support to make the resulting sparsity patterns useful. Here we propose a simple pruning algorithm based on variance analysis of output neurons, each of which corresponds to an entire row of weights. Our algorithm produces row-sparse matrices, allowing this sparsity to be exploited conveniently on existing hardware architectures. Empirical experiments on natural language understanding tasks show that our method leads to little or no accuracy degradation, and at times even better accuracy, using a 50% sparse BERT-LARGE model.
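No official implementation is available, but the abstract's core idea (rank output neurons by their variance and zero the corresponding weight rows) can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions: the function name `variance_prune_rows`, the one-shot pruning schedule, and the use of a buffer of recorded activations as the "temporal" signal are all hypothetical, not the authors' method.

```python
import torch

def variance_prune_rows(weight, activations, sparsity=0.5):
    """Zero entire rows of `weight` whose output neurons show the
    lowest variance across a set of recorded activations.

    weight:      (out_features, in_features) linear-layer weight matrix
    activations: (num_samples, out_features) neuron outputs recorded
                 over time (e.g., across training steps)
    sparsity:    fraction of rows to prune
    """
    # Variance of each output neuron over the recorded samples.
    neuron_var = activations.var(dim=0)              # (out_features,)
    num_prune = int(sparsity * weight.size(0))
    # Assumption: the lowest-variance neurons are the least informative.
    prune_idx = torch.argsort(neuron_var)[:num_prune]
    pruned = weight.clone()
    pruned[prune_idx] = 0.0                          # row-sparse result
    return pruned, prune_idx

# Usage with stand-in data (shapes chosen to mimic a BERT-style layer):
layer = torch.nn.Linear(1024, 4096)
acts = torch.randn(512, 4096)  # placeholder for recorded activations
sparse_w, removed = variance_prune_rows(layer.weight.data, acts, sparsity=0.5)
```

Because whole rows are zeroed, the pruned layer can simply be compacted into a smaller dense matrix and run on standard hardware, which is the convenience the abstract points to, in contrast to unstructured sparsity that needs dedicated hardware support.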


