GPU Kernels for Block-Sparse Weights
We’re releasing highly optimized GPU kernels for an underexplored class of neural network architectures: networks with block-sparse weights. The kernels allow for efficient evaluation and differentiation of linear layers, including convolutional layers, with flexibly configurable block-sparsity patterns in the weight matrix. We find that, depending on the sparsity, these kernels can run orders of magnitude faster than the best available alternatives, such as cuBLAS. Using the kernels, we improve upon the state of the art in text sentiment analysis and generative modeling of text and images. By releasing our kernels in the open, we aim to spur further advancement in model and algorithm design.
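To make "evaluation and differentiation of linear layers with block-sparse weights" concrete, here is a minimal plain-Python sketch of the math such a layer computes. It is not the released GPU kernels or their API. It assumes the weight matrix W (shape K x N) is stored as a dict mapping (row_block, col_block) to a dense bs x bs block, so the set of keys is the block-sparsity pattern. The function names (`bsmm_forward`, `bsmm_backward`, `to_dense`) are hypothetical. The forward pass computes y = x W, and the backward pass computes dx = dy W^T together with a weight gradient only for the blocks that exist in the layout. Because every loop runs over stored blocks only, the work scales with the number of nonzero blocks rather than with K x N, which is the source of the sparsity-dependent speedup.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
# Hypothetical storage format: (row_block, col_block) -> dense bs x bs block.
Blocks = Dict[Tuple[int, int], Matrix]


def zeros(rows: int, cols: int) -> Matrix:
    return [[0.0] * cols for _ in range(rows)]


def bsmm_forward(x: Matrix, w: Blocks, bs: int, n_out_blocks: int) -> Matrix:
    """y = x @ W, touching only the blocks present in the layout."""
    y = zeros(len(x), n_out_blocks * bs)
    for (bi, bj), blk in w.items():
        for b, x_row in enumerate(x):
            for i in range(bs):
                xv = x_row[bi * bs + i]
                for j in range(bs):
                    y[b][bj * bs + j] += xv * blk[i][j]
    return y


def bsmm_backward(x: Matrix, dy: Matrix, w: Blocks, bs: int, n_in_blocks: int):
    """Return (dx, dw): dx = dy @ W^T, and dW = x^T @ dy restricted to W's layout."""
    dx = zeros(len(x), n_in_blocks * bs)
    dw: Blocks = {}
    for (bi, bj), blk in w.items():
        g = zeros(bs, bs)
        for b in range(len(x)):
            for i in range(bs):
                for j in range(bs):
                    d = dy[b][bj * bs + j]
                    dx[b][bi * bs + i] += d * blk[i][j]
                    g[i][j] += x[b][bi * bs + i] * d
        dw[(bi, bj)] = g  # gradients exist only for stored (nonzero) blocks
    return dx, dw


def to_dense(w: Blocks, bs: int, n_in_blocks: int, n_out_blocks: int) -> Matrix:
    """Expand the block dict into a full K x N matrix (missing blocks are zero)."""
    dense = zeros(n_in_blocks * bs, n_out_blocks * bs)
    for (bi, bj), blk in w.items():
        for i in range(bs):
            for j in range(bs):
                dense[bi * bs + i][bj * bs + j] = blk[i][j]
    return dense


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense reference matmul used to check the block-sparse result."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


# Usage: a 2x2 grid of 2x2 blocks with 3 of the 4 blocks present.
bs = 2
w = {
    (0, 0): [[1.0, 2.0], [3.0, 4.0]],
    (1, 1): [[5.0, 6.0], [7.0, 8.0]],
    (1, 0): [[0.5, 0.0], [0.0, 0.5]],
}
x = [[1.0, 2.0, 3.0, 4.0]]
dy = [[1.0, 0.0, 1.0, 2.0]]
y = bsmm_forward(x, w, bs, n_out_blocks=2)
dx, dw = bsmm_backward(x, dy, w, bs, n_in_blocks=2)
print(y, dx)
```

The dense `matmul` helper is included only as a correctness reference. A real GPU kernel would instead assign each stored block to a tile of threads, but the set of multiply-adds it performs matches the loops above.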
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank | Uses Extra Training Data | Benchmark |
|---|---|---|---|---|---|---|---|
| Sentiment Analysis | CR | Block-sparse LSTM | Accuracy | 92.2 | #4 | | |
| Sentiment Analysis | IMDb | Block-sparse LSTM | Accuracy | 94.99 | #17 | | |
| Sentiment Analysis | SST-2 Binary classification | Block-sparse LSTM | Accuracy | 93.2 | #40 | | |
| Sentiment Analysis | Yelp Binary classification | Block-sparse LSTM | Error | 3.27 | #11 | | |