Convolutional Neural Networks (CNNs) are computationally intensive, which
limits their application on mobile devices. Their energy is dominated by the
number of multiplies needed to perform the convolutions. Winograd's minimal
filtering algorithm (Lavin, 2015) and network pruning (Han et al., 2015) can
reduce the operation count, but these two methods cannot be directly combined:
applying the Winograd transform fills in the sparsity in both the weights
and the activations. We propose two modifications to Winograd-based CNNs to
enable these methods to exploit sparsity. First, we move the ReLU operation
into the Winograd domain to increase the sparsity of the transformed
activations. Second, we prune the weights in the Winograd domain to exploit
static weight sparsity. For models on CIFAR-10, CIFAR-100 and ImageNet
datasets, our method reduces the number of multiplications by $10.4\times$,
$6.8\times$, and $10.8\times$, respectively, with less than $0.1\%$ loss of
accuracy, outperforming previous baselines by $2.0\times$--$3.0\times$. We also
show that moving ReLU into the Winograd domain allows more aggressive pruning.
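As a rough sketch of how the two modifications compose, let $d$ be an input
tile, $g$ a spatial filter, and $B$, $G$, $A$ the Winograd transform matrices
of Lavin (2015); the symbols $W$ and $M$ below are illustrative notation for
Winograd-domain weights and a binary pruning mask, not quantities defined in
this abstract. The standard per-tile Winograd convolution
\[
  Y = A^{T}\left[\,(G g G^{T}) \odot (B^{T} d B)\,\right] A
\]
then becomes, with ReLU moved into the Winograd domain and the weights kept
and pruned directly in that domain,
\[
  Y = A^{T}\left[\,(M \odot W) \odot \mathrm{ReLU}(B^{T} d B)\,\right] A .
\]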