36 papers with code ·
Methodology

Our observations are consistent for multiple network architectures, datasets, and tasks, which imply that: 1) training a large, over-parameterized model is often not necessary to obtain an efficient final model, 2) learned "important" weights of the large model are typically not useful for the small pruned model, 3) the pruned architecture itself, rather than a set of inherited "important" weights, is more crucial to the efficiency in the final model, which suggests that in some cases pruning can be useful as an architecture search paradigm.

The maximum probability for the size in each distribution serves as the width and depth of the pruned network, whose parameters are learned by knowledge transfer, e. g., knowledge distillation, from the original networks.

NETWORK PRUNING NEURAL ARCHITECTURE SEARCH TRANSFER LEARNING

The maximum probability for the size in each distribution serves as the width and depth of the pruned network, whose parameters are learned by knowledge transfer, e. g., knowledge distillation, from the original networks.

SOTA for Network Pruning on CIFAR-10

NETWORK PRUNING NEURAL ARCHITECTURE SEARCH TRANSFER LEARNING

Based on these results, we articulate the "lottery ticket hypothesis:" dense, randomly-initialized, feed-forward networks contain subnetworks ("winning tickets") that - when trained in isolation - reach test accuracy comparable to the original network in a similar number of iterations.

In this paper, we address the problem of fast depth estimation on embedded systems.

Before computing the gradients for each weight update, targeted dropout stochastically selects a set of units or weights to be dropped using a simple self-reinforcing sparsity criterion and then computes the gradients for the remaining weights.

On ResNet-101, we achieve a 40% FLOPS reduction by removing 30% of the parameters, with a loss of 0. 02% in the top-1 accuracy on ImageNet.

This paper presents a method for adding multiple tasks to a single deep neural network while avoiding catastrophic forgetting.

First, we move the ReLU operation into the Winograd domain to increase the sparsity of the transformed activations.

Structured pruning is a popular method for compressing a neural network: given a large trained network, one alternates between removing channel connections and fine-tuning; reducing the overall width of the network.