We introduce a generalization of the lottery ticket hypothesis in which the notion of "sparsity" is relaxed by choosing an arbitrary basis in the space of parameters.
In particular, we uncover certain implicit design choices that may drastically affect the effectiveness of distillation.
We propose a simple interpolation-based method for the efficient approximation of gradients in neural ODE models.
Low-rank tensor approximation is a promising approach to compressing deep neural networks.
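To illustrate the general idea behind low-rank compression (not the specific method of any paper above), here is a minimal sketch: a dense weight matrix is factored via a truncated SVD into two smaller matrices, trading a small approximation error for a large reduction in parameter count. The matrix shapes and the rank `k` are illustrative assumptions.

```python
import numpy as np

# Illustrative dense layer weight matrix (shapes are assumptions).
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512))

# Truncated SVD: keep only the top-k singular components.
k = 32
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :k] * s[:k]   # (256, k) factor, singular values folded in
B = Vt[:k]             # (k, 512) factor

# The layer x @ W.T is replaced by (x @ B.T) @ A.T, using A @ B ≈ W.
W_approx = A @ B

params_full = W.size                  # 256 * 512 = 131072
params_lowrank = A.size + B.size      # 256*32 + 32*512 = 24576
rel_error = np.linalg.norm(W - W_approx) / np.linalg.norm(W)
```

For a genuinely low-rank (or approximately low-rank) weight matrix the relative error is small; for the random matrix above it merely demonstrates the mechanics and the parameter savings.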