Finding neural network weights that generalize well from small datasets is difficult. A promising approach is to (meta-)learn a weight initialization from a collection of tasks, such that a small number of weight changes results in low generalization error. We show that this form of meta-learning can be improved by letting the learning algorithm decide which weights to change, i.e., by learning where to learn. We find that patterned sparsity emerges from this process. Lower-level features tend to be frozen, while weights close to the output remain plastic. This selective sparsity enables running longer sequences of weight updates without overfitting, resulting in better generalization on the miniImageNet benchmark. Our findings shed light on an ongoing debate on whether meta-learning can discover adaptable features, and suggest that sparse learning can outperform simpler feature reuse schemes.
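To make the core idea concrete, here is a minimal NumPy sketch of a MAML-style inner loop in which a per-parameter mask decides which weights are plastic and which stay frozen at their meta-learned initialization. This is an illustration under simplifying assumptions, not the paper's algorithm: the toy linear model, data, fixed random mask, and hyperparameters (inner_lr, n_inner_steps) are all hypothetical, and in the actual method both the initialization and the sparsity pattern are themselves meta-learned across tasks.

```python
# Sketch only: masked ("learn where to learn") inner-loop adaptation.
# In the paper, W_init and the mask are meta-learned; here they are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)

# Toy linear model y = x @ W with a per-parameter plasticity mask.
dim_in, dim_out = 8, 2
W_init = rng.normal(scale=0.1, size=(dim_in, dim_out))        # stand-in for meta-learned init
mask = (rng.random((dim_in, dim_out)) > 0.5).astype(float)    # 1 = plastic, 0 = frozen

def loss_and_grad(W, x, y):
    """Mean squared-error loss (per example) and its gradient w.r.t. W."""
    err = x @ W - y
    loss = 0.5 * np.sum(err ** 2) / len(x)
    grad = x.T @ err / len(x)
    return loss, grad

# Small support set for one task (illustrative data).
x_support = rng.normal(size=(16, dim_in))
y_support = rng.normal(size=(16, dim_out))

# Inner-loop adaptation: only masked (plastic) weights are updated,
# so frozen entries reuse the initialization while the rest adapt to the task.
inner_lr, n_inner_steps = 0.1, 5
W = W_init.copy()
for _ in range(n_inner_steps):
    _, grad = loss_and_grad(W, x_support, y_support)
    W = W - inner_lr * (mask * grad)   # sparse update: frozen entries stay at W_init

final_loss, _ = loss_and_grad(W, x_support, y_support)
print(f"support loss after adaptation: {final_loss:.4f}")
```

In the full method, the outer (meta) loop would backpropagate the post-adaptation query loss through this inner loop to update both the initialization and the mask, which is what produces the layer-wise sparsity pattern described above.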
