Here, our desired task (meta-objective) is the performance of the representation on semi-supervised classification, and we meta-learn an algorithm -- an unsupervised weight update rule -- that produces representations that perform well under this meta-objective.
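A minimal sketch of the nested optimization this implies, in PyTorch: an inner loop applies a label-free, meta-parameterized update to a toy linear representation, and an outer loop scores the result with a prototype classifier as a stand-in for the semi-supervised meta-objective. The per-unit statistics fed to the update rule, the multiplicative update form, and all sizes are illustrative assumptions, not the paper's actual parameterization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x_unlab = torch.randn(256, 16)                     # unlabeled pool for the inner loop
x_lab, y_lab = torch.randn(32, 16), torch.arange(32) % 4   # 4-way labeled set

# The meta-learned piece: maps per-unit activation statistics to a weight change.
update_rule = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))
meta_opt = torch.optim.Adam(update_rule.parameters(), lr=1e-3)

for meta_step in range(100):
    W = torch.randn(16, 8) * 0.1                   # fresh base network each episode
    for _ in range(5):                             # inner loop: label-free updates
        h = torch.tanh(x_unlab @ W)                # current representation, (256, 8)
        stats = torch.stack([h.mean(0), h.std(0)], dim=1)   # (8, 2) per-unit stats
        delta = update_rule(stats).squeeze(-1)     # (8,) predicted step
        W = W + 0.01 * delta * W                   # unsupervised weight update
    # Meta-objective: classification quality of the resulting representation,
    # scored with a differentiable nearest-prototype classifier.
    h_lab = torch.tanh(x_lab @ W)
    protos = torch.stack([h_lab[y_lab == c].mean(0) for c in range(4)])
    meta_loss = F.cross_entropy(-torch.cdist(h_lab, protos), y_lab)
    meta_opt.zero_grad(); meta_loss.backward(); meta_opt.step()
```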
We predict separate convolution kernels based solely on the current time-step in order to determine the importance of context elements.
#3 best model for Machine Translation on WMT2014 English-German
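A minimal sketch of that per-time-step kernel prediction, assuming a single head and a plain linear projection (the actual model shares kernel weights across channel groups and adds gating, which this omits):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv1d(nn.Module):
    """Sketch: each position predicts a softmax-normalized kernel of width k
    from its own features alone, then applies it to its local context."""
    def __init__(self, dim, kernel_size=3):
        super().__init__()
        self.k = kernel_size
        self.kernel_proj = nn.Linear(dim, kernel_size)

    def forward(self, x):                               # x: (batch, time, dim)
        w = F.softmax(self.kernel_proj(x), dim=-1)      # (B, T, k) per-step kernels
        pad = self.k // 2
        ctx = F.pad(x, (0, 0, pad, pad)).unfold(1, self.k, 1)  # (B, T, D, k)
        return torch.einsum('btdk,btk->btd', ctx, w)    # weighted context sum

out = DynamicConv1d(dim=8)(torch.randn(2, 10, 8))       # -> (2, 10, 8)
```

Because each kernel depends only on the current position's features, the cost grows linearly with sequence length, unlike pairwise self-attention.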
Transformer networks have the potential to learn longer-term dependencies, but are limited by a fixed-length context in the setting of language modeling.
SOTA for Language Modelling on Hutter Prize
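To make the fixed-length limitation concrete, here is a toy single-head attention loop that consumes a long stream in fixed segments: without a cache, nothing computed in one segment can influence the next, and carrying over detached hidden states (the segment-level recurrence Transformer-XL uses) is what lets the effective context grow. Causal masking and relative positional encodings are omitted, and all sizes are illustrative.

```python
import torch
import torch.nn.functional as F

def attend(segment, memory=None):                  # segment: (T, d)
    kv = segment if memory is None else torch.cat([memory, segment], dim=0)
    scores = segment @ kv.T / segment.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ kv          # queries see memory + segment

d, T = 8, 4
stream = torch.randn(3 * T, d)                     # long input, fixed-length segments
mem = None
for s in range(3):
    seg = stream[s * T:(s + 1) * T]
    out = attend(seg, mem)     # with mem always None, nothing crosses a
    mem = seg.detach()         # segment boundary; caching detached states
                               # lets the usable context exceed T
```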
Unsupervised image-to-image translation has gained considerable attention owing to impressive recent progress based on generative adversarial networks (GANs).
Our results have several implications: 1) training a large, over-parameterized model is not necessary to obtain an efficient final model; 2) the learned "important" weights of the large model are not necessarily useful for the small pruned model; 3) the pruned architecture itself, rather than a set of inherited "important" weights, is what yields the efficiency benefit in the final model, suggesting that some pruning algorithms can be seen as performing network architecture search.
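A minimal sketch of implication 3), assuming simple magnitude pruning on a toy classifier: the mask found by pruning a trained large model is kept as an "architecture", but the surviving weights are re-trained from a fresh random initialization rather than inherited. Everything here (sizes, the 30% keep ratio, the train helper) is illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x, y = torch.randn(512, 20), torch.randint(0, 2, (512,))

def train(model, mask, steps=200):
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(steps):
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            model[0].weight.mul_(mask)             # keep pruned weights at zero
    return loss.item()

big = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
train(big, torch.ones_like(big[0].weight))         # train the large model first

# The "architecture" is just which connections survive magnitude pruning.
thresh = big[0].weight.abs().flatten().quantile(0.7)
mask = (big[0].weight.abs() >= thresh).float()     # keep the largest 30%

small = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
with torch.no_grad():
    small[0].weight.mul_(mask)                     # fresh init, pruned connectivity
train(small, mask)                                 # train from scratch, not inherited
```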
For the challenging semantic image segmentation task, the most efficient models have traditionally combined the structured modelling capabilities of Conditional Random Fields (CRFs) with the feature extraction power of CNNs.
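A minimal sketch of that division of labor, with a convolution as the feature extractor producing per-pixel class scores (unary potentials) and a crude mean-field-style smoothing step standing in for the CRF's pairwise term; the fixed averaging kernel and weights are illustrative, not any published CRF formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

cnn = nn.Conv2d(3, 5, kernel_size=3, padding=1)    # feature extractor: 5-class scores

def crf_refine(unary, iters=3, pairwise_weight=0.5):
    C = unary.shape[1]
    k = torch.ones(C, 1, 3, 3) / 9.0               # toy message-passing filter
    q = F.softmax(unary, dim=1)                    # initial per-pixel beliefs
    for _ in range(iters):
        msg = F.conv2d(q, k, padding=1, groups=C)  # average neighboring beliefs
        q = F.softmax(unary + pairwise_weight * msg, dim=1)   # re-normalize
    return q

img = torch.randn(1, 3, 32, 32)
labels = crf_refine(cnn(img)).argmax(dim=1)        # (1, 32, 32) smoothed label map
```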
Work on the problem of contextualized word representation---the development of reusable neural network components for sentence understanding---has recently seen a surge of progress centered on the unsupervised pretraining task of language modeling with methods like ELMo.
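A minimal sketch of that recipe, assuming a toy LSTM in place of ELMo's larger bidirectional model: pretrain it as a language model on unlabeled token streams, then freeze it and reuse its hidden states as contextualized word features for a downstream probe. All sizes and the random "corpus" are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

V, D, H = 100, 32, 64                              # vocab, embedding, hidden sizes
emb = nn.Embedding(V, D)
lstm = nn.LSTM(D, H, batch_first=True)
lm_head = nn.Linear(H, V)

tokens = torch.randint(0, V, (8, 20))              # stand-in unlabeled corpus
params = [*emb.parameters(), *lstm.parameters(), *lm_head.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)
for _ in range(50):                                # pretraining: next-token prediction
    h, _ = lstm(emb(tokens[:, :-1]))
    loss = F.cross_entropy(lm_head(h).reshape(-1, V), tokens[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()

# Reuse: frozen LM states serve as contextualized word representations.
with torch.no_grad():
    feats, _ = lstm(emb(tokens))                   # (8, 20, H), context-dependent
probe = nn.Linear(H, 2)                            # downstream component, e.g. a probe
logits = probe(feats.mean(dim=1))                  # toy sentence-level prediction
```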