Specifically, we target semi-supervised classification performance, and we meta-learn an algorithm -- an unsupervised weight update rule -- that produces representations useful for this task.
Feed-forward and convolutional architectures have recently been shown to achieve superior results on some sequence modeling tasks such as machine translation, with the added advantage that they concurrently process all inputs in the sequence, leading to easy parallelization and faster training times.
#14 best model for Machine Translation on WMT2014 English-German
This paper addresses the scalability challenge of architecture search by formulating the task in a differentiable manner.
#19 best model for Language Modelling on Penn Treebank (Word Level)
We predict separate convolution kernels based solely on the current time-step in order to determine the importance of context elements.
#6 best model for Machine Translation on WMT2014 English-German
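As a rough illustration of predicting a convolution kernel from the current time step alone, here is a minimal sketch in plain Python. The linear predictor `proj`, the softmax normalization over kernel positions, the causal window with zero padding, and the sharing of one kernel across all channels are assumptions made for the example; the excerpt above states only that the kernels depend solely on the current time step.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_conv(seq, proj, width):
    """Apply a per-time-step convolution kernel to a sequence.

    seq:   list of T feature vectors, each of length d
    proj:  width x d matrix (hypothetical kernel predictor, an assumption)
    width: number of context positions the kernel covers
    """
    d = len(seq[0])
    out = []
    for t, x_t in enumerate(seq):
        # The kernel is computed from the current time step only.
        scores = [sum(w * x for w, x in zip(row, x_t)) for row in proj]
        kernel = softmax(scores)
        y = [0.0] * d
        for j, weight in enumerate(kernel):
            src = t - (width - 1) + j  # causal window ending at position t
            if src < 0:
                continue  # positions before the sequence start act as zero padding
            for c in range(d):
                y[c] += weight * seq[src][c]
        out.append(y)
    return out


if __name__ == "__main__":
    sequence = [[1.0], [2.0], [3.0]]
    zero_proj = [[0.0], [0.0]]  # all-zero predictor gives a uniform kernel
    print(dynamic_conv(sequence, zero_proj, width=2))
```

With an all-zero predictor every kernel is uniform, so each output is a plain average over the window; a trained predictor would instead shift weight toward the context positions that matter for the current step.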
We introduce adaptive input representations for neural language modeling which extend the adaptive softmax of Grave et al. (2017) to input representations of variable capacity.
#2 best model for Language Modelling on One Billion Word
We show that this property can be induced by using a relativistic discriminator, which estimates the probability that the given real data is more realistic than randomly sampled fake data.
SOTA for Image Generation on CAT 256x256
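One common way to express a relativistic discriminator is to pass the difference between the critic scores of a real and a fake sample through a sigmoid. The sketch below assumes that form; the function names and the pairing of one real sample with one fake sample are choices made for illustration, not details taken from the excerpt above.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relativistic_prob(critic_real, critic_fake):
    """Estimated probability that the real sample is more realistic than the fake one.

    Assumes the discriminator output is sigmoid(C(x_real) - C(x_fake)),
    where C is the raw (pre-sigmoid) critic score.
    """
    return sigmoid(critic_real - critic_fake)


def softplus(z):
    """Stable log(1 + exp(z)), which equals -log(sigmoid(-z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def discriminator_loss(real_scores, fake_scores):
    """Mean of -log sigmoid(C(x_real) - C(x_fake)) over paired scores."""
    losses = [softplus(-(r - f)) for r, f in zip(real_scores, fake_scores)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    print(relativistic_prob(2.0, 0.0))
    print(discriminator_loss([2.0, 1.0], [0.0, 1.5]))
```

Under this assumed form, the discriminator can only lower its loss by scoring real data above fake data, rather than by pushing the two scores toward fixed targets independently.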
In open-domain dialogue, intelligent agents should exhibit the use of knowledge; however, there are few convincing demonstrations of this to date.
Adaptive optimization methods such as AdaGrad, RMSProp and Adam have been proposed to achieve a rapid training process with an element-wise scaling term on learning rates.
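To make the "element-wise scaling term on learning rates" concrete, the following is a minimal sketch of a single Adam update in plain Python, using the commonly cited default hyperparameters. It operates on flat lists of floats for clarity, and the function name `adam_step` is chosen for this example only.

```python
import math


def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Perform one Adam update and return (new_params, new_m, new_v).

    params, grads, m, v: lists of floats with equal length
    t: 1-based step counter, used for bias correction
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1.0 - beta1) * g      # running mean of gradients
        v_i = beta2 * v_i + (1.0 - beta2) * g * g  # running mean of squared gradients
        m_hat = m_i / (1.0 - beta1 ** t)           # bias-corrected first moment
        v_hat = v_i / (1.0 - beta2 ** t)           # bias-corrected second moment
        # Each coordinate has its own effective step size lr / (sqrt(v_hat) + eps).
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


if __name__ == "__main__":
    params = [0.0, 0.0]
    grads = [0.1, 10.0]  # gradients that differ by two orders of magnitude
    state_m, state_v = [0.0, 0.0], [0.0, 0.0]
    params, state_m, state_v = adam_step(params, grads, state_m, state_v, t=1)
    print(params)
```

On the first step both coordinates move by roughly `lr` even though their gradients differ by a factor of 100, which is the per-element rescaling the sentence above refers to.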