Specifically, we target semi-supervised classification performance, and we meta-learn an algorithm -- an unsupervised weight update rule -- that produces representations useful for this task.
We study the problem of representation learning in goal-conditioned hierarchical reinforcement learning.
Feed-forward and convolutional architectures have recently been shown to achieve superior results on some sequence modeling tasks such as machine translation, with the added advantage that they concurrently process all inputs in the sequence, leading to easy parallelization and faster training times.
#5 best model for Machine Translation on WMT2014 English-German
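To make the parallelism claim above concrete, here is a toy comparison (shapes and module choices are mine, not from the paper): a convolution maps every position of a sequence in a single call, whereas a recurrent cell must visit positions one at a time.

```python
# Illustrative only: parallel sequence processing (conv) vs. sequential (RNN cell).
import torch
import torch.nn as nn

x = torch.randn(8, 32, 100)                    # (batch, channels, time)

conv = nn.Conv1d(32, 32, kernel_size=3, padding=1)
y_conv = conv(x)                               # all 100 positions processed in one call

rnn = nn.GRUCell(32, 32)
h = torch.zeros(8, 32)
for t in range(x.size(2)):                     # positions must be visited in order
    h = rnn(x[:, :, t], h)
```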
Transformers have the potential to learn longer-term dependencies, but are limited by a fixed-length context in the setting of language modeling.
SOTA for Language Modelling on Hutter Prize
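A small, self-contained illustration of that fixed-length-context limitation (toy numbers, not the paper's setup): a vanilla Transformer language model is trained on independent fixed-length segments, so a token at the start of a segment cannot attend to the tokens that immediately precede it in the stream.

```python
# Illustrative only: available attention context resets at every segment boundary.
import torch

token_ids = torch.arange(12)                    # stand-in for a token stream
segment_len = 4
segments = token_ids.split(segment_len)         # (0..3), (4..7), (8..11): trained independently

# number of earlier tokens visible at each position: it resets to zero at every
# segment boundary instead of growing with the stream.
context_sizes = [i % segment_len for i in range(len(token_ids))]
print(context_sizes)   # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]
```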
We predict separate convolution kernels based solely on the current time-step in order to determine the importance of context elements.
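A minimal sketch of that idea, under my own naming and shape assumptions (not the authors' implementation): a linear layer predicts a softmax-normalized kernel of width k from the current time-step's features, and that kernel weights the local context window.

```python
# Illustrative sketch of per-time-step ("dynamic") convolution kernels.
import torch
import torch.nn.functional as F

def dynamic_conv(x, kernel_proj, k=3):
    """x: (batch, time, dim); kernel_proj: a Linear(dim, k) layer (assumed names)."""
    kernels = F.softmax(kernel_proj(x), dim=-1)      # (B, T, k): one kernel per time-step
    x_pad = F.pad(x, (0, 0, k - 1, 0))               # left-pad time so step t sees t-k+1 .. t
    windows = x_pad.unfold(1, k, 1)                  # (B, T, D, k): local context windows
    return torch.einsum('btdk,btk->btd', windows, kernels)

proj = torch.nn.Linear(16, 3)                        # predicts a width-3 kernel from each step
out = dynamic_conv(torch.randn(2, 10, 16), proj)     # (2, 10, 16)
```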
We introduce adaptive input representations for neural language modeling which extend the adaptive softmax of Grave et al. (2017) to input representations of variable capacity.
#2 best model for Language Modelling on One Billion Word
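A rough sketch of what variable-capacity input representations can look like (the cluster cutoffs, dimensions, and class name are illustrative assumptions, not the paper's module): the vocabulary is split into frequency bands, frequent tokens get wide embeddings, rare tokens get narrow ones, and every band is projected up to the model dimension.

```python
# Illustrative sketch: frequency-banded embeddings with per-band projections.
import torch
import torch.nn as nn

class AdaptiveInput(nn.Module):
    def __init__(self, cutoffs=(20000, 60000, 200000), dims=(512, 256, 64), d_model=512):
        super().__init__()
        self.cutoffs = (0,) + cutoffs
        self.embeds = nn.ModuleList(
            nn.Embedding(hi - lo, d) for lo, hi, d in zip(self.cutoffs, cutoffs, dims))
        self.projs = nn.ModuleList(nn.Linear(d, d_model, bias=False) for d in dims)

    def forward(self, tokens):                       # tokens: (batch, time) token ids
        out = torch.zeros(*tokens.shape, self.projs[0].out_features)
        for lo, hi, emb, proj in zip(self.cutoffs, self.cutoffs[1:], self.embeds, self.projs):
            mask = (tokens >= lo) & (tokens < hi)    # tokens belonging to this frequency band
            if mask.any():
                out[mask] = proj(emb(tokens[mask] - lo))
        return out

ids = torch.randint(0, 200000, (2, 8))
x = AdaptiveInput()(ids)                             # (2, 8, 512)
```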
Adaptive optimization methods such as AdaGrad, RMSProp and Adam have been proposed to achieve a rapid training process with an element-wise scaling term on learning rates.
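For reference, a bare-bones Adam-style step (a standard textbook form, not any library's implementation) makes the element-wise scaling explicit: each parameter's step is divided by a running estimate of its own gradient magnitude.

```python
# Standard Adam-style update, written out to show the element-wise scaling term.
import numpy as np

def adam_step(p, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g                  # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g * g              # second-moment (uncentered variance) estimate
    m_hat = m / (1 - b1 ** t)                  # bias correction
    v_hat = v / (1 - b2 ** t)
    p = p - lr * m_hat / (np.sqrt(v_hat) + eps)   # per-element scaled step
    return p, m, v

p = np.zeros(3); m = np.zeros(3); v = np.zeros(3)
for t in range(1, 4):
    g = np.array([0.1, -2.0, 0.01])            # toy gradient
    p, m, v = adam_step(p, g, m, v, t)
```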
This paper addresses the scalability challenge of architecture search by formulating the task in a differentiable manner.
#15 best model for Language Modelling on Penn Treebank (Word Level)
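A toy sketch of the continuous relaxation that makes architecture search differentiable (the class name and the two candidate operations are illustrative assumptions): each edge computes a softmax-weighted mixture of candidate operations, so the architecture weights can be trained by ordinary gradient descent alongside the model weights.

```python
# Illustrative sketch: a differentiable "mixed operation" over candidate ops.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.ops = nn.ModuleList([nn.Linear(dim, dim), nn.Identity()])  # candidate operations
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))           # architecture weights

    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

edge = MixedOp()
out = edge(torch.randn(4, 16))   # gradients flow into both the op weights and alpha
```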
Though designed for decaNLP, MQAN also achieves state-of-the-art results on the WikiSQL semantic parsing task in the single-task setting.
Tasks: Domain Adaptation, Machine Translation, Named Entity Recognition (NER), Natural Language Inference, Question Answering, Relation Extraction, Semantic Parsing, Semantic Role Labeling, Sentiment Analysis, Text Classification, Transfer Learning