Transformers have a potential of learning longer-term dependency, but are limited by a fixed-length context in the setting of language modeling.
SOTA for Language Modelling on Hutter Prize
Machine translation is highly sensitive to the size and quality of the training data, which has led to an increasing interest in collecting and filtering large parallel corpora.
#2 best model for Cross-Lingual Bitext Mining on BUCC German-to-English
In this paper, we present a Multi-Task Deep Neural Network (MT-DNN) for learning representations across multiple natural language understanding (NLU) tasks.
SOTA for Linguistic Acceptability on CoLA
The Transformer is a sequence model that forgoes traditional recurrent architectures in favor of a fully attention-based approach.
Neural language representation models such as BERT pre-trained on large-scale corpora can well capture rich semantic patterns from plain text, and be fine-tuned to consistently improve the performance of various NLP tasks.
SOTA for Relation Extraction on FewRel
Natural language understanding has recently seen a surge of progress with the use of sentence encoders like ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2019) which are pretrained on variants of language modeling.
We show that constituency parsing benefits from unsupervised pre-training across a variety of languages and a range of pre-training conditions.