XLNet: Generalized Autoregressive Pretraining for Language Understanding

huggingface/transformers · NeurIPS 2019

With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling.

Document Ranking · Humor Detection +7
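
A minimal usage sketch through the huggingface/transformers API; the `xlnet-base-cased` checkpoint and the feature-extraction call are assumptions for illustration, not part of the paper:

```python
# Minimal sketch: load a pretrained XLNet checkpoint via huggingface/transformers.
# Assumes the "xlnet-base-cased" checkpoint is available on the Hugging Face hub.
from transformers import XLNetTokenizer, XLNetModel

tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
model = XLNetModel.from_pretrained("xlnet-base-cased")

inputs = tokenizer("XLNet models bidirectional context autoregressively.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)
```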

Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

huggingface/transformers · NeurIPS 2020

With the success of language pretraining, it is highly desirable to develop more efficient architectures of good scalability that can exploit the abundant unlabeled data at a lower cost.

Reading Comprehension · Text Classification
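
A minimal loading sketch via huggingface/transformers, assuming the `funnel-transformer/small` checkpoint; the model compresses the sequence in deeper blocks to save compute and re-expands it for token-level outputs:

```python
# Minimal sketch: Funnel-Transformer via huggingface/transformers.
# Assumes the "funnel-transformer/small" checkpoint is available on the hub.
from transformers import FunnelTokenizer, FunnelModel

tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/small")
model = FunnelModel.from_pretrained("funnel-transformer/small")

inputs = tokenizer("Funnel-Transformer filters out sequential redundancy.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # full-length token representations
```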

Multilingual Denoising Pre-training for Neural Machine Translation

huggingface/transformers · 22 Jan 2020

This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks.

Denoising · Document-level +1
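
A minimal translation sketch with the mBART implementation in huggingface/transformers; the `facebook/mbart-large-en-ro` checkpoint, the language codes, and the decoder start token follow the library's conventions rather than the paper itself:

```python
# Minimal sketch: English->Romanian translation with a fine-tuned mBART checkpoint.
# Checkpoint name and language codes are assumptions drawn from huggingface/transformers.
from transformers import MBartForConditionalGeneration, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-en-ro", src_lang="en_XX")
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-en-ro")

inputs = tokenizer("Multilingual denoising pre-training helps machine translation.", return_tensors="pt")
# Start decoding with the target-language code so generation is steered to Romanian.
generated = model.generate(**inputs, decoder_start_token_id=tokenizer.lang_code_to_id["ro_RO"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```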

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

huggingface/transformers · NAACL 2019

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers.

Common Sense Reasoning · Conversational Response Selection +6
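
A minimal sketch of extracting bidirectional contextual representations with the huggingface/transformers implementation; the `bert-base-uncased` checkpoint is an assumption for illustration:

```python
# Minimal sketch: contextual embeddings from pretrained BERT via huggingface/transformers.
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("BERT encodes text bidirectionally.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, 768)
```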

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

huggingface/transformers · ICLR 2020

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks.

Common Sense Reasoning · Linguistic Acceptability +4
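
A minimal loading sketch via huggingface/transformers, assuming the `albert-base-v2` checkpoint; cross-layer parameter sharing and embedding factorization keep the checkpoint far smaller than a comparable BERT model:

```python
# Minimal sketch: ALBERT via huggingface/transformers ("albert-base-v2" assumed).
from transformers import AlbertTokenizer, AlbertModel

tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
model = AlbertModel.from_pretrained("albert-base-v2")

inputs = tokenizer("ALBERT is a lite BERT.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```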

fairseq S2T: Fast Speech-to-Text Modeling with fairseq

huggingface/transformers · 11 Oct 2020

We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation.

End-To-End Speech Recognition · Machine Translation +3
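
Although the paper describes a fairseq extension, the models are also ported to huggingface/transformers; a minimal speech-recognition sketch, assuming the `facebook/s2t-small-librispeech-asr` checkpoint and a 16 kHz mono waveform loaded elsewhere (e.g. with torchaudio):

```python
# Minimal sketch: end-to-end ASR with the Speech2Text port of fairseq S2T.
# Checkpoint name is an assumption; `waveform` is a placeholder for real audio.
import numpy as np
from transformers import Speech2TextProcessor, Speech2TextForConditionalGeneration

processor = Speech2TextProcessor.from_pretrained("facebook/s2t-small-librispeech-asr")
model = Speech2TextForConditionalGeneration.from_pretrained("facebook/s2t-small-librispeech-asr")

waveform = np.zeros(16000, dtype=np.float32)  # placeholder: one second of silence at 16 kHz
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
generated = model.generate(inputs["input_features"], attention_mask=inputs["attention_mask"])
print(processor.batch_decode(generated, skip_special_tokens=True))
```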

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

huggingface/transformers · NeurIPS 2020

Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks.

Question Answering · Text Generation
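
A minimal generation sketch with the RAG implementation in huggingface/transformers; the `facebook/rag-sequence-nq` checkpoint and the dummy retrieval index (which avoids downloading the full Wikipedia index but still requires the `datasets` and `faiss` packages) are assumptions for illustration:

```python
# Minimal sketch: retrieval-augmented generation via huggingface/transformers.
from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration

tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True
)
model = RagSequenceForGeneration.from_pretrained("facebook/rag-sequence-nq", retriever=retriever)

inputs = tokenizer("who wrote the origin of species", return_tensors="pt")
generated = model.generate(input_ids=inputs["input_ids"])  # retrieve, then generate
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```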

Cross-lingual Language Model Pretraining

huggingface/transformers · NeurIPS 2019

On unsupervised machine translation, we obtain 34.3 BLEU on WMT'16 German-English, improving the previous state of the art by more than 9 BLEU.

Language Modelling · Natural Language Understanding +1
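
A minimal loading sketch via huggingface/transformers, assuming the English-French `xlm-mlm-enfr-1024` MLM checkpoint released with the paper:

```python
# Minimal sketch: cross-lingual XLM representations via huggingface/transformers.
from transformers import XLMTokenizer, XLMModel

tokenizer = XLMTokenizer.from_pretrained("xlm-mlm-enfr-1024")
model = XLMModel.from_pretrained("xlm-mlm-enfr-1024")

inputs = tokenizer("Cross-lingual pretraining aligns representations across languages.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```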

RoBERTa: A Robustly Optimized BERT Pretraining Approach

huggingface/transformers · 26 Jul 2019

Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging.

Common Sense Reasoning · Language Modelling +6
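
A minimal usage sketch via huggingface/transformers; the `roberta-base` checkpoint is an assumption for illustration:

```python
# Minimal sketch: RoBERTa via huggingface/transformers ("roberta-base" assumed).
from transformers import RobertaTokenizer, RobertaModel

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaModel.from_pretrained("roberta-base")

inputs = tokenizer("RoBERTa is a robustly optimized BERT variant.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, 768)
```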

Beyond English-Centric Multilingual Machine Translation

huggingface/transformers · 21 Oct 2020

Existing work in translation demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages.

Machine Translation
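
A minimal direct-translation sketch (Chinese to French, without pivoting through English) with the M2M-100 implementation in huggingface/transformers; the `facebook/m2m100_418M` checkpoint and the language pair are assumptions for illustration:

```python
# Minimal sketch: non-English-centric translation with M2M-100 via huggingface/transformers.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")

tokenizer.src_lang = "zh"
inputs = tokenizer("生活就像一盒巧克力。", return_tensors="pt")  # "Life is like a box of chocolates."
# Force the first generated token to the French language code to select the target language.
generated = model.generate(**inputs, forced_bos_token_id=tokenizer.get_lang_id("fr"))
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```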