XLNet: Generalized Autoregressive Pretraining for Language Understanding

huggingface/transformers NeurIPS 2019

With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling.

Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

huggingface/transformers NeurIPS 2020

With the success of language pretraining, it is highly desirable to develop more efficient architectures of good scalability that can exploit the abundant unlabeled data at a lower cost.

Multilingual Denoising Pre-training for Neural Machine Translation

huggingface/transformers 22 Jan 2020

This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

huggingface/transformers NAACL 2019

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers.

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

huggingface/transformers ICLR 2020

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks.

fairseq S2T: Fast Speech-to-Text Modeling with fairseq

huggingface/transformers 11 Oct 2020

We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation.

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

huggingface/transformers NeurIPS 2020

Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks.

Cross-lingual Language Model Pretraining

huggingface/transformers NeurIPS 2019

On unsupervised machine translation, we obtain 34.3 BLEU on WMT'16 German-English, improving the previous state of the art by more than 9 BLEU.

RoBERTa: A Robustly Optimized BERT Pretraining Approach

huggingface/transformers 26 Jul 2019

Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging.

Beyond English-Centric Multilingual Machine Translation

huggingface/transformers 21 Oct 2020

Existing work in translation demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages.

