Masked Language Modeling

214 papers with code • 12 benchmarks • 6 datasets

Masked Language Modeling (MLM) is a self-supervised pre-training objective: a fraction of the input tokens is hidden (typically replaced with a special [MASK] token or a random token) and the model is trained to recover the original tokens from the surrounding bidirectional context. Popularized by BERT, it underlies most of the pre-trained encoders listed below.
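
As a concrete illustration, here is a minimal sketch of the standard BERT-style corruption step (15% of tokens selected; of those, 80% replaced by [MASK], 10% by a random token, 10% left unchanged). The function and tensor names are illustrative and not taken from any repository listed on this page.

```python
import torch

def mask_tokens(input_ids, mask_token_id, vocab_size, mlm_prob=0.15):
    """BERT-style corruption sketch: select ~15% of positions; of those, 80% -> [MASK],
    10% -> random token, 10% left unchanged. Loss is computed only on the selected
    positions (labels elsewhere are set to the ignore index -100)."""
    input_ids = input_ids.clone()
    labels = input_ids.clone()

    # Sample which positions to predict.
    selected = torch.bernoulli(torch.full(labels.shape, mlm_prob)).bool()
    labels[~selected] = -100

    # 80% of the selected positions are replaced with the [MASK] token.
    masked = torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & selected
    input_ids[masked] = mask_token_id

    # Half of the remaining 20% are replaced with a random vocabulary token.
    randomized = torch.bernoulli(torch.full(labels.shape, 0.5)).bool() & selected & ~masked
    input_ids[randomized] = torch.randint(vocab_size, labels.shape)[randomized]

    # The rest stay unchanged but are still predicted.
    return input_ids, labels
```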

Libraries

Use these libraries to find Masked Language Modeling models and implementations

Most implemented papers

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

google-research/electra ICLR 2020

Instead of training a model that predicts the original identities of the corrupted tokens, we train a discriminative model that predicts whether each token in the corrupted input was replaced by a generator sample or not.
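
A rough sketch of this replaced-token-detection objective, assuming `generator` and `discriminator` are callables returning per-token vocabulary logits and per-token binary logits respectively; the full ELECTRA objective also adds the generator's own MLM loss, omitted here. This is an illustration, not the google-research/electra code.

```python
import torch
import torch.nn.functional as F

def replaced_token_detection_loss(generator, discriminator, input_ids, masked_ids, is_masked):
    """input_ids: original tokens; masked_ids: tokens with masked positions corrupted;
    is_masked: boolean tensor marking the corrupted positions."""
    # The small generator fills masked positions with sampled tokens.
    gen_logits = generator(masked_ids)                                   # (batch, seq, vocab)
    sampled = torch.distributions.Categorical(logits=gen_logits).sample()
    corrupted = torch.where(is_masked, sampled, input_ids)

    # Per-token targets: 1 where the generator's sample differs from the original token.
    replaced = (corrupted != input_ids).float()

    # The discriminator predicts, for every token, whether it was replaced.
    disc_logits = discriminator(corrupted)                               # (batch, seq)
    return F.binary_cross_entropy_with_logits(disc_logits, replaced)
```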

LXMERT: Learning Cross-Modality Encoder Representations from Transformers

airsplay/lxmert IJCNLP 2019

In LXMERT, we build a large-scale Transformer model that consists of three encoders: an object relationship encoder, a language encoder, and a cross-modality encoder.
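
A minimal sketch of that three-encoder layout using stock Transformer encoder layers; the layer counts follow the paper's reported configuration, but the concatenation-plus-self-attention fusion is a stand-in for LXMERT's cross-attention layers, and nothing here mirrors the airsplay/lxmert code.

```python
import torch
import torch.nn as nn

class ThreeEncoderSketch(nn.Module):
    """Illustrative composition of a language encoder, an object-relationship encoder,
    and a cross-modality encoder (layer counts 9/5/5 as reported in the paper)."""
    def __init__(self, dim=768, heads=12, lang_layers=9, obj_layers=5, cross_layers=5):
        super().__init__()
        make_layer = lambda: nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.language_encoder = nn.TransformerEncoder(make_layer(), lang_layers)
        self.object_relationship_encoder = nn.TransformerEncoder(make_layer(), obj_layers)
        # LXMERT fuses the two streams with cross-attention; plain self-attention over the
        # concatenated sequence is used here as a simplified stand-in.
        self.cross_modality_encoder = nn.TransformerEncoder(make_layer(), cross_layers)

    def forward(self, word_embeddings, region_embeddings):
        lang = self.language_encoder(word_embeddings)
        vision = self.object_relationship_encoder(region_embeddings)
        return self.cross_modality_encoder(torch.cat([lang, vision], dim=1))
```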

UNITER: UNiversal Image-TExt Representation Learning

ChenRocks/UNITER ECCV 2020

Unlike previous work that applies joint random masking to both modalities, we use conditional masking on pre-training tasks (i.e., masked language/region modeling is conditioned on full observation of image/text).
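
A toy sketch of that conditional-masking schedule: each pre-training step corrupts exactly one modality so the other is fully observed. The helper and constants are illustrative and not drawn from ChenRocks/UNITER.

```python
import random

MASK_ID = 103  # illustrative [MASK] id; not tied to the UNITER vocabulary

def conditional_masking_step(text_tokens, region_features, mask_prob=0.15):
    """Corrupt exactly one modality per pre-training step, so masked elements are
    always predicted from a fully observed other modality (conditional masking)."""
    text = list(text_tokens)
    regions = [list(r) for r in region_features]
    if random.random() < 0.5:
        # Masked language modeling, conditioned on all image regions.
        for i in range(len(text)):
            if random.random() < mask_prob:
                text[i] = MASK_ID
        task = "masked_language_modeling"
    else:
        # Masked region modeling, conditioned on the full text.
        for r in regions:
            if random.random() < mask_prob:
                r[:] = [0.0] * len(r)  # zero out the region's visual features
        task = "masked_region_modeling"
    return text, regions, task
```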

On the Cross-lingual Transferability of Monolingual Representations

deepmind/xquad ACL 2020

This generalization ability has been attributed to the use of a shared subword vocabulary and joint training across multiple languages giving rise to deep multilingual abstractions.

REALM: Retrieval-Augmented Language Model Pre-Training

google-research/language 10 Feb 2020

Language model pre-training has been shown to capture a surprising amount of world knowledge, crucial for NLP tasks such as question answering.

MPNet: Masked and Permuted Pre-training for Language Understanding

microsoft/MPNet NeurIPS 2020

Since BERT neglects dependency among predicted tokens, XLNet introduces permuted language modeling (PLM) for pre-training to address this problem.
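
For context, a tiny sketch of the permuted-language-modeling setup referenced here: positions are predicted in a random factorization order, so each predicted token can condition on previously predicted tokens, unlike vanilla MLM where masked tokens are predicted independently. Names are illustrative only.

```python
import random

def permuted_prediction_split(seq_len, num_predicted):
    """Sample a random factorization order and predict the last `num_predicted`
    positions autoregressively in that order; the remaining positions are fully
    visible context."""
    order = list(range(seq_len))
    random.shuffle(order)
    context = order[:seq_len - num_predicted]   # fully visible positions
    targets = order[seq_len - num_predicted:]   # predicted one by one in this order
    return context, targets
```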

Language-agnostic BERT Sentence Embedding

FreddeFrallan/Multilingual-CLIP ACL 2022

While BERT is an effective method for learning monolingual sentence embeddings for semantic similarity and embedding based transfer learning (Reimers and Gurevych, 2019), BERT based cross-lingual sentence embeddings have yet to be explored.

Talking-Heads Attention

zygmuntz/hyperband 5 Mar 2020

We introduce "talking-heads attention" - a variation on multi-head attention which includes linear projections across the attention-heads dimension, immediately before and after the softmax operation. While inserting only a small number of additional parameters and a moderate amount of additional computation, talking-heads attention leads to better perplexities on masked language modeling tasks, as well as better quality when transfer-learning to language comprehension and question answering tasks.
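
The mechanism maps directly onto standard multi-head attention plus two head-mixing matrices. Below is a minimal einsum sketch (shapes and names are illustrative, not the authors' code).

```python
import torch

def talking_heads_attention(q, k, v, proj_logits, proj_weights):
    """q, k, v: (batch, heads, seq, dim_per_head).
    proj_logits, proj_weights: (heads, heads) mixing matrices applied across the
    heads dimension immediately before and after the softmax."""
    scores = torch.einsum("bhqd,bhkd->bhqk", q, k) / q.shape[-1] ** 0.5
    # "Talking heads": linearly mix attention logits across heads before the softmax...
    scores = torch.einsum("bhqk,hg->bgqk", scores, proj_logits)
    weights = torch.softmax(scores, dim=-1)
    # ...and mix the attention weights across heads again after the softmax.
    weights = torch.einsum("bgqk,gh->bhqk", weights, proj_weights)
    return torch.einsum("bhqk,bhkd->bhqd", weights, v)
```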

HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training

linjieli222/HERO EMNLP 2020

We present HERO, a novel framework for large-scale video+language omni-representation learning.