Masked Language Modeling
105 papers with code • 8 benchmarks • 2 datasets
These leaderboards are used to track progress in Masked Language Modeling.
Instead of training a model that predicts the original identities of the corrupted tokens, we train a discriminative model that predicts whether each token in the corrupted input was replaced by a generator sample or not.
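A minimal sketch of this corruption-and-detection setup, assuming a small generator masked LM has already produced logits over the vocabulary; the function name, arguments, and shapes below are illustrative, not the authors' released code:

```python
import torch

def replaced_token_detection_targets(input_ids, masked_positions, generator_logits):
    """Sketch of ELECTRA-style replaced token detection targets.

    input_ids:        [batch, seq]        original token ids
    masked_positions: [batch, seq]        bool mask of positions that were masked out
    generator_logits: [batch, seq, vocab] generator (masked LM) predictions
    """
    # Sample plausible replacements from the generator's output distribution.
    sampled = torch.distributions.Categorical(logits=generator_logits).sample()
    corrupted = torch.where(masked_positions, sampled, input_ids)
    # Discriminator target: 1 where the corrupted token differs from the original.
    is_replaced = (corrupted != input_ids).float()
    return corrupted, is_replaced
```

The discriminator is then trained with a per-token binary classification loss against `is_replaced`, rather than a softmax over the full vocabulary.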
In LXMERT, we build a large-scale Transformer model that consists of three encoders: an object relationship encoder, a language encoder, and a cross-modality encoder.
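A rough sketch of that three-encoder layout using standard PyTorch modules; the class name, the single cross-attention step, and the assumption that region features are already projected to the model dimension are illustrative simplifications, not the released LXMERT implementation:

```python
import torch.nn as nn

class ThreeEncoderSketch(nn.Module):
    """Illustrative layout: object-relationship encoder, language encoder,
    and a (simplified) cross-modality step."""

    def __init__(self, dim=768, heads=12):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.object_relationship_encoder = nn.TransformerEncoder(layer, num_layers=5)
        self.language_encoder = nn.TransformerEncoder(layer, num_layers=9)
        # Cross-modality encoder: each modality attends to the other.
        self.text_to_vision = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.vision_to_text = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, region_feats, token_embeds):
        v = self.object_relationship_encoder(region_feats)   # [batch, regions, dim]
        t = self.language_encoder(token_embeds)               # [batch, tokens, dim]
        # One cross-modality step; the real model stacks several such layers.
        t2, _ = self.text_to_vision(t, v, v)
        v2, _ = self.vision_to_text(v, t, t)
        return v2, t2
```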
This generalization ability has been attributed to the use of a shared subword vocabulary and joint training across multiple languages giving rise to deep multilingual abstractions.
Since BERT neglects dependency among predicted tokens, XLNet introduces permuted language modeling (PLM) for pre-training to address this problem.
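A minimal sketch of the permuted-LM idea: sample a random factorization order and restrict attention so that each token is predicted only from tokens that precede it in that order. The helper below is illustrative and does not reproduce XLNet's two-stream attention:

```python
import torch

def permutation_attention_mask(seq_len):
    """Sample a factorization order and build the corresponding attention mask."""
    order = torch.randperm(seq_len)              # random factorization order
    rank = torch.empty(seq_len, dtype=torch.long)
    rank[order] = torch.arange(seq_len)          # rank[i] = position of token i in the order
    # mask[i, j] is True when token i may attend to token j,
    # i.e. when j comes earlier in the sampled order.
    mask = rank.unsqueeze(1) > rank.unsqueeze(0)
    return order, mask
```

Because the order is resampled across training, every token is eventually predicted conditioned on many different subsets of the other tokens, which captures dependencies among predicted tokens that independent masking ignores.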
Different from previous work that applies joint random masking to both modalities, we use conditional masking on pre-training tasks (i.e., masked language/region modeling is conditioned on full observation of image/text).
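A minimal sketch of conditional masking, assuming each training example corrupts only one modality while the other stays fully observed; the function, its arguments, and the default mask id are illustrative, not the authors' code:

```python
import torch

def conditional_masking(text_ids, region_feats, mask_text, mask_prob=0.15,
                        mask_token_id=103):
    """Corrupt exactly one modality per example.

    text_ids:     [batch, tokens]            token ids
    region_feats: [batch, regions, feat_dim] image-region features
    mask_text:    if True, mask text (regions stay fully observed); else mask regions
    """
    text_ids = text_ids.clone()
    region_feats = region_feats.clone()
    if mask_text:
        # Masked language modeling conditioned on the full set of image regions.
        m = torch.rand(text_ids.shape) < mask_prob
        text_ids[m] = mask_token_id
    else:
        # Masked region modeling conditioned on the full text.
        m = torch.rand(region_feats.shape[:2]) < mask_prob
        region_feats[m] = 0.0
    return text_ids, region_feats
```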
Language model pre-training has been shown to capture a surprising amount of world knowledge, crucial for NLP tasks such as question answering.
We introduce "talking-heads attention" - a variation on multi-head attention which includes linear projections across the attention-heads dimension, immediately before and after the softmax operation. While inserting only a small number of additional parameters and a moderate amount of additional computation, talking-heads attention leads to better perplexities on masked language modeling tasks, as well as better quality when transfer-learning to language comprehension and question answering tasks.
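A minimal sketch of the mechanism as described: attention logits and attention weights are each mixed across the heads dimension by small learned matrices; the function name, shapes, and argument names below are illustrative:

```python
import torch
import torch.nn.functional as F

def talking_heads_attention(q, k, v, proj_logits, proj_weights):
    """Sketch of talking-heads attention.

    q, k, v:       [batch, heads, seq, dim_per_head]
    proj_logits:   [heads, heads] mixes attention logits across heads (pre-softmax)
    proj_weights:  [heads, heads] mixes attention weights across heads (post-softmax)
    """
    logits = torch.einsum("bhqd,bhkd->bhqk", q, k) / q.shape[-1] ** 0.5
    # Linear projection across the heads dimension, immediately before softmax.
    logits = torch.einsum("bhqk,hg->bgqk", logits, proj_logits)
    weights = F.softmax(logits, dim=-1)
    # Second projection across the heads dimension, immediately after softmax.
    weights = torch.einsum("bhqk,hg->bgqk", weights, proj_weights)
    return torch.einsum("bhqk,bhkd->bhqd", weights, v)
```

Setting `proj_logits` and `proj_weights` to identity matrices recovers standard multi-head attention, which is why the extra parameter count (two heads-by-heads matrices per layer) stays small.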