Transformers

MacBERT is a Transformer-based model for Chinese NLP that alters RoBERTa in several ways, most notably through a modified masking strategy: instead of masking with the [MASK] token, which never appears in the fine-tuning stage, MacBERT masks each word with a similar word. Specifically, MacBERT shares the same pre-training tasks as BERT, with several modifications. For the MLM task, the following modifications are performed (a short code sketch follows the list):

  • Whole word masking and N-gram masking strategies are used to select candidate tokens for masking, with percentages of 40%, 30%, 20%, and 10% for word-level unigrams through 4-grams.
  • Instead of the [MASK] token, which never appears in the fine-tuning stage, similar words are used for masking. A similar word is obtained with the Synonyms toolkit, which is based on word2vec similarity calculations. If an N-gram is selected for masking, a similar word is found for each word individually. In the rare case that no similar word exists, the method falls back to random word replacement.
  • 15% of the input words are selected for masking; of these, 80% are replaced with similar words, 10% are replaced with random words, and the remaining 10% are kept unchanged.
Source: Revisiting Pre-Trained Models for Chinese Natural Language Processing
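
Below is a minimal, illustrative Python sketch of this masking procedure, assuming whitespace-separated word input. The function names and the placeholder similar_word lookup are assumptions for illustration; the actual method uses the Synonyms toolkit and operates on subword tokens inside the model's pre-training pipeline.

    import random

    def similar_word(word, vocab):
        # Placeholder for the Synonyms toolkit (word2vec-based similarity
        # lookup) used by MacBERT; here we simply fall back to a random word,
        # mirroring the paper's fallback when no similar word is available.
        return random.choice(vocab)

    def mac_style_mask(words, vocab, mask_rate=0.15):
        """Toy sketch of MacBERT's MLM-as-correction masking over a word list."""
        words = list(words)
        target = max(1, round(len(words) * mask_rate))  # ~15% of input words
        selected = set()
        while len(selected) < target:
            # N-gram length drawn with 40/30/20/10% probability for 1- to 4-grams.
            n = random.choices([1, 2, 3, 4], weights=[40, 30, 20, 10])[0]
            start = random.randrange(len(words))
            selected.update(range(start, min(start + n, len(words))))
        for i in selected:
            r = random.random()
            if r < 0.8:    # 80%: replace with a similar word
                words[i] = similar_word(words[i], vocab)
            elif r < 0.9:  # 10%: replace with a random word
                words[i] = random.choice(vocab)
            # remaining 10%: keep the original word
        return words

For example, calling mac_style_mask on a tokenized sentence returns the same sentence with roughly 15% of its words corrupted, which the model is then trained to correct back to the originals.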

Tasks

Task                      Papers   Share
Language Modelling        1        50.00%
Stock Market Prediction   1        50.00%
