Transformers

MacBERT is a Transformer-based model for Chinese NLP that alters RoBERTa in several ways, most notably through a modified masking strategy: instead of masking with the [MASK] token, which never appears in the fine-tuning stage, MacBERT masks each word with a similar word. Specifically, MacBERT shares the same pre-training tasks as BERT, with several modifications. For the MLM task, the following modifications are performed (a short code sketch follows the list):

  • Whole word masking and N-gram masking strategies are used to select candidate tokens for masking, with percentages of 40%, 30%, 20%, and 10% for word-level unigrams through 4-grams.
  • Instead of the [MASK] token, which never appears in the fine-tuning stage, similar words are used for masking. A similar word is obtained with the Synonyms toolkit, which is based on word2vec similarity calculations. If an N-gram is selected for masking, a similar word is found for each word individually. In the rare case that no similar word exists, the method falls back to random word replacement.
  • 15% of the input words are selected for masking; of these, 80% are replaced with similar words, 10% are replaced with random words, and the remaining 10% are kept unchanged.
Source: Revisiting Pre-Trained Models for Chinese Natural Language Processing
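
Below is a minimal, illustrative Python sketch of this masking procedure, assuming whitespace-separated word input. The function names and the placeholder similar_word lookup are assumptions for illustration; the actual method uses the Synonyms toolkit and operates on subword tokens inside the model's pre-training pipeline.

    import random

    def similar_word(word, vocab):
        # Placeholder for the Synonyms toolkit (word2vec-based similarity
        # lookup) used by MacBERT; here we simply fall back to a random word,
        # mirroring the paper's fallback when no similar word is available.
        return random.choice(vocab)

    def mac_style_mask(words, vocab, mask_rate=0.15):
        """Toy sketch of MacBERT's MLM-as-correction masking over a word list."""
        words = list(words)
        target = max(1, round(len(words) * mask_rate))  # ~15% of input words
        selected = set()
        while len(selected) < target:
            # N-gram length drawn with 40/30/20/10% probability for 1- to 4-grams.
            n = random.choices([1, 2, 3, 4], weights=[40, 30, 20, 10])[0]
            start = random.randrange(len(words))
            selected.update(range(start, min(start + n, len(words))))
        for i in selected:
            r = random.random()
            if r < 0.8:    # 80%: replace with a similar word
                words[i] = similar_word(words[i], vocab)
            elif r < 0.9:  # 10%: replace with a random word
                words[i] = random.choice(vocab)
            # remaining 10%: keep the original word
        return words

For example, calling mac_style_mask on a tokenized sentence returns the same sentence with roughly 15% of its words corrupted, which the model is then trained to correct back to the originals.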

Tasks

Task                      Papers   Share
Language Modelling        1        50.00%
Stock Market Prediction   1        50.00%
