Language Model Pre-Training

MPNet is a pre-training method for language models that combines masked language modeling (MLM) and permuted language modeling (PLM) in a unified view. Through permuted language modeling it models the dependency among the predicted tokens, avoiding the independence assumption that BERT's MLM makes among masked tokens. At the same time, it takes the position information of the full sentence as input, so the model sees the positions of all tokens, which alleviates the position discrepancy of XLNet (whose permuted language modeling does not see the full position information of a sentence during pre-training).

The training objective of MPNet is:

$$ \mathbb{E}_{z\in\mathcal{Z}_{n}} \sum_{t=c+1}^{n}\log P\left(x_{z_{t}}\mid x_{z_{<t}}, M_{z_{>c}}; \theta\right) $$

As can be seen, MPNet conditions on $x_{z_{<t}}$ (the tokens preceding the current predicted token $x_{z_{t}}$) rather than only on the non-predicted tokens $x_{z_{\leq c}}$ as in MLM; compared with PLM, MPNet takes more information as input (namely, the mask symbols $[M]$ at the positions $z_{>c}$). Although the objective looks simple, implementing the model efficiently is non-trivial; see the paper for details.
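
For comparison, the paper's unified view writes the MLM and PLM objectives in the same notation, which makes the two differences above explicit:

$$ \text{MLM: } \mathbb{E}_{z\in\mathcal{Z}_{n}} \sum_{t=c+1}^{n}\log P\left(x_{z_{t}}\mid x_{z_{\leq c}}, M_{z_{>c}}; \theta\right), \qquad \text{PLM: } \mathbb{E}_{z\in\mathcal{Z}_{n}} \sum_{t=c+1}^{n}\log P\left(x_{z_{t}}\mid x_{z_{<t}}; \theta\right) $$

The sketch below illustrates the conditioning set implied by the objective: sample a permutation $z$, keep the first $c$ tokens as non-predicted context, and for each prediction step $t > c$ condition on the preceding permuted tokens $x_{z_{<t}}$ plus mask symbols carrying the positions $z_{>c}$. This is a minimal illustration of the data construction only (the function name `mpnet_conditioning`, the toy tokens, and the plain-Python representation are assumptions for this example), not the paper's actual two-stream attention implementation.

```python
import random

MASK = "[M]"

def mpnet_conditioning(tokens, c, seed=0):
    """Illustrative sketch (not the official implementation).

    tokens : input tokens x_1 .. x_n
    c      : number of non-predicted tokens; the last n - c tokens in the
             permuted order are predicted.
    Returns one (target, content_inputs, mask_inputs) triple per prediction
    step, mirroring P(x_{z_t} | x_{z_<t}, M_{z_>c}).
    """
    n = len(tokens)
    z = list(range(n))
    random.Random(seed).shuffle(z)          # a permutation z of the positions

    steps = []
    for t in range(c, n):                   # prediction steps t = c+1 .. n
        target = (z[t], tokens[z[t]])
        # x_{z_<t}: all tokens that precede step t in the permuted order,
        # including previously predicted ones (the PLM-style dependency)
        content = [(z[s], tokens[z[s]]) for s in range(t)]
        # M_{z_>c}: mask symbols carrying the positions of every predicted
        # token, so the model sees full position information
        masks = [(z[s], MASK) for s in range(c, n)]
        steps.append((target, content, masks))
    return steps

if __name__ == "__main__":
    toks = ["the", "task", "is", "sentence", "classification"]
    for target, content, masks in mpnet_conditioning(toks, c=3):
        print("predict", target, "| content:", content, "| masks:", masks)
```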

Source: MPNet: Masked and Permuted Pre-training for Language Understanding
