ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations

2 Nov 2019 · Shizhe Diao, Jiaxin Bai, Yan Song, Tong Zhang, Yonggang Wang

The pre-training of text encoders normally processes text as a sequence of tokens corresponding to small text units, such as word pieces in English and characters in Chinese. This omits information carried by larger text granularities, so the resulting encoders cannot easily adapt to certain combinations of characters...
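To make the character/n-gram distinction concrete, the following is a minimal sketch of how n-grams from a lexicon can be matched against a Chinese character sequence so that each character is associated with the n-grams covering it. The lexicon, function name, and matching logic here are illustrative assumptions for exposition, not the paper's actual implementation.

```python
def match_ngrams(chars, lexicon, max_n=4):
    """For each character position, collect the lexicon n-grams
    (length 2..max_n) that cover that position.

    chars:   list of single characters, e.g. list("咨询公司")
    lexicon: set of known n-gram strings (illustrative, not ZEN's lexicon)
    """
    matches = [[] for _ in chars]
    for i in range(len(chars)):
        for n in range(2, max_n + 1):
            gram = "".join(chars[i:i + n])
            # Skip windows truncated by the end of the sequence.
            if len(gram) == n and gram in lexicon:
                # Every character inside the matched span records this n-gram.
                for j in range(i, i + n):
                    matches[j].append(gram)
    return matches

# Hypothetical example: a 5-character sequence and a toy lexicon.
chars = list("波士顿咨询")
lexicon = {"波士顿", "咨询"}
print(match_ngrams(chars, lexicon))
```

Each character's representation could then be enhanced by combining its own embedding with the embeddings of the n-grams it participates in, which is the intuition behind augmenting a character-level encoder with larger-granularity information.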


Evaluation Results from the Paper


Task                              Dataset  Model  Metric  Value  Global Rank
Chinese Named Entity Recognition  MSRA     ZEN    F1      95.25  #1