PHMOSpell: Phonological and Morphological Knowledge Guided Chinese Spelling Check

Chinese Spelling Check (CSC) is a challenging task due to the complex characteristics of Chinese characters. Statistics reveal that most Chinese spelling errors belong to phonological or visual errors. However, previous methods rarely utilize phonological and morphological knowledge of Chinese characters or heavily rely on external resources to model their similarities. To address the above issues, we propose a novel end-to-end trainable model called PHMOSpell, which promotes the performance of CSC with multi-modal information. Specifically, we derive pinyin and glyph representations for Chinese characters from audio and visual modalities respectively, which are integrated into a pre-trained language model by a well-designed adaptive gating mechanism. To verify its effectiveness, we conduct comprehensive experiments and ablation tests. Experimental results on three shared benchmarks demonstrate that our model consistently outperforms previous state-of-the-art models.

PDF Abstract

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


Task Dataset Model Metric Name Metric Value Global Rank Benchmark
Chinese Spell Checking SIGHAN 2015 PHMOSpell Detection F1 80.5 # 1
Correction F1 78.1 # 1

Methods


No methods listed for this paper. Add relevant methods here