For the correction subtask, we utilize the masked language model, the seq2seq model and the spelling check model to generate corrections based on the detection results.
Then, we obtain parse trees of the source incorrect sentences by projecting trees of the target correct sentences.
We have accumulated 1,119 error templates for Chinese GEC based on this method.
This paper presents MuCGEC, a multi-reference multi-source evaluation dataset for Chinese Grammatical Error Correction (CGEC), consisting of 7,063 sentences collected from three Chinese-as-a-Second-Language (CSL) learner sources.
Previous works on key information extraction from visually rich documents (VRDs) mainly focus on labeling the text within each bounding box (i.e., semantic entity), while the relations in-between are largely unexplored.
Ranked #2 on Entity Linking on FUNSD
Chinese spelling check is a challenging task due to the characteristics of the Chinese language, such as the large character set, the lack of word boundaries, and short word length.
Previous work on cross-lingual sequence labeling tasks either requires parallel data or bridges the two languages through word-by-word matching.
Recently, the pre-trained language model, BERT (and its robustly optimized version RoBERTa), has attracted a lot of attention in natural language understanding (NLU), and achieved state-of-the-art accuracy in various NLU tasks, such as sentiment classification, natural language inference, semantic textual similarity and question answering.
Ranked #1 on Natural Language Inference on QNLI
In the correction stage, candidates were generated by the three GEC models and then merged to output the final corrections for M and S types.
For Chinese word segmentation, the large-scale annotated corpora mainly focus on newswire, and only a small amount of annotated data is available in other domains such as patents and literature.
Detection and correction of Chinese grammatical errors have been two of the major challenges for automatic Chinese grammatical error diagnosis. This paper presents an N-gram model for automatic detection and correction of Chinese grammatical errors in the NLPTEA 2017 shared task.
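
To illustrate the general idea of N-gram based error detection, here is a minimal sketch and not the model from the paper above. It assumes a character-level bigram model with add-one smoothing and flags characters whose conditional log-probability falls below a threshold. The function names, the toy training sentences, and the threshold value are all hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count character contexts and bigrams, with sentence boundary markers."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"<s>", "</s>"}
    for s in sentences:
        chars = ["<s>"] + list(s) + ["</s>"]
        vocab.update(s)
        contexts.update(chars[:-1])            # every symbol that has a successor
        bigrams.update(zip(chars, chars[1:]))  # adjacent symbol pairs
    return contexts, bigrams, len(vocab)


def bigram_logprob(prev, cur, model):
    """Add-one smoothed log P(cur | prev)."""
    contexts, bigrams, vocab_size = model
    return math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))


def flag_positions(sentence, model, threshold):
    """Return 0-based character indices whose bigram log-probability is below threshold."""
    chars = ["<s>"] + list(sentence)
    return [
        i - 1
        for i in range(1, len(chars))
        if bigram_logprob(chars[i - 1], chars[i], model) < threshold
    ]


# Toy corpus (hypothetical); a real system would train on a large corpus.
model = train_bigram(["我喜欢吃苹果", "我喜欢喝茶", "他喜欢吃苹果"])
suspicious = flag_positions("我喜欢苹吃果", model, threshold=-2.0)
```

With this toy corpus, the swapped characters in the second call produce unseen bigrams, so the positions around the swap are flagged, while a sentence seen in training is not. A practical detector would use higher-order N-grams, word-level units, and a tuned threshold.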