CSCD-IME

Introduced by Hu et al. in CSCD-IME: Correcting Spelling Errors Generated by Pinyin IME

CSCD-IME (Chinese Spelling Correction Dataset for errors generated by pinyin IME) is a dataset of 40,000 annotated sentences drawn from real posts by official media accounts on Sina Weibo. It is designed for detecting and correcting spelling mistakes in Chinese text.

Source: CSCD-IME: Correcting Spelling Errors Generated by Pinyin IME
