4 dataset results for Spelling Correction

CSCD-IME

CSCD-IME (Chinese Spelling Correction Dataset for errors generated by pinyin IME) is a dataset of 40,000 annotated sentences taken from real posts by official media accounts on Sina Weibo. It is designed for detecting and correcting spelling mistakes in Chinese text.

2 PAPERS • NO BENCHMARKS YET

MCSCSet

MCSCSet is a large-scale, specialist-annotated dataset of about 200k samples, designed for the task of Medical-domain Chinese Spelling Correction. MCSCSet comprises: i) extensive real-world medical queries collected from Tencent Yidian, and ii) the corresponding misspelled sentences, manually annotated by medical specialists.

2 PAPERS • NO BENCHMARKS YET

Viwiki-Spelling

Viwiki-Spelling (Vietnamese Spelling Correction Dataset)

We introduce a first Vietnamese spelling correction dataset, containing manually labelled mistakes and the corresponding correct words.

1 PAPER • NO BENCHMARKS YET

GitHub Typo Corpus

Are you the kind of person who makes a lot of typos when writing code? Or are you the one who fixes them by making "fix typo" commits? Either way, thank you: you contributed to the state of the art in the NLP field.

3 PAPERS • NO BENCHMARKS YET
