3 dataset results for Chinese Word Segmentation AND Texts

A Simplified Chinese dataset for named entity recognition (NER) from the Third International Chinese Language Processing Bakeoff (2006), provided by Microsoft Research Asia (MSRA).

23 PAPERS • 3 BENCHMARKS

CUGE

CUGE is a Chinese Language Understanding and Generation Evaluation benchmark with the following features: (1) Hierarchical benchmark framework, where datasets are principally selected and organized with a language capability-task-dataset hierarchy. (2) Multi-level scoring strategy, where different levels of model performance are provided based on the hierarchical framework.

4 PAPERS • NO BENCHMARKS YET

LSICC

Large Scale Informal Chinese Corpus (LSICC) is a large-scale corpus of informal Chinese. It contains around 37 million book reviews and 50 thousand netizens' comments on news.

1 PAPER • NO BENCHMARKS YET
