Simplified Chinese dataset for NER in The Third International Chinese Language Processing Bakeoff (2006), provided by Microsoft Research Asia (MSRA).
23 PAPERS • 3 BENCHMARKS
CUGE is a Chinese Language Understanding and Generation Evaluation benchmark with the following features: (1) Hierarchical benchmark framework, where datasets are principally selected and organized with a language capability-task-dataset hierarchy. (2) Multi-level scoring strategy, where different levels of model performance are provided based on the hierarchical framework.
4 PAPERS • NO BENCHMARKS YET
Large Scale Informal Chinese Corpus (LSICC) is a large-scale corpus of informal Chinese. This corpus contains around 37 million book reviews and 50 thousand netizen's comments to the news.
1 PAPER • NO BENCHMARKS YET