CUGE is a Chinese Language Understanding and Generation Evaluation benchmark with the following features: (1) Hierarchical benchmark framework, where datasets are principally selected and organized with a language capability-task-dataset hierarchy. (2) Multi-level scoring strategy, where different levels of model performance are provided based on the hierarchical framework.

CUGE covers 7 important language capabilities, 17 mainstream NLP tasks and 19 representative datasets. It includes tasks like: word segmentation, part of speech tagging, reading comprehension and document retrieval.


Paper Code Results Date Stars

Dataset Loaders

No data loaders found. You can submit your data loader here.


Similar Datasets


  • Unknown