CUGE is a Chinese Language Understanding and Generation Evaluation benchmark with the following features: (1) A hierarchical benchmark framework, in which datasets are selected and organized in a principled way according to a language capability-task-dataset hierarchy. (2) A multi-level scoring strategy, in which model performance is reported at each level of that hierarchy.

CUGE covers 7 important language capabilities, 17 mainstream NLP tasks, and 19 representative datasets. Tasks include word segmentation, part-of-speech tagging, reading comprehension, and document retrieval.
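
To illustrate how scores might be reported at several levels of a capability-task-dataset hierarchy, here is a minimal Python sketch. It is not CUGE's official scoring code: the capability and dataset names, the numbers, and the choice of an unweighted mean at each level are all assumptions made for illustration, and the description above does not specify the benchmark's actual aggregation or normalization.

```python
from statistics import mean

# Hypothetical capability -> task -> dataset -> score layout.
# All names and numbers are placeholders, not CUGE results.
scores = {
    "capability_A": {
        "word segmentation": {"dataset_1": 90.0},
        "part-of-speech tagging": {"dataset_2": 80.0},
    },
    "capability_B": {
        "reading comprehension": {"dataset_3": 60.0, "dataset_4": 70.0},
    },
}


def multi_level_scores(hierarchy):
    """Aggregate dataset scores upward, one level at a time.

    Assumption: each level is the unweighted mean of the level below it.
    """
    task_scores = {}
    capability_scores = {}
    for capability, tasks in hierarchy.items():
        # Task score: mean over the datasets belonging to that task.
        per_task = {task: mean(ds.values()) for task, ds in tasks.items()}
        task_scores.update(per_task)
        # Capability score: mean over the tasks under that capability.
        capability_scores[capability] = mean(per_task.values())
    # Overall score: mean over all capabilities.
    overall = mean(capability_scores.values())
    return task_scores, capability_scores, overall


task_scores, capability_scores, overall = multi_level_scores(scores)
print(task_scores)
print(capability_scores)
print(overall)
```

Averaging level by level, rather than over all datasets at once, keeps a capability with many datasets from dominating the overall score; whether CUGE weights its levels this way is not stated here.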


