CLUE (Chinese Language Understanding Evaluation) is a benchmark made up of several natural language understanding (NLU) datasets. It is a community-driven project that brings together 9 tasks, spanning several well-established single-sentence and sentence-pair classification tasks as well as machine reading comprehension, all on original Chinese text.
95 PAPERS • 8 BENCHMARKS
Qulac (Questions for Lack of Clarity) is a dataset of clarifying questions for open-domain information-seeking conversations. It provides the first dataset and offline evaluation framework for studying clarifying questions in open-domain conversational search systems.
18 PAPERS • NO BENCHMARKS YET
The Istella LETOR full dataset comprises 33,018 queries, with each query-document pair represented by 220 features. It contains 10,454,629 examples labeled with relevance judgments ranging from 0 (irrelevant) to 4 (perfectly relevant), for an average of 316 examples per query. It has been split into train and test sets according to an 80%-20% scheme.
1 PAPER • NO BENCHMARKS YET
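As a quick sanity check on the Istella LETOR figures above, the per-query average can be recomputed from the stated totals. The sketch below also shows one way an 80%-20% split might be drawn at the query level. The query-level grouping, the integer query IDs, the seed, and the `split_queries` helper are all assumptions made for illustration; the description does not say how the official split was produced.

```python
import random

# Totals quoted in the dataset description.
NUM_QUERIES = 33_018
NUM_EXAMPLES = 10_454_629

# Average number of labeled examples per query.
avg_per_query = NUM_EXAMPLES / NUM_QUERIES
print(f"average examples per query: {avg_per_query:.2f}")  # 316.63, quoted as 316


def split_queries(query_ids, train_fraction=0.8, seed=0):
    """Hypothetical helper: shuffle query IDs and cut them into train/test.

    Splitting by query (an assumption here) keeps all documents of a query
    on the same side of the split.
    """
    ids = list(query_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


# Placeholder IDs 0..NUM_QUERIES-1; the real dataset has its own identifiers.
train_ids, test_ids = split_queries(range(NUM_QUERIES))
print(len(train_ids), len(test_ids))
```

The split sizes printed here follow only from the assumed query-level 80%-20% cut and need not match the official train and test files.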