XGLUE is an evaluation benchmark XGLUE,which is composed of 11 tasks that span 19 languages. For each task, the training data is only available in English. This means that to succeed at XGLUE, a model must have a strong zero-shot cross-lingual transfer capability to learn from the English data of a specific task and transfer what it learned to other languages. Comparing to its concurrent work XTREME, XGLUE has two characteristics: First, it includes cross-lingual NLU and cross-lingual NLG tasks at the same time; Second, besides including 5 existing cross-lingual tasks (i.e. NER, POS, MLQA, PAWS-X and XNLI), XGLUE selects 6 new tasks from Bing scenarios as well, including News Classification (NC), Query-Ad Matching (QADSM), Web Page Ranking (WPR), QA Matching (QAM), Question Generation (QG) and News Title Generation (NTG). Such diversities of languages, tasks and task origin provide a comprehensive benchmark for quantifying the quality of a pre-trained model on cross-lingual natural lan
20 PAPERS • 2 BENCHMARKS
JGLUE, Japanese General Language Understanding Evaluation, is built to measure the general NLU ability in Japanese.
7 PAPERS • NO BENCHMARKS YET
XWINO is a multilingual collection of Winograd Schemas in six languages that can be used for evaluation of cross-lingual commonsense reasoning capabilities.
3 PAPERS • 1 BENCHMARK
The Japanese Adversarial NLI (JaNLI) dataset is designed to require understanding of Japanese linguistic phenomena and illuminate the vulnerabilities of models. Please see the paper Assessing the Generalization Capacity of Pre-trained Language Models through Japanese Adversarial Natural Language Inference for details.
2 PAPERS • NO BENCHMARKS YET