xSID, a new evaluation benchmark for cross-lingual (X) Slot and Intent Detection in 13 languages from 6 language families, including a very low-resource dialect, covering Arabic (ar), Chinese (zh), Danish (da), Dutch (nl), English (en), German (de), Indonesian (id), Italian (it), Japanese (ja), Kazakh (kk), Serbian (sr), Turkish (tr) and an Austro-Bavarian German dialect, South Tyrolean (de-st).
13 PAPERS • NO BENCHMARKS YET
We collect utterances from the Chinese Artificial Intelligence Speakers (CAIS), and annotate them with slot tags and intent labels. The training, validation and test sets are split by the distribution of intents, where detailed statistics are provided in the supplementary material. Since the utterances are collected from speaker systems in the real world, intent labels are partial to the PlayMusic option. We adopt the BIOES tagging scheme for slots instead of the BIO2 used in the ATIS, since previous studies have highlighted meaningful improvements with this scheme (Ratinov and Roth, 2009) in the sequence labeling field
6 PAPERS • 2 BENCHMARKS
RiSAWOZ is a large-scale multi-domain Chinese Wizard-of-Oz dataset with Rich Semantic Annotations. RiSAWOZ contains 11.2K human-to-human (H2H) multi-turn semantically annotated dialogues, with more than 150K utterances spanning over 12 domains, which is larger than all previous annotated H2H conversational datasets. Both single- and multi-domain dialogues are constructed, accounting for 65% and 35%, respectively. Each dialogue is labelled with comprehensive dialogue annotations, including dialogue goal in the form of natural language description, domain, dialogue states and acts at both the user and system side. In addition to traditional dialogue annotations, it also includes linguistic annotations on discourse phenomena, e.g., ellipsis and coreference, in dialogues, which are useful for dialogue coreference and ellipsis resolution tasks.
5 PAPERS • NO BENCHMARKS YET
In the paper, to bridge the research gap, we propose a new and important task, Profile-based Spoken Language Understanding (ProSLU), which requires a model not only depends on the text but also on the given supporting profile information. We further introduce a Chinese human-annotated dataset, with over 5K utterances annotated with intent and slots, and corresponding supporting profile information. In total, we provide three types of supporting profile information: (1) Knowledge Graph (KG) consists of entities with rich attributes, (2) User Profile (UP) is composed of user settings and information, (3) Context Awareness(CA) is user state and environmental information.
2 PAPERS • 3 BENCHMARKS
Voice Navigation is a large-scale dataset of Chinese speech for slot filling, containing more than 830,000 samples.
1 PAPER • NO BENCHMARKS YET