DialogUSR dataset covers 23 domains with a multi-step crowd-sourcing procedure. It comprises 36.7 Chinese characters by assembling 3.6 single-intent queries (including initial and follow-up queries) and is designed for dialogue utterance splitting and reformulation task.
2 PAPERS • NO BENCHMARKS YET
In the paper, to bridge the research gap, we propose a new and important task, Profile-based Spoken Language Understanding (ProSLU), which requires a model not only depends on the text but also on the given supporting profile information. We further introduce a Chinese human-annotated dataset, with over 5K utterances annotated with intent and slots, and corresponding supporting profile information. In total, we provide three types of supporting profile information: (1) Knowledge Graph (KG) consists of entities with rich attributes, (2) User Profile (UP) is composed of user settings and information, (3) Context Awareness(CA) is user state and environmental information.
2 PAPERS • 3 BENCHMARKS
We collect utterances from the Chinese Artificial Intelligence Speakers (CAIS), and annotate them with slot tags and intent labels. The training, validation and test sets are split by the distribution of intents, where detailed statistics are provided in the supplementary material. Since the utterances are collected from speaker systems in the real world, intent labels are partial to the PlayMusic option. We adopt the BIOES tagging scheme for slots instead of the BIO2 used in the ATIS, since previous studies have highlighted meaningful improvements with this scheme (Ratinov and Roth, 2009) in the sequence labeling field
6 PAPERS • 2 BENCHMARKS