The ATIS (Airline Travel Information Systems) is a dataset consisting of audio recordings and corresponding manual transcripts about humans asking for flight information on automated airline travel inquiry systems. The data consists of 17 unique intent categories. The original split contains 4478, 500 and 893 intent-labeled reference utterances in train, development and test set respectively.
263 PAPERS • 7 BENCHMARKS
The SNIPS Natural Language Understanding benchmark is a dataset of over 16,000 crowdsourced queries distributed among 7 user intents of various complexity:
244 PAPERS • 6 BENCHMARKS
The Schema-Guided Dialogue (SGD) dataset consists of over 20k annotated multi-domain, task-oriented conversations between a human and a virtual assistant. These conversations involve interactions with services and APIs spanning 20 domains, ranging from banks and events to media, calendar, travel, and weather. For most of these domains, the dataset contains multiple different APIs, many of which have overlapping functionalities but different interfaces, which reflects common real-world scenarios. The wide range of available annotations can be used for intent prediction, slot filling, dialogue state tracking, policy imitation learning, language generation, user simulation learning, among other tasks in large-scale virtual assistants. Besides these, the dataset has unseen domains and services in the evaluation set to quantify the performance in zero-shot or few shot settings.
169 PAPERS • 3 BENCHMARKS
A new challenging dataset in English spanning 18 domains, which is substantially bigger and linguistically more diverse than existing datasets.
80 PAPERS • 2 BENCHMARKS
Dataset is constructed from single intent dataset ATIS.
24 PAPERS • 3 BENCHMARKS
xSID, a new evaluation benchmark for cross-lingual (X) Slot and Intent Detection in 13 languages from 6 language families, including a very low-resource dialect, covering Arabic (ar), Chinese (zh), Danish (da), Dutch (nl), English (en), German (de), Indonesian (id), Italian (it), Japanese (ja), Kazakh (kk), Serbian (sr), Turkish (tr) and an Austro-Bavarian German dialect, South Tyrolean (de-st).
13 PAPERS • NO BENCHMARKS YET
The MEDIA French corpus is dedicated to semantic extraction from speech in a context of human/machine dialogues. The corpus has manual transcription and conceptual annotation of dialogues from 250 speakers. It is split into the following three parts : (1) the training set (720 dialogues, 12K sentences), (2) the development set (79 dialogues, 1.3K sentences, and (3) the test set (200 dialogues, 3K sentences).
6 PAPERS • NO BENCHMARKS YET
MultiSubs is a dataset of multilingual subtitles gathered from the OPUS OpenSubtitles dataset, which in turn was sourced from opensubtitles.org. We have supplemented some text fragments (visually salient nouns in this release) within the subtitles with web images, where the word sense of the fragment has been disambiguated using a cross-lingual approach. We have introduced a fill-in-the-blank task and a lexical translation task to demonstrate the utility of the dataset. Please refer to our paper for a more detailed description of the dataset and tasks. Multisubs will benefit research on visual grounding of words especially in the context of free-form sentence.
4 PAPERS • 5 BENCHMARKS
Based on RADDLE and SNIPS , we construct Noise-SF, which includes two different perturbation settings. For single perturbations setting, we include five types of noisy utterances (character-level: \textbf{Typos}, word-level: \textbf{Speech}, and sentence-level: \textbf{Simplification}, \textbf{Verbose}, and \textbf{Paraphrase}) from RADDLE. For mixed perturbations setting, we utilize TextFlint to introduce character-level perturbation (\textbf{EntTypos}), word-level perturbation (\textbf{Subword}), and sentence-level perturbation (\textbf{AppendIrr}) and combine them to get a mixed perturbations dataset.
0 PAPER • NO BENCHMARKS YET