This dataset is for evaluating the performance of intent classification systems in the presence of "out-of-scope" queries, i.e., queries that do not fall into any of the system-supported intent classes. The dataset includes both in-scope and out-of-scope data.
71 PAPERS • 5 BENCHMARKS
MASSIVE is a parallel dataset of > 1M utterances across 51 languages with annotations for the Natural Language Understanding tasks of intent prediction and slot annotation. Utterances span 60 intents and include 55 slot types. MASSIVE was created by localizing the SLURP dataset, composed of general Intelligent Voice Assistant single-shot interactions.
47 PAPERS • 10 BENCHMARKS
xSID is a new evaluation benchmark for cross-lingual (X) Slot and Intent Detection in 13 languages from 6 language families, including a very low-resource dialect. It covers Arabic (ar), Chinese (zh), Danish (da), Dutch (nl), English (en), German (de), Indonesian (id), Italian (it), Japanese (ja), Kazakh (kk), Serbian (sr), Turkish (tr) and an Austro-Bavarian German dialect, South Tyrolean (de-st).
13 PAPERS • NO BENCHMARKS YET
KUAKE Query Intent Classification is an intent classification dataset used for the KUAKE-QIC task. Given search engine queries, the task is to classify each query into one of 11 medical intent categories defined in KUAKE-QIC: diagnosis, etiology analysis, treatment plan, medical advice, test result analysis, disease description, consequence prediction, precautions, intended effects, treatment fees, and others.
9 PAPERS • 1 BENCHMARK
ViMQ is a Vietnamese dataset of medical questions from patients with sentence-level and entity-level annotations for the Intent Classification and Named Entity Recognition tasks. It contains Vietnamese medical questions crawled from the online patient-doctor consultation section of www.vinmec.com, the website of a Vietnamese general hospital. Each consultation consists of a patient's question about a specific health issue and a detailed response provided by a clinical expert. The dataset contains health issues that fall into a wide range of categories including common illness, cardiology, hematology, cancer, pediatrics, etc. We removed sections where users ask for information about the hospital and selected 9,000 questions for the dataset.
3 PAPERS • NO BENCHMARKS YET
arXivEdits is an annotated corpus of 751 full papers from arXiv with gold sentence alignment across their multiple revised versions, as well as fine-grained span-level edits and their underlying intentions for 1,000 sentence pairs. This dataset is designed for studying the human revision process in the scientific writing domain.
2 PAPERS • NO BENCHMARKS YET
A labelled version of the ORCAS click-based dataset of Web queries, which provides 18 million connections to 10 million distinct queries.
1 PAPER • 1 BENCHMARK
Search4Code is a large-scale, web-query-based dataset of code search queries for C# and Java. The Search4Code data is mined from Microsoft Bing's anonymized search query logs using a weak supervision technique.
1 PAPER • NO BENCHMARKS YET
This dataset for intent classification from human speech covers 14 coarse-grained intents from the banking domain. This work is inspired by a similar release, the Minds-14 dataset; here, we restrict ourselves to Indian English but with a much larger training set. The data was generated by 11 (Indian English) speakers and recorded over a telephony line. We also provide access to anonymized speaker information (such as gender, languages spoken, and native language) to allow more structured discussions around robustness and bias in the models you train.