20 dataset results for Intent Detection

ATIS (Airline Travel Information Systems)

ATIS (Airline Travel Information Systems) is a dataset of audio recordings and corresponding manual transcripts of people asking for flight information on automated airline travel inquiry systems. The data covers 17 unique intent categories. The original split contains 4478, 500 and 893 intent-labeled reference utterances in the train, development and test sets, respectively.

263 PAPERS • 7 BENCHMARKS

SNIPS (SNIPS Natural Language Understanding benchmark)

The SNIPS Natural Language Understanding benchmark is a dataset of over 16,000 crowdsourced queries distributed among 7 user intents of various complexity.

244 PAPERS • 6 BENCHMARKS

BANKING77

Dataset composed of online banking queries annotated with their corresponding intents.

98 PAPERS • 5 BENCHMARKS

CLINC150

This dataset is for evaluating the performance of intent classification systems in the presence of "out-of-scope" queries, i.e., queries that do not fall into any of the system-supported intent classes. The dataset includes both in-scope and out-of-scope data.

73 PAPERS • 5 BENCHMARKS

HWU64

This project contains natural language data for human-robot interaction in the home domain, collected and annotated for evaluating NLU services and platforms.

59 PAPERS • 3 BENCHMARKS

MixATIS

A dataset constructed from the single-intent dataset ATIS.

24 PAPERS • 3 BENCHMARKS

SVIRO (Synthetic Vehicle Interior Rear Seat Occupancy Dataset)

Contains bounding boxes for object detection, instance segmentation masks, keypoints for pose estimation and depth images for each synthetic scenery as well as images for each individual seat for classification.

14 PAPERS • NO BENCHMARKS YET

HINT3

HINT3 is a dataset for intent detection. It consists of 3 different datasets, each containing a diverse set of intents in a single domain: mattress products retail, fitness supplements retail and online gaming, named SOFMattress, Curekart and Powerplay11, respectively.

10 PAPERS • NO BENCHMARKS YET

SPEECH-COCO

SPEECH-COCO contains speech captions that are generated using text-to-speech (TTS) synthesis resulting in 616,767 spoken captions (more than 600h) paired with images.

9 PAPERS • NO BENCHMARKS YET

CAIS (Chinese Artificial Intelligence Speakers)

We collect utterances from the Chinese Artificial Intelligence Speakers (CAIS) and annotate them with slot tags and intent labels. The training, validation and test sets are split according to the distribution of intents; detailed statistics are provided in the supplementary material. Since the utterances are collected from speaker systems in the real world, the intent labels are skewed toward the PlayMusic option. We adopt the BIOES tagging scheme for slots instead of the BIO2 scheme used in ATIS, since previous studies have highlighted meaningful improvements with this scheme (Ratinov and Roth, 2009) in the sequence labeling field.

6 PAPERS • 2 BENCHMARKS
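
The CAIS entry above contrasts the BIOES tagging scheme with the BIO2 scheme used in ATIS. The following is a minimal sketch of how BIO2 tags can be mapped to BIOES, under the usual convention that single-token spans become `S-` and the last token of a longer span becomes `E-`. The slot names `song` and `artist` are hypothetical and are not taken from the CAIS label set; the function name `bio_to_bioes` is likewise an illustration, not part of any dataset tooling.

```python
from typing import List


def bio_to_bioes(tags: List[str]) -> List[str]:
    """Convert a well-formed BIO2 tag sequence to BIOES."""
    bioes = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bioes.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        # A span continues only if the next tag is I- with the same label.
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == f"I-{label}"
        if prefix == "B":
            bioes.append(f"B-{label}" if continues else f"S-{label}")
        else:  # prefix == "I"
            bioes.append(f"I-{label}" if continues else f"E-{label}")
    return bioes


# Hypothetical slot labels, for illustration only.
example = ["O", "B-song", "I-song", "I-song", "O", "B-artist"]
converted = bio_to_bioes(example)
print(converted)
# ['O', 'B-song', 'I-song', 'E-song', 'O', 'S-artist']
```

The extra `E-` and `S-` tags make span boundaries explicit, which is the property the entry credits for the reported improvements in sequence labeling.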

BANKING77-OOS

A single-domain banking dataset that includes both general Out-of-Scope (OOD-OOS) queries and In-Domain but Out-of-Scope (ID-OOS) queries, where ID-OOS queries are semantically similar to in-scope intents. BANKING77 originally includes 77 intents. BANKING77-OOS keeps 50 of them as in-scope intents, and the ID-OOS queries are built from the 27 held-out intents.

5 PAPERS • NO BENCHMARKS YET
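
The BANKING77-OOS entry above describes holding out 27 of 77 intents so that their queries serve as ID-OOS examples. A rough sketch of that bookkeeping is shown here. The random selection is an assumption made purely for illustration (the dataset authors may have chosen the held-out intents by other criteria), and the intent names, the `ID-OOS` label string, and the helper names are all hypothetical.

```python
import random
from typing import List, Tuple


def split_intents(
    intents: List[str], n_held_out: int, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Split intents into in-scope and held-out sets (random choice is an assumption)."""
    rng = random.Random(seed)
    shuffled = intents[:]
    rng.shuffle(shuffled)
    held_out = sorted(shuffled[:n_held_out])
    in_scope = sorted(shuffled[n_held_out:])
    return in_scope, held_out


def relabel(label: str, in_scope: List[str]) -> str:
    """Map queries of held-out intents to a single ID-OOS label."""
    return label if label in set(in_scope) else "ID-OOS"


# Placeholder intent names standing in for the 77 BANKING77 intents.
all_intents = [f"intent_{i:02d}" for i in range(77)]
in_scope, held_out = split_intents(all_intents, n_held_out=27)
print(len(in_scope), len(held_out))
# 50 27
```

Applying `relabel` to each query's original label would then yield a 50-class in-scope task plus one ID-OOS class.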

ATIS (vi) (Vietnamese Intent Detection and Slot Filling)

This is a dataset for intent detection and slot filling for the Vietnamese language. The dataset consists of 5,871 gold annotated utterances with 28 intent labels and 82 slot types.

4 PAPERS • 3 BENCHMARKS

MDID (Multimodal Document Intent Dataset)

The Multimodal Document Intent Dataset (MDID) is a dataset for computing author intent from multimodal Instagram data. It contains 1,299 Instagram posts covering a variety of topics, annotated with labels from three taxonomies. The samples are labelled with 7 intent labels: Provocative, Informative, Advocative, Entertainment, Expositive, Expressive and Promotive.

3 PAPERS • NO BENCHMARKS YET

Almawave-SLU

Almawave-SLU is the first Italian dataset for Spoken Language Understanding (SLU). It is derived through a semi-automatic procedure and is used as a benchmark of various open source and commercial systems.

2 PAPERS • NO BENCHMARKS YET

CLINC-Single-Domain-OOS

A dataset with two separate domains, "Banking" and "Credit cards", each with both general Out-of-Scope (OOD-OOS) queries and In-Domain but Out-of-Scope (ID-OOS) queries, where ID-OOS queries are semantically similar to in-scope intents. Each domain in CLINC150 originally includes 15 intents. In this dataset, each domain keeps ten in-scope intents, and the ID-OOS queries are built from the five held-out intents.

2 PAPERS • NO BENCHMARKS YET

ClarQ

ClarQ consists of ∼2M examples distributed across 173 domains of Stack Exchange. This dataset is meant for training and evaluating Clarification Question Generation Systems.

2 PAPERS • NO BENCHMARKS YET

DialogUSR

The DialogUSR dataset covers 23 domains and was built with a multi-step crowd-sourcing procedure. On average, each utterance comprises 36.7 Chinese characters, assembled from 3.6 single-intent queries (including initial and follow-up queries). It is designed for the dialogue utterance splitting and reformulation task.

2 PAPERS • NO BENCHMARKS YET

ITALIC

ITALIC: An ITALian Intent Classification Dataset

2 PAPERS • NO BENCHMARKS YET

Persian-ATIS

PATIS (Persian-ATIS) is a Persian-language dataset for intent detection and slot filling.

2 PAPERS • 2 BENCHMARKS

ProSLU (Profile-based Spoken Language Understanding)

In the paper, to bridge the research gap, we propose a new and important task, Profile-based Spoken Language Understanding (ProSLU), which requires a model to rely not only on the text but also on the given supporting profile information. We further introduce a Chinese human-annotated dataset with over 5K utterances annotated with intent and slots, together with the corresponding supporting profile information. In total, we provide three types of supporting profile information: (1) Knowledge Graph (KG), which consists of entities with rich attributes; (2) User Profile (UP), which is composed of user settings and information; and (3) Context Awareness (CA), which covers user state and environmental information.

2 PAPERS • 3 BENCHMARKS
