5 dataset results for Spoken Language Understanding AND Audio

Fluent Speech Commands

Fluent Speech Commands is an open-source audio dataset for spoken language understanding (SLU) experiments. Each utterance is labeled with "action", "object", and "location" values; for example, "turn the lights on in the kitchen" has the label {"action": "activate", "object": "lights", "location": "kitchen"}. A model must predict each of these values, and a prediction for an utterance counts as correct only if all three values are correct.

52 PAPERS • 1 BENCHMARK
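
The all-values-correct scoring rule described for Fluent Speech Commands can be sketched as an exact-match accuracy. This is a minimal illustration, not the dataset's official evaluation script: the function name `exact_match_accuracy` and the second example utterance (with a "bedroom" location) are assumptions made up for the demonstration.

```python
from typing import Dict, List

Slots = Dict[str, str]


def exact_match_accuracy(predictions: List[Slots], references: List[Slots]) -> float:
    """Fraction of utterances whose predicted values all match the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    # An utterance counts only if action, object, and location are all right,
    # which is the same as comparing the whole label dictionaries.
    correct = sum(1 for pred, ref in zip(predictions, references) if pred == ref)
    return correct / len(references)


references = [
    {"action": "activate", "object": "lights", "location": "kitchen"},
    # Hypothetical second label, invented for this example.
    {"action": "activate", "object": "lights", "location": "bedroom"},
]
predictions = [
    {"action": "activate", "object": "lights", "location": "kitchen"},
    # Two of three values are right, but the utterance is still scored as wrong.
    {"action": "activate", "object": "lights", "location": "kitchen"},
]

accuracy = exact_match_accuracy(predictions, references)
print(accuracy)  # 0.5
```

Comparing whole dictionaries gives no partial credit, so a model that gets the action and object right but the location wrong scores the same as one that gets everything wrong on that utterance.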

Snips-SmartLights

The SmartLights benchmark from Snips tests the capability of controlling lights in different rooms. It consists of 1660 requests, which are split into five partitions for a 5-fold evaluation. A sample command could be: “please change the [bedroom] lights to [red]” or “i’d like the [living room] lights to be at [twelve] percent”.

8 PAPERS • 1 BENCHMARK
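
A 5-fold evaluation over five fixed partitions, as described for SmartLights, can be sketched as follows: each partition is held out once for testing while the other four are used for training. This is a rough illustration only. The helper `k_fold_splits`, the integer request IDs, and the assumption that the five partitions are equal in size are all hypothetical; the benchmark ships its own partition assignment.

```python
from typing import Iterator, List, Sequence, Tuple


def k_fold_splits(
    partitions: Sequence[List[int]],
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) pairs, holding out one partition per fold."""
    for held_out in range(len(partitions)):
        test = list(partitions[held_out])
        train = [
            item
            for index, part in enumerate(partitions)
            if index != held_out
            for item in part
        ]
        yield train, test


# Hypothetical stand-in for the 1660 requests, split evenly into five partitions.
request_ids = list(range(1660))
partitions = [request_ids[i::5] for i in range(5)]

folds = list(k_fold_splits(partitions))
for train, test in folds:
    print(len(train), len(test))  # 1328 332 on every fold under the even-split assumption
```

Reported results for such a setup are typically averaged over the five folds, so every request is used for testing exactly once.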

MEDIA

The MEDIA French corpus is dedicated to semantic extraction from speech in the context of human/machine dialogues. The corpus has manual transcription and conceptual annotation of dialogues from 250 speakers. It is split into the following three parts: (1) the training set (720 dialogues, 12K sentences), (2) the development set (79 dialogues, 1.3K sentences), and (3) the test set (200 dialogues, 3K sentences).

6 PAPERS • NO BENCHMARKS YET

Snips-SmartSpeaker

The SmartSpeaker benchmark tests the performance of reacting to music player commands in English as well as in French. The difficulty is that the commands contain many artists or music tracks with uncommon names, such as “play music by [a boogie wit da hoodie]” or “I’d like to listen to [Kinokoteikoku]”.

3 PAPERS • 1 BENCHMARK

Skit-S2I

Skit-S2I (Skit-S2I: An Indian Accented Speech to Intent dataset)

This dataset for intent classification from human speech covers 14 coarse-grained intents from the banking domain. This work is inspired by a similar release, the Minds-14 dataset; here, we restrict ourselves to Indian English but with a much larger training set. The data was generated by 11 (Indian English) speakers and recorded over a telephony line. We also provide access to anonymized speaker information (such as gender, languages spoken, and native language) to allow more structured discussions around robustness and bias in the models you train.

1 PAPER • 1 BENCHMARK
