MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages

We present the MASSIVE dataset: the Multilingual Amazon SLU resource package (SLURP) for Slot-filling, Intent classification, and Virtual assistant Evaluation. MASSIVE contains 1M realistic, parallel, labeled virtual assistant utterances spanning 51 languages, 18 domains, 60 intents, and 55 slots. It was created by having professional translators localize the English-only SLURP dataset into 50 typologically diverse languages from 29 genera. We also present modeling results for XLM-R and mT5, reporting exact match accuracy, intent classification accuracy, and slot-filling F1 score. We have publicly released our dataset, modeling code, and models.
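The three metrics named in the abstract can be illustrated with a minimal sketch. It assumes each utterance has one intent label and a set of (slot type, slot text) pairs, and it computes slot F1 as a micro-average over those pairs. The function names, the data layout, and the example labels are illustrative assumptions, not taken from the released modeling code, whose span matching and tokenization may differ.

```python
from typing import List, Set, Tuple

# Assumed representation: a slot is a (slot type, slot text) pair.
Slot = Tuple[str, str]


def intent_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of utterances whose predicted intent equals the gold intent."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)


def slot_micro_f1(gold: List[Set[Slot]], pred: List[Set[Slot]]) -> float:
    """Micro-averaged F1 over slots, pooled across all utterances."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # slots predicted with the right type and text
        fp += len(p - g)   # predicted slots that are not in the gold set
        fn += len(g - p)   # gold slots the model missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def exact_match_accuracy(
    gold_intents: List[str],
    pred_intents: List[str],
    gold_slots: List[Set[Slot]],
    pred_slots: List[Set[Slot]],
) -> float:
    """Fraction of utterances with the intent and every slot correct."""
    rows = zip(gold_intents, pred_intents, gold_slots, pred_slots)
    hits = sum(gi == pi and gs == ps for gi, pi, gs, ps in rows)
    return hits / len(gold_intents)


# Illustrative toy data (hypothetical predictions for three utterances).
gold_intents = ["alarm_set", "weather_query", "play_music"]
pred_intents = ["alarm_set", "weather_query", "play_radio"]
gold_slots = [{("time", "nine am")}, {("place_name", "boston")}, {("artist_name", "queen")}]
pred_slots = [{("time", "nine am")}, set(), {("artist_name", "queen")}]

print(intent_accuracy(gold_intents, pred_intents))
print(slot_micro_f1(gold_slots, pred_slots))
print(exact_match_accuracy(gold_intents, pred_intents, gold_slots, pred_slots))
```

Exact match is the strictest of the three, since a single wrong intent or slot makes the whole utterance count as a miss; this is consistent with the exact match figures in the results being lower than the corresponding intent accuracy and slot F1 figures.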

Datasets


Introduced in the Paper:

MASSIVE

Used in the Paper:

SLURP
Fluent Speech Commands

Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Slot Filling | MASSIVE | XLM-R Base | Slot F1 Score | 83.6 | #1 |
| Zero-Shot Intent Classification and Slot Filling | MASSIVE | mT5 Base (text-to-text) | Exact Match | 42.8 | #2 |
| Zero-Shot Intent Classification and Slot Filling | MASSIVE | mT5 Base (encoder-only) | Exact Match | 42.8 | #2 |
| Zero-Shot Intent Classification and Slot Filling | MASSIVE | XLM-R Base | Exact Match | 52.9 | #1 |
| Intent Classification and Slot Filling | MASSIVE | mT5 Base (text-to-text) | Exact Match | 73.8 | #3 |
| Intent Classification and Slot Filling | MASSIVE | mT5 Base (encoder-only) | Exact Match | 74.7 | #2 |
| Intent Classification and Slot Filling | MASSIVE | XLM-R Base | Exact Match | 75 | #1 |
| Zero-Shot Slot Filling | MASSIVE | mT5 Base (text-to-text) | Slot F1 Score | 50.6 | #3 |
| Zero-Shot Intent Classification | MASSIVE | mT5 Base (text-to-text) | Intent Accuracy | 62.9 | #2 |
| Zero-Shot Slot Filling | MASSIVE | mT5 Base (encoder-only) | Slot F1 Score | 56.9 | #2 |
| Zero-Shot Intent Classification | MASSIVE | mT5 Base (encoder-only) | Intent Accuracy | 61.2 | #3 |
| Zero-Shot Slot Filling | MASSIVE | XLM-R Base | Slot F1 Score | 64.2 | #1 |
| Zero-Shot Intent Classification | MASSIVE | XLM-R Base | Intent Accuracy | 70.6 | #1 |
| Slot Filling | MASSIVE | mT5 Base (text-to-text) | Slot F1 Score | 81.3 | #3 |
| Intent Classification | MASSIVE | mT5 Base (text-to-text) | Intent Accuracy | 85.3 | #2 |
| Slot Filling | MASSIVE | mT5 Base (encoder-only) | Slot F1 Score | 82.2 | #2 |
| Intent Classification | MASSIVE | mT5 Base (encoder-only) | Intent Accuracy | 86.1 | #1 |
| Intent Classification | MASSIVE | XLM-R Base | Intent Accuracy | 85.1 | #3 |

Methods