Question answering (QA) tasks have been posed using a variety of formats, such as extractive span selection, multiple choice, etc. This has led to format-specialized models, and even to an implicit division in the QA community. We argue that such boundaries are artificial and perhaps unnecessary, given the reasoning abilities we seek to teach are not governed by the format. As evidence, we use the latest advances in language modeling to build a single pre-trained QA model, UnifiedQA, that performs surprisingly well across 17 QA datasets spanning 4 diverse formats. UnifiedQA performs on par with 9 different models that were trained on individual datasets themselves. Even when faced with 12 unseen datasets of observed formats, UnifiedQA performs surprisingly well, showing strong generalization from its out-of-format training data. Finally, simply fine-tuning this pre-trained QA model into specialized models results in a new state of the art on 6 datasets, establishing UnifiedQA as a strong starting point for building QA systems.
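One way a single model can consume several QA formats is to serialize every example into one plain-text input, so that only the fields present differ between formats. The sketch below illustrates that idea. The abstract does not specify an encoding, so the function name `encode_example`, the separator `SEP`, and the `(A)`, `(B)` option labels are assumptions made here for illustration and should not be read as the authors' implementation.

```python
from typing import Optional, Sequence

# Hypothetical field separator (space, literal backslash-n, space), chosen for illustration.
SEP = " \\n "


def encode_example(question: str,
                   choices: Optional[Sequence[str]] = None,
                   context: Optional[str] = None) -> str:
    """Serialize one QA example, whatever its format, into a single input string."""
    parts = [question.strip()]
    if choices:
        # Label the options (A), (B), ... and keep them together in one field.
        labeled = " ".join(
            f"({chr(ord('A') + i)}) {choice.strip()}" for i, choice in enumerate(choices)
        )
        parts.append(labeled)
    if context:
        parts.append(context.strip())
    return SEP.join(parts)


# Extractive span selection: a question plus a passage.
extractive = encode_example(
    "Who wrote Hamlet?",
    context="Hamlet is a tragedy by William Shakespeare.",
)

# Multiple choice: a question plus labeled options, with no passage.
multiple_choice = encode_example(
    "Which is a mammal?",
    choices=["trout", "whale", "sparrow"],
)

print(extractive)
print(multiple_choice)
```

Because both examples end up as ordinary strings, the same sequence-to-sequence model could in principle be trained on a mixture of them without any format-specific output head.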

Venue: Findings of 2020

Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Common Sense Reasoning | CommonsenseQA | UnifiedQA 11B (zero-shot) | Accuracy | 76.2 | # 11 |
| Common Sense Reasoning | CommonsenseQA | T5-XXL 11B (fine-tuned) | Accuracy | 78.1 | # 9 |
| Common Sense Reasoning | CommonsenseQA | UnifiedQA 440M (fine-tuned) | Accuracy | 64 | # 24 |
| Common Sense Reasoning | CommonsenseQA | BART-large 440M (fine-tuned) | Accuracy | 62.5 | # 25 |
| Multi-task Language Understanding | MMLU | UnifiedQA 11B | Average (%) | 48.9 | # 62 |
| Common Sense Reasoning | WinoGrande | UnifiedQA 11B (fine-tuned) | Accuracy | 89.4 | # 6 |
| Common Sense Reasoning | WinoGrande | UnifiedQA 406M (fine-tuned) | Accuracy | 73.3 | # 29 |

Results from Other Papers


| Task | Dataset | Model | Metric Name | Metric Value | Rank |
|---|---|---|---|---|---|
| Common Sense Reasoning | CommonsenseQA | UnifiedQA 11B (fine-tuned) | Accuracy | 79.1 | # 7 |
| Question Answering | OpenBookQA | UnifiedQA 11B | Accuracy | 87.2 | # 13 |
| Question Answering | PIQA | UnifiedQA 3B | Accuracy | 85.3 | # 7 |
| Question Answering | SIQA | UnifiedQA 3B | Accuracy | 79.8 | # 6 |
