Training large-scale question answering systems is complicated because training sources usually cover only a small portion of the range of possible questions. This paper studies the impact of multitask and transfer learning for simple question answering, a setting in which the reasoning required to answer is quite easy as long as one can retrieve the correct evidence for a question, although that retrieval can be difficult in large-scale conditions. To this end, we introduce a new dataset of 100k questions that we use in conjunction with existing benchmarks.
|Task|Dataset|Model|Metric name|Metric value|Global rank|
|---|---|---|---|---|---|
|Question Answering|Reverb|Memory Networks (ensemble)|Accuracy|68%|# 2|
|Question Answering|SimpleQuestions|Memory Networks (ensemble)|F1|63.9%|# 1|
|Question Answering|WebQuestions|Memory Networks (ensemble)|F1|42.2%|# 1|