We consider the problem of adapting neural paragraph-level question answering models to the case where entire documents are given as input. Our proposed solution trains models to produce well-calibrated confidence scores for their results on individual paragraphs. We sample multiple paragraphs from each document during training and use a shared-normalization training objective that encourages the model to produce globally correct output.
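As a rough illustration of the shared-normalization idea, the sketch below computes a negative log-likelihood in which a single softmax is taken over the token scores of all paragraphs sampled for one question, rather than one softmax per paragraph. This is one plausible reading of the objective, not the authors' implementation: the function names `log_sum_exp` and `shared_norm_loss`, the input layout, and the choice to sum probability mass over every gold position are assumptions made for the example.

```python
import math


def log_sum_exp(scores):
    """Numerically stable log(sum(exp(s))) over a non-empty list of floats."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))


def shared_norm_loss(paragraph_scores, gold_positions):
    """Negative log-likelihood with one softmax shared across paragraphs.

    paragraph_scores: list of lists; raw (unnormalized) start or end scores,
        one inner list per paragraph sampled for the same question.
    gold_positions: list of lists; token indices in each paragraph that begin
        (or end) a correct answer. An inner list may be empty.

    Both argument layouts are hypothetical, chosen for this sketch.
    """
    # Pool every token score so the normalizer spans all sampled paragraphs.
    all_scores = [s for para in paragraph_scores for s in para]
    # Collect the scores at gold answer positions, again across paragraphs.
    gold_scores = [
        para[i]
        for para, gold in zip(paragraph_scores, gold_positions)
        for i in gold
    ]
    if not gold_scores:
        raise ValueError("at least one sampled paragraph must contain an answer")
    # -log( sum_gold exp(s) / sum_all exp(s) )
    return log_sum_exp(all_scores) - log_sum_exp(gold_scores)


if __name__ == "__main__":
    # Two paragraphs for one question; only the first contains the answer.
    scores = [[2.0, 0.5, -1.0], [1.5, 0.0]]
    gold = [[0], []]
    print(shared_norm_loss(scores, gold))
```

Because the normalizer is shared, a high score in a paragraph that contains no answer raises the loss, which is what pushes scores to be comparable across paragraphs. With a per-paragraph softmax, that paragraph would not affect the loss for the answer-bearing paragraph at all.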
|Task||Dataset||Model||Metric name||Metric value||Global rank|
|Question Answering||SQuAD1.1||BiDAF + Self Attention (single model)||EM||72.139||# 106|
|Question Answering||SQuAD1.1||BiDAF + Self Attention (single model)||F1||81.048||# 106|
|Question Answering||TriviaQA||S-Norm||EM||66.37||# 1|
|Question Answering||TriviaQA||S-Norm||F1||71.32||# 1|