Denoising Distantly Supervised Open-Domain Question Answering

ACL 2018 · Yankai Lin, Haozhe Ji, Zhiyuan Liu, Maosong Sun

Distantly supervised open-domain question answering (DS-QA) aims to find answers in collections of unlabeled text. Existing DS-QA models usually retrieve related paragraphs from a large-scale corpus and apply reading comprehension techniques to extract answers from the most relevant paragraph, ignoring the rich information contained in the other paragraphs. Moreover, distantly supervised data inevitably suffers from the wrong-labeling problem, and such noisy data substantially degrades DS-QA performance. To address these issues, we propose a novel DS-QA model which employs a paragraph selector to filter out noisy paragraphs and a paragraph reader to extract the correct answer from the denoised paragraphs. Experimental results on real-world datasets show that our model can capture useful information from noisy data and achieves significant improvements on DS-QA over all baselines.

Task                           | Dataset | Model        | Metric        | Value | Global Rank
Open-Domain Question Answering | Quasar  | Denoising QA | EM (Quasar-T) | 42.2  | #2
Open-Domain Question Answering | Quasar  | Denoising QA | F1 (Quasar-T) | 49.3  | #2

Results from Other Papers


Task                           | Dataset  | Model        | Metric      | Value | Rank
Question Answering             | Quasar-T | Denoising QA | EM          | 42.2  | #5
Open-Domain Question Answering | SearchQA | Denoising QA | Unigram Acc | -     | #6
Open-Domain Question Answering | SearchQA | Denoising QA | N-gram F1   | -     | #6
Open-Domain Question Answering | SearchQA | Denoising QA | EM          | 58.8  | #6
Open-Domain Question Answering | SearchQA | Denoising QA | F1          | 64.5  | #2
