Learning to Generate Questions by Recovering Answer-containing Sentences

1 Jan 2021  ·  Seohyun Back, Akhil Kedia, Sai Chetan Chinthakindi, Haejun Lee, Jaegul Choo ·

To train a question answering model based on machine reading comprehension (MRC), significant effort is required to prepare annotated training data composed of questions and their answers from contexts. To mitigate this issue, recent research has focused on synthetically generating a question from a given context and an annotated (or generated) answer by training an additional generative model, which can be utilized to augment the training data. In light of this research direction, we propose a novel pre-training approach that learns to generate contextually rich questions, by recovering answer-containing sentences. Our approach is composed of two novel components, (1) dynamically determining K answers from a given document and (2) pre-training the question generator on the task of generating the answer-containing sentence. We evaluate our method against existing ones in terms of the quality of generated questions as well as the fine-tuned MRC model accuracy after training on the data synthetically generated by our method. Experimental results demonstrate that our approach consistently improves the question generation capability of existing models such as UniLM, and shows state-of-the-art results on MS MARCO and NewsQA, and comparable results to the state-of-the-art on SQuAD. Additionally, we demonstrate that the data synthetically generated by our approach is beneficial for boosting up the downstream MRC accuracy across a wide range of datasets, such as SQuAD-v1.1, v2.0, and KorQuAD, without any modification to the existing MRC models. Furthermore, our experiments highlight that our method shines especially when a limited amount of training data is given, in terms of both pre-training and downstream MRC data.

PDF Abstract

Results from the Paper


Ranked #3 on Question Generation on SQuAD1.1 (using extra training data)

     Get a GitHub badge
Task Dataset Model Metric Name Metric Value Global Rank Uses Extra
Training Data
Benchmark
Question Generation SQuAD1.1 ProphetNet + ASGen BLEU-4 24.44 # 3
METEOR 26.73 # 1
ROUGE-L 52.8 # 1
Question Generation SQuAD1.1 UniLM + ASGen BLEU-4 23.7 # 7
METEOR 25.9 # 5
ROUGE-L 52.3 # 4

Methods


No methods listed for this paper. Add relevant methods here