HotpotQA is a question answering dataset collected on the English Wikipedia, containing about 113K crowd-sourced questions that are constructed to require the introduction paragraphs of two Wikipedia articles to answer. Each question in the dataset comes with the two gold paragraphs, as well as a list of sentences in these paragraphs that crowdworkers identify as supporting facts necessary to answer the question.
A diverse range of reasoning strategies are featured in HotpotQA, including questions involving missing entities in the question, intersection questions (What satisfies property A and property B?), and comparison questions, where two entities are compared by a common attribute, among others. In the few-document distractor setting, the QA models are given ten paragraphs in which the gold paragraphs are guaranteed to be found; in the open-domain fullwiki setting, the models are only given the question and the entire Wikipedia. Models are evaluated on their answer accuracy and explainability, where the former is measured as overlap between the predicted and gold answers with exact match (EM) and unigram F1, and the latter concerns how well the predicted supporting fact sentences match human annotation (Supporting Fact EM/F1). A joint metric is also reported on this dataset, which encourages systems to perform well on both tasks simultaneously.Source: Answering Complex Open-domain Questions Through Iterative Query Generation