The AmbigNQ dataset is a valuable resource for exploring ambiguity in open-domain question answering. Let me provide you with some details:

  1. Task Description:
    • Ambiguity is inherent in open-domain question answering, especially when dealing with new topics; it can be challenging to formulate questions that have a single, unambiguous answer.
    • The AmbigQA task involves predicting a set of question-answer pairs, where each plausible answer is paired with a disambiguated rewrite of the original question (see the sketch below).

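To make the output format concrete, here is a minimal sketch of one disambiguated example, based on the Simpsons premiere question discussed in the AmbigQA paper. The `question`, `qaPairs`, and `answer` keys mirror the structure described on the dataset card; treat the exact rewrites below as illustrative rather than as official annotations.

```python
# One ambiguous NQ-open prompt and the set of disambiguated
# question-answer pairs the AmbigQA task asks a model to predict.
example = {
    "question": "When did the Simpsons first air on television?",
    "qaPairs": [
        {
            # Reading 1: the shorts on The Tracey Ullman Show.
            "question": "When did the Simpsons first air as shorts "
                        "on The Tracey Ullman Show?",
            "answer": ["April 19, 1987"],
        },
        {
            # Reading 2: the half-hour prime-time series.
            "question": "When did the Simpsons first air as a "
                        "half-hour prime-time series?",
            "answer": ["December 17, 1989"],
        },
    ],
}
```
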
  2. Dataset Construction:
    • To study this task, the researchers constructed the AmbigNQ dataset.
    • AmbigNQ covers 14,042 questions from NQ-open, an existing open-domain QA benchmark.
    • Surprisingly, over half of the questions in NQ-open turn out to be ambiguous (the sketch below shows one way to check this on the public splits).
    • The types of ambiguity are diverse and sometimes subtle, often becoming apparent only after examining evidence from a very large text corpus.

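The "over half" figure can be spot-checked on the public splits. Below is a minimal sketch, assuming the `ambig_qa` dataset on the Hugging Face Hub, its `light` configuration, and the `annotations`/`type` fields listed on the dataset card (the hidden test set is not available there):

```python
from datasets import load_dataset

# Load the light configuration of AmbigNQ from the Hugging Face Hub.
ambig_nq = load_dataset("ambig_qa", "light")

# Treat a question as ambiguous if at least one annotator produced a
# "multipleQAs" annotation instead of a single answer.
dev = ambig_nq["validation"]
n_ambiguous = sum(
    any(t == "multipleQAs" for t in ex["annotations"]["type"]) for ex in dev
)
print(f"{n_ambiguous} of {len(dev)} dev questions carry a multipleQAs annotation")
```
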
  3. Dataset Versions:
    • There are three versions of the AmbigNQ dataset (a loading sketch follows this list):
      • Light Version: Contains only inputs and outputs.
      • Full Version: Includes all annotation metadata.
      • Evidence Version: Provides semi-oracle evidence articles along with questions and answers.

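The first two versions are exposed as configurations on the Hugging Face Hub (the configuration names `light` and `full` are taken from the dataset card); the evidence articles are distributed separately through the AmbigQA GitHub repository rather than as a Hub configuration. One quick way to see which extra metadata columns the full version carries:

```python
from datasets import load_dataset

# Load the light and full configurations and compare their columns.
light = load_dataset("ambig_qa", "light", split="train")
full = load_dataset("ambig_qa", "full", split="train")

print("light columns:", light.column_names)
print("full columns:", full.column_names)
print("metadata only in full:",
      sorted(set(full.column_names) - set(light.column_names)))
```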