FanOutQA is a high quality, multi-hop, multi-document benchmark for large language models using English Wikipedia as its knowledge base. Compared to other question-answering benchmarks, FanOutQA requires reasoning over a greater number of documents, with the benchmark's main focus being on the titular fan-out style of question. We present these questions in three tasks -- closed-book, open-book, and evidence-provided -- which measure different abilities of LLM systems.
Paper | Code | Results | Date | Stars |
---|