A scholarly data set with publications’ full-text, annotated in-text citations, and links to metadata.
8 PAPERS • NO BENCHMARKS YET
FacetSum is a faceted summarization dataset for scientific documents. FacetSum has been built on Emerald journal articles, covering a diverse range of domains. Different from traditional document-summary pairs, FacetSum provides multiple summaries, each targeted at specific sections of a long document, including the purpose, method, findings, and value.
3 PAPERS • 1 BENCHMARK
MS^2 (Multi-Document Summarization of Medical Studies) is a dataset of over 470k documents and 20k summaries derived from the scientific literature. This dataset facilitates the development of systems that can assess and aggregate contradictory evidence across multiple studies, and is one of the first large-scale, publicly available multi-document summarization dataset in the biomedical domain.