Multi-Document Summarization
93 papers with code • 5 benchmarks • 15 datasets
Multi-Document Summarization is the task of representing a set of documents with a short piece of text that captures the relevant information and filters out redundant information. The two prominent approaches are extractive and abstractive summarization. Extractive systems select salient snippets, sentences, or passages from the documents, while abstractive systems concisely paraphrase the content of the documents.
Source: Multi-Document Summarization using Distributed Bag-of-Words Model
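To make the extractive approach concrete, here is a minimal sketch in Python. It is not the method of any paper listed on this page. It assumes a simple word-frequency salience score and a Jaccard-overlap redundancy filter, and the names `extractive_summary`, `max_sentences`, and `redundancy_threshold` are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are",
             "on", "for", "that", "was", "by", "with"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercased set of non-stopword tokens in a sentence."""
    return {w for w in re.findall(r"[a-z0-9]+", sentence.lower())
            if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity between two word sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def extractive_summary(documents, max_sentences=2, redundancy_threshold=0.5):
    """Pick salient sentences from all documents while skipping near-duplicates."""
    sentences = [s for doc in documents for s in split_sentences(doc)]
    word_sets = [content_words(s) for s in sentences]

    # Salience: words that occur in many sentences across the set count more.
    freq = Counter(w for ws in word_sets for w in ws)

    def score(ws):
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(word_sets[i]), reverse=True)

    chosen = []
    for i in ranked:
        if len(chosen) == max_sentences:
            break
        # Redundancy filter: skip sentences too similar to one already chosen.
        if all(jaccard(word_sets[i], word_sets[j]) < redundancy_threshold
               for j in chosen):
            chosen.append(i)

    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]


docs = [
    "The storm hit the coast on Monday. Thousands lost power.",
    "The storm hit the coast Monday. Schools were closed on Tuesday.",
]
summary = extractive_summary(docs)
print(summary)
```

Both documents report the storm in nearly identical words, so the filter keeps only one of those sentences and spends the remaining slot on new information. An abstractive system would instead generate new text, typically with a trained sequence-to-sequence model, which is beyond the scope of this sketch.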
Most implemented papers
PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization
We introduce PRIMERA, a pre-trained model for multi-document representation with a focus on summarization that reduces the need for dataset-specific architectures and large amounts of fine-tuning labeled data.
Proposition-Level Clustering for Multi-Document Summarization
Text clustering methods were traditionally incorporated into multi-document summarization (MDS) as a means for coping with considerable information repetition.
A General Optimization Framework for Multi-Document Summarization Using Genetic Algorithms and Swarm Intelligence
Integer linear programming and submodularity are popular and successful techniques for extracting summaries in extractive multi-document summarization.
Bridging the gap between extractive and abstractive summaries: Creation and evaluation of coherent extracts from heterogeneous sources
Coherent extracts are a novel type of summary combining the advantages of manually created abstractive summaries, which are fluent but difficult to evaluate, and low-quality automatically created extractive summaries, which lack coherence and structure.
The Next Step for Multi-Document Summarization: A Heterogeneous Multi-Genre Corpus Built with a Novel Construction Approach
In a detailed analysis, we show that our new corpus is significantly different from the homogeneous corpora commonly used, and that it is heterogeneous along several dimensions.
Bringing Structure into Summaries: Crowdsourcing a Benchmark Corpus of Concept Maps
Concept maps can be used to concisely represent important information and bring structure into large document collections.