Multi-Document Summarization
93 papers with code • 5 benchmarks • 15 datasets
Multi-Document Summarization is the task of representing a set of documents with a short piece of text that captures the relevant information and filters out redundant information. Two prominent approaches to Multi-Document Summarization are extractive and abstractive summarization. Extractive summarization systems aim to extract salient snippets, sentences or passages from documents, while abstractive summarization systems aim to concisely paraphrase the content of the documents.
Source: Multi-Document Summarization using Distributed Bag-of-Words Model
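As a rough illustration of the extractive approach (not the distributed bag-of-words method cited above, and not any specific paper below), the sketch here scores sentences by the corpus-wide frequency of their words and greedily selects top-scoring sentences while skipping near-duplicates. All function names and thresholds are illustrative assumptions.

```python
# Minimal extractive multi-document summarization sketch (illustrative only):
# score sentences by the average frequency of their words across all documents,
# then greedily keep the top-scoring sentences while skipping near-duplicates.
import re
from collections import Counter

def split_sentences(text):
    # Naive sentence splitter; a real system would use a proper NLP library.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    return re.findall(r"[a-z']+", sentence.lower())

def summarize(documents, max_sentences=3, overlap_threshold=0.5):
    sentences = [s for doc in documents for s in split_sentences(doc)]
    word_freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(word_freq[w] for w in words) / (len(words) or 1)

    summary = []
    for sent in sorted(sentences, key=score, reverse=True):
        tokens = set(tokenize(sent))
        # Redundancy filter: skip sentences that overlap heavily with ones already chosen.
        if any(len(tokens & set(tokenize(s))) / (len(tokens) or 1) > overlap_threshold
               for s in summary):
            continue
        summary.append(sent)
        if len(summary) == max_sentences:
            break
    return " ".join(summary)

docs = [
    "The storm hit the coast on Monday. Thousands lost power across the region.",
    "A powerful storm struck the coastal region Monday, knocking out power for thousands.",
    "Repair crews expect to restore electricity by Friday.",
]
print(summarize(docs, max_sentences=2))
```

An abstractive system would instead generate new sentences (typically with a sequence-to-sequence model) rather than copy them from the source documents.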
Latest papers
PeerSum: A Peer Review Dataset for Abstractive Multi-document Summarization
We present PeerSum, a new MDS dataset using peer reviews of scientific publications.
Proposition-Level Clustering for Multi-Document Summarization
Text clustering methods were traditionally incorporated into multi-document summarization (MDS) as a means for coping with considerable information repetition.
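The sketch below illustrates the general idea of clustering to cope with repetition: group highly similar sentences so that each cluster contributes at most one sentence to the summary. It uses bag-of-words cosine similarity and greedy single-pass clustering as stand-ins; the paper itself clusters propositions with learned representations, so treat every detail here as an assumption.

```python
# Generic redundancy-clustering sketch for MDS (illustrative only; not the
# proposition-level method of the paper above).
import re
from collections import Counter
from math import sqrt

def bow(sentence):
    # Bag-of-words term counts as a crude sentence representation.
    return Counter(re.findall(r"[a-z']+", sentence.lower()))

def cosine(a, b):
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    denom = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / denom if denom else 0.0

def cluster_sentences(sentences, threshold=0.4):
    # Greedy single-pass clustering: a sentence joins the first cluster whose
    # representative (first member) it is similar enough to, else starts a new cluster.
    clusters = []
    for sent in sentences:
        vec = bow(sent)
        for cluster in clusters:
            if cosine(vec, bow(cluster[0])) >= threshold:
                cluster.append(sent)
                break
        else:
            clusters.append([sent])
    return clusters

sents = [
    "The storm knocked out power for thousands.",
    "Thousands lost electricity after the storm.",
    "Crews expect to restore power by Friday.",
]
for group in cluster_sentences(sents):
    print(group)
```

A summarizer would then pick one representative sentence (or fused sentence) per cluster, which is how clustering reduces the information repetition mentioned above.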
LongT5: Efficient Text-To-Text Transformer for Long Sequences
Recent work has shown that either (1) increasing the input length or (2) increasing model size can improve the performance of Transformer-based neural models.
PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization
We introduce PRIMERA, a pre-trained model for multi-document representation with a focus on summarization that reduces the need for dataset-specific architectures and large amounts of labeled fine-tuning data.
SgSum: Transforming Multi-document Summarization into Sub-graph Selection
Compared with traditional methods, our method has two main advantages: (1) the relations between sentences are captured by modeling both the graph structure of the whole document set and the candidate sub-graphs; (2) it directly outputs an integrated summary in the form of a sub-graph, which is more informative and coherent.
Modeling Endorsement for Multi-Document Abstractive Summarization
In this paper, we model the cross-document endorsement effect and its utilization in multi-document summarization.
HETFORMER: Heterogeneous Transformer with Sparse Attention for Long-Text Extractive Summarization
To capture the semantic graph structure from raw text, most existing summarization approaches are built on GNNs with a pre-trained model.
Extending Multi-Text Sentence Fusion Resources via Pyramid Annotations
NLP models that compare or consolidate information across multiple documents often struggle when challenged with recognizing substantial information redundancies across the texts.