Multi-Document Summarization
93 papers with code • 5 benchmarks • 15 datasets
Multi-Document Summarization is the process of representing a set of documents with a short piece of text that captures the relevant information and filters out redundant information. The two prominent approaches are extractive and abstractive summarization. Extractive summarization systems aim to extract salient snippets, sentences or passages from the documents, while abstractive summarization systems aim to concisely paraphrase the content of the documents.
Source: Multi-Document Summarization using Distributed Bag-of-Words Model
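To make the extractive side of the description above concrete, here is a minimal sketch, not taken from any of the listed papers. It assumes a deliberately simple scoring rule (a sentence is salient if its words recur across the document set) and a Jaccard word-overlap check to filter redundant sentences. The function names, the `max_sentences` and `redundancy_threshold` parameters, and the regex-based sentence splitter are all illustrative assumptions. Abstractive summarization needs a generative model and is not covered here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and return its alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(document):
    """Naive sentence splitter (assumption: sentences end in . ! or ?)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]


def jaccard(a, b):
    """Word-set overlap between two token lists, in [0, 1]."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def extractive_summary(documents, max_sentences=2, redundancy_threshold=0.5):
    """Pick salient, mutually non-redundant sentences from a document set."""
    sentences = [s for doc in documents for s in split_sentences(doc)]
    tokens = [tokenize(s) for s in sentences]

    # Count in how many sentences each word appears across all documents.
    freq = Counter(w for t in tokens for w in set(t))

    def score(t):
        # Salience proxy: mean sentence frequency of the distinct words.
        words = set(t)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(tokens[i]), reverse=True)

    chosen = []
    for i in ranked:
        if len(chosen) == max_sentences:
            break
        # Redundancy filter: skip sentences too similar to ones already chosen.
        if all(jaccard(tokens[i], tokens[j]) < redundancy_threshold for j in chosen):
            chosen.append(i)

    # Return the selected sentences in their original reading order.
    return [sentences[i] for i in sorted(chosen)]
```

Given two documents that share a sentence, the sketch keeps one copy of the shared sentence and fills the remaining slot with a non-overlapping one. Practical systems replace the frequency score with learned salience models, but the select-then-deduplicate loop is the same basic shape.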
Latest papers
XWikiGen: Cross-lingual Summarization for Encyclopedic Text Generation in Low Resource Languages
For low-resource languages, however, the scarcity of reference articles makes monolingual summarization ineffective in solving this problem.
Compressed Heterogeneous Graph for Abstractive Multi-Document Summarization
We propose HGSUM, an MDS model that extends an encoder-decoder architecture to incorporate a heterogeneous graph to represent different semantic units (e.g., words and sentences) of the documents.
PDSum: Prototype-driven Continuous Summarization of Evolving Multi-document Sets Stream
Summarizing text-rich documents has long been studied in the literature, but most existing efforts summarize a static, predefined multi-document set.
Generating a Structured Summary of Numerous Academic Papers: Dataset and Method
Existing MDS datasets usually focus on producing a structureless summary that covers a few input documents.
SumREN: Summarizing Reported Speech about Events in News
A primary objective of news articles is to establish the factual record for an event, frequently achieved by conveying both the details of the specified event (i.e., the 5 Ws: Who, What, Where, When and Why regarding the event) and how people reacted to it (i.e., reported statements).
How "Multi" is Multi-Document Summarization?
To that end, we propose an automated measure for evaluating the degree to which a summary is "disperse", in the sense of the number of source documents needed to cover its content.
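The idea of counting how many source documents are needed to cover a summary can be illustrated with a toy proxy. The sketch below is not the measure proposed in the paper. It assumes that "content" means the set of distinct words in the summary and uses a greedy set-cover heuristic, so the result is an upper bound on the true minimum. The function names and the `coverage` parameter are hypothetical.

```python
import re


def content_words(text):
    """Distinct lowercase alphanumeric tokens (a crude stand-in for content units)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def documents_needed(summary, documents, coverage=1.0):
    """Greedy estimate of how many documents are needed to cover the summary.

    Returns an int, or None if the documents cannot reach the requested
    coverage. Illustrative proxy only, not the paper's measure.
    """
    target = content_words(summary)
    if not target:
        return 0

    # Keep only the summary words that each document can contribute.
    doc_words = [content_words(d) & target for d in documents]

    covered = set()
    used = 0
    needed = coverage * len(target)
    while len(covered) < needed:
        # Greedy step: take the document adding the most uncovered words.
        best = max(doc_words, key=lambda w: len(w - covered), default=set())
        gain = best - covered
        if not gain:
            return None  # no document adds anything new
        covered |= gain
        used += 1
    return used
```

Under this proxy, a value of 1 means a single document already contains every summary word, while larger values suggest the summary draws on several sources.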
Multi-Document Summarization with Centroid-Based Pretraining
In Multi-Document Summarization (MDS), the input can be modeled as a set of documents, and the output is its summary.
Multi-LexSum: Real-World Summaries of Civil Rights Lawsuits at Multiple Granularities
With the advent of large language models, methods for abstractive summarization have made great strides, creating potential for use in applications to aid knowledge workers processing unwieldy document collections.
Improving Multi-Document Summarization through Referenced Flexible Extraction with Credit-Awareness
A notable challenge in Multi-Document Summarization (MDS) is the extreme length of the input.
A Multi-Document Coverage Reward for RELAXed Multi-Document Summarization
Multi-document summarization (MDS) has made significant progress in recent years, in part facilitated by the availability of new, dedicated datasets and capacious language models.