Document Summarization

195 papers with code • 7 benchmarks • 28 datasets

Automatic Document Summarization is the task of rewriting a document into a shorter form while retaining its important content. The two most popular paradigms are extractive and abstractive approaches. Extractive approaches generate summaries by extracting parts of the original document (usually sentences), while abstractive methods may generate new words or phrases that are not in the original document.

Source: HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization
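
To make the extractive paradigm concrete, here is a minimal sketch that scores each sentence by the average document-level frequency of its words and keeps the top-scoring sentences in their original order. This is a generic frequency heuristic chosen for illustration, not the HIBERT method; the function name `extractive_summary`, the regex-based sentence splitter, and the `num_sentences` parameter are all assumptions made for this example.

```python
import re
from collections import Counter


def extractive_summary(document: str, num_sentences: int = 2) -> str:
    """Pick the highest-scoring sentences and return them in document order."""
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]

    # Word frequencies over the whole document (lowercased, letters only).
    freq = Counter(re.findall(r"[a-z]+", document.lower()))

    def score(sentence: str) -> float:
        # Average frequency of the sentence's words, so long sentences
        # are not favored just for being long.
        tokens = re.findall(r"[a-z]+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    # Rank sentence indices by score, keep the best ones, restore original order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    text = "Cats sleep a lot. Cats eat fish and cats chase mice. The weather was mild."
    print(extractive_summary(text, num_sentences=2))
```

Every sentence in the output is copied verbatim from the input, which is the defining property of an extractive summary. An abstractive system would instead generate new text, typically with a sequence-to-sequence model.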


Investigating Text Shortening Strategy in BERT: Truncation vs Summarization

mirzaalimm/truncationvssummarization 19 Mar 2024

In this study, we investigate the performance of document truncation and summarization in text classification tasks.

Stars: 1

Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis

microsoft/comphrdoc 22 Jan 2024

Our end-to-end system achieves state-of-the-art performance on two large-scale document layout analysis datasets (PubLayNet and DocLayNet), a high-quality hierarchical document structure reconstruction dataset (HRDoc), and our Comp-HRDoc benchmark.

Stars: 6

Shaping Political Discourse using multi-source News Summarization

c-rajan/Multi-source-NewsSummarization 18 Dec 2023

Multi-document summarization is the process of automatically generating a concise summary of multiple documents related to the same topic.

Stars: 0

OpenAsp: A Benchmark for Multi-document Open Aspect-based Summarization

liatschiff/openasp 7 Dec 2023

To advance research on more realistic scenarios, we introduce OpenAsp, a benchmark for multi-document open aspect-based summarization.

Stars: 4

Supervising the Centroid Baseline for Extractive Multi-Document Summarization

priberam/cera-summ 29 Nov 2023

The centroid method is a simple approach for extractive multi-document summarization and many improvements to its pipeline have been proposed.

Stars: 2
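
As a rough illustration of the centroid method mentioned above, the sketch below pools sentences from several documents, builds TF-IDF vectors, averages them into a centroid, and ranks sentences by cosine similarity to that centroid. This is the plain unsupervised baseline in its simplest form, not the supervised variant the paper proposes; the function names, the smoothed IDF formula, and the regex tokenizer are assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid_summary(documents, num_sentences=2):
    """Return the sentences closest to the TF-IDF centroid of all sentences."""
    # Pool sentences from every document (naive split on ., ! or ?).
    sentences = []
    for doc in documents:
        sentences += [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]

    token_lists = [tokenize(s) for s in sentences]
    n = len(sentences)

    # Smoothed inverse document frequency, treating each sentence as a "document".
    df = Counter(t for tokens in token_lists for t in set(tokens))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

    # Sparse TF-IDF vector per sentence.
    vectors = [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in token_lists]

    # Centroid = mean of all sentence vectors.
    centroid = Counter()
    for vec in vectors:
        for t, w in vec.items():
            centroid[t] += w / n

    ranked = sorted(range(n), key=lambda i: cosine(vectors[i], centroid), reverse=True)
    return [sentences[i] for i in ranked[:num_sentences]]


if __name__ == "__main__":
    docs = [
        "The storm hit the coast. Bananas are yellow.",
        "The storm flooded the coast.",
    ]
    print(centroid_summary(docs, num_sentences=2))
```

Practical centroid pipelines usually add a redundancy filter so that near-duplicate sentences are not selected together; that step is omitted here to keep the sketch short.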

Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles

salesforce/diversesumm 17 Sep 2023

In this paper, we propose a new task of summarizing diverse information encountered in multiple news articles encompassing the same event.

Stars: 3

ODSum: New Benchmarks for Open Domain Multi-Document Summarization

yale-nlp/odsum 16 Sep 2023

Open-domain Multi-Document Summarization (ODMDS) is a critical tool for condensing vast arrays of documents into coherent, concise summaries.

Stars: 10

Gender Bias in News Summarization: Measures, Pitfalls and Corpora

julmaxi/summary_bias 14 Sep 2023

Summarization is an important application of large language models (LLMs).

Stars: 0

Extending Context Window of Large Language Models via Positional Interpolation

pku-yuangroup/open-sora-plan 27 Jun 2023

We present Position Interpolation (PI) that extends the context window sizes of RoPE-based pretrained LLMs such as LLaMA models to up to 32768 with minimal fine-tuning (within 1000 steps), while demonstrating strong empirical results on various tasks that require long context, including passkey retrieval, language modeling, and long document summarization from LLaMA 7B to 65B.

Stars: 10,064
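
The core idea of Position Interpolation can be sketched in a few lines: instead of feeding RoPE position indices beyond the range seen in pre-training, each index is scaled down by the ratio of the original to the extended context length, so all positions fall back inside the trained range. The sketch below only computes rotary angles and is not the paper's implementation; the function names, the RoPE base of 10000, and the original length of 2048 are assumptions chosen for illustration.

```python
def rope_angles(position, head_dim, base=10000.0):
    """Rotary angles for one position: position * base^(-2j / head_dim)."""
    return [position * base ** (-2.0 * j / head_dim) for j in range(head_dim // 2)]


def interpolated_rope_angles(position, head_dim, original_len, extended_len, base=10000.0):
    """Rotary angles with the position index linearly scaled down.

    Scaling by original_len / extended_len maps positions in
    [0, extended_len) onto [0, original_len), the range seen in pre-training.
    """
    scale = original_len / extended_len
    return rope_angles(position * scale, head_dim, base)


if __name__ == "__main__":
    original_len = 2048    # assumed pre-training context length
    extended_len = 32768   # extended context length named in the abstract

    # The last position of the extended window maps just below original_len.
    angles = interpolated_rope_angles(extended_len - 1, 8, original_len, extended_len)
    print(angles[0])  # lowest-index angle equals the scaled position itself
```

Because positions are interpolated between trained integer indices, the model sees fractional positions it was never trained on, which is why the abstract still mentions a short fine-tuning phase.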