Document Summarization

128 papers with code • 4 benchmarks • 23 datasets

Automatic Document Summarization is the task of rewriting a document into a shorter form while retaining its important content. The two most popular paradigms are extractive and abstractive approaches. Extractive approaches generate summaries by extracting parts of the original document (usually sentences), while abstractive methods may generate new words or phrases that are not in the original document.

Source: HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization
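
To make the extractive paradigm described above concrete, here is a minimal sketch in Python. It is not taken from HIBERT or any paper listed on this page: it assumes a naive regex sentence splitter and scores each sentence by the average document-level frequency of its words, a simple classical heuristic. The function names `split_sentences` and `extractive_summary` are hypothetical.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (an assumption): break on whitespace that follows ., ! or ?
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text, num_sentences=1):
    """Return the highest-scoring sentences, copied verbatim from the text."""
    sentences = split_sentences(text)
    # Word frequencies over the whole document act as a crude salience signal.
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        # Average frequency, so long sentences are not favored by length alone.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    # Restore document order for the selected sentences.
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)


doc = "Cats sleep a lot. Cats chase mice and cats climb trees. Dogs bark."
print(extractive_summary(doc, num_sentences=1))
```

Because the output is assembled only from sentences already in the input, it can never contain a word the document lacks. An abstractive system has no such constraint and typically relies on a trained sequence-to-sequence model.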

Greatest papers with code

Language Models are Unsupervised Multitask Learners

huggingface/transformers Preprint 2019

Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised learning on task-specific datasets.

Ranked #1 on Language Modelling on enwik8 (using extra training data)

Common Sense Reasoning Data-to-Text Generation +7

Generating Wikipedia by Summarizing Long Sequences

tensorflow/tensor2tensor ICLR 2018

We show that generating English Wikipedia articles can be approached as a multi-document summarization of source documents.

Document Summarization Extractive Summarization +1

Unified Language Model Pre-training for Natural Language Understanding and Generation

microsoft/unilm NeurIPS 2019

This paper presents a new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks.

Ranked #2 on Generative Question Answering on CoQA (using extra training data)

Abstractive Text Summarization Document Summarization +6

TF-Ranking: Scalable TensorFlow Library for Learning-to-Rank

tensorflow/ranking 30 Nov 2018

We propose TensorFlow Ranking, the first open source library for solving large-scale ranking problems in a deep learning framework.

Document Summarization Learning-To-Rank +2

SgSum: Transforming Multi-document Summarization into Sub-graph Selection

PaddlePaddle/Research 25 Oct 2021

Compared with traditional methods, our method has two main advantages: (1) the relations between sentences are captured by modeling both the graph structure of the whole document set and the candidate sub-graphs; (2) it directly outputs an integrated summary in the form of a sub-graph, which is more informative and coherent.

Document Summarization Multi-Document Summarization

Leveraging Graph to Improve Abstractive Multi-Document Summarization

PaddlePaddle/Research ACL 2020

Graphs that capture relations between textual units have great benefits for detecting salient information from multiple documents and generating overall coherent summaries.

Document Summarization Multi-Document Summarization

Text Summarization with Pretrained Encoders

nlpyang/PreSumm IJCNLP 2019

For abstractive summarization, we propose a new fine-tuning schedule which adopts different optimizers for the encoder and the decoder as a means of alleviating the mismatch between the two (the former is pretrained while the latter is not).

Abstractive Text Summarization Document-level +3

Extractive Summarization as Text Matching

maszhongming/MatchSum ACL 2020

This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.

Document Summarization Extractive Summarization +3

Ranking Sentences for Extractive Summarization with Reinforcement Learning

shashiongithub/Refresh NAACL 2018

In this paper we conceptualize extractive summarization as a sentence ranking task and propose a novel training algorithm which globally optimizes the ROUGE evaluation metric through a reinforcement learning objective.

Document Summarization Extractive Summarization +1
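
The entry above optimizes the ROUGE evaluation metric. As a rough illustration of what such a metric measures, the sketch below computes a simplified ROUGE-1 (unigram overlap) score in Python. It is an assumption-laden approximation, not the official ROUGE toolkit and not the paper's training algorithm: tokens are lowercased and split on whitespace, with no stemming or stopword handling. The function name `rouge_1` is hypothetical.

```python
from collections import Counter


def rouge_1(candidate, reference):
    """Simplified ROUGE-1: clipped unigram overlap between two texts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipping).
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge_1("the cat sat", "the cat sat on the mat")
print(scores)
```

In this example every candidate word appears in the reference, so precision is 1.0, while only three of the six reference words are covered, so recall is 0.5. A score like this is not differentiable with respect to model parameters, which is one reason to treat it as a reward in reinforcement learning rather than as a direct loss.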

What is this Article about? Extreme Summarization with Topic-aware Convolutional Neural Networks

shashiongithub/XSum 19 Jul 2019

We introduce 'extreme summarization', a new single-document summarization task which aims at creating a short, one-sentence news summary answering the question "What is the article about?".

Document Summarization Extreme Summarization