BillSum: A Corpus for Automatic Summarization of US Legislation

WS 2019  ·  Anastassia Kornilova, Vlad Eidelman

Automatic summarization methods have been studied in a variety of domains, including news and scientific articles. Yet legislation has not previously been considered for this task, despite the US Congress and state governments releasing tens of thousands of bills every year. In this paper, we introduce BillSum, the first dataset for summarization of US Congressional and California state bills (https://github.com/FiscalNote/BillSum). We explain the properties of the dataset that make it more challenging to process than other domains. Then, we benchmark extractive methods that consider neural sentence representations and traditional contextual features. Finally, we demonstrate that models built on Congressional bills can be used to summarize California bills, showing that methods developed on this dataset can transfer to states without human-written summaries.

Datasets


Introduced in the Paper:

BillSum

Results from the Paper


Task                Dataset  Model                       Metric Name  Metric Value  Global Rank
Text Summarization  BillSum  Longformer Encoder Decoder  rouge1       38.650        #1
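
The rouge1 metric in the row above measures unigram overlap between a generated summary and a reference summary. The following is a minimal sketch of that idea, assuming lowercased whitespace tokenization and no stemming. It is not the evaluation script behind the reported score, and this page does not state which variant (precision, recall, or F1) the 38.650 figure uses. The function name rouge1 and the example sentences are hypothetical.

```python
from collections import Counter


def rouge1(reference: str, candidate: str) -> dict:
    """Unigram-overlap precision, recall, and F1 (illustrative sketch only)."""
    # Assumption: simple lowercased whitespace tokens; real ROUGE tooling
    # typically applies its own tokenization and optional stemming.
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Clipped overlap: each unigram counts at most as often as it
    # appears in both the reference and the candidate.
    overlap = sum((ref_counts & cand_counts).values())
    ref_total = sum(ref_counts.values())
    cand_total = sum(cand_counts.values())

    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical reference and candidate summaries.
    scores = rouge1("the bill amends the tax code", "the bill changes the code")
    print(scores)
```

In the example, four of the five candidate unigrams also occur in the six-word reference, so precision is 4/5 and recall is 4/6. Leaderboard values such as 38.650 are usually these fractions expressed on a 0 to 100 scale.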
