2 dataset results for Sentence Fusion AND Texts

WikiSplit

Contains one million naturally occurring sentence rewrites, providing sixty times more distinct split examples and a ninety times larger vocabulary than the WebSplit corpus, which Narayan et al. (2017) introduced as a benchmark for the split-and-rephrase task.

21 PAPERS • 1 BENCHMARK

DiscoFuse

DiscoFuse was created by applying a rule-based splitting method to two corpora: sports articles crawled from the Web, and Wikipedia. See the paper for a detailed description of the dataset generation process and evaluation.

10 PAPERS • 1 BENCHMARK
