7 dataset results for Text Simplification AND Texts AND English

WikiLarge comprises 359 test sentences, 2,000 development sentences, and 300k training sentences. Each source sentence in the test set has 8 simplified references.

65 PAPERS • NO BENCHMARKS YET

TurkCorpus

TurkCorpus is a dataset of 2,359 original sentences from English Wikipedia, each with 8 manual reference simplifications. The dataset is divided into two subsets: 2,000 sentences for validation and 359 for testing sentence simplification models.

43 PAPERS • 1 BENCHMARK

CEFR-SP

CEFR-SP contains 17k English sentences annotated with levels based on the Common European Framework of Reference for Languages (CEFR), assigned by English-education professionals.

4 PAPERS • NO BENCHMARKS YET

TextBox 2.0

TextBox 2.0 is a comprehensive and unified library for text generation, focusing on the use of pre-trained language models (PLMs). The library covers 13 common text generation tasks and their corresponding 83 datasets and further incorporates 45 PLMs covering general, translation, Chinese, dialogue, controllable, distilled, prompting, and lightweight PLMs.

2 PAPERS • NO BENCHMARKS YET

InfoLossQA

The goal of InfoLossQA is to generate a series of QA pairs that reveal to lay readers what information a simplified text lacks compared to its original.

1 PAPER • NO BENCHMARKS YET

Medical Wiki Parallel Corpus for Medical Text Simplification

A medical Wiki parallel corpus for medical text simplification.

1 PAPER • NO BENCHMARKS YET

SimpEvalASSET

SimpEvalASSET is a dataset for training learnable metrics using modern language models. It comprises 12K human ratings on 2.4K simplifications from 24 systems, and SIMPEVAL_2022, a challenging simplification benchmark consisting of over 1K human ratings of 360 simplifications, including generations from GPT-3.5.

1 PAPER • NO BENCHMARKS YET
