OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification

WS 2018  ·  Sowmya Vajjala, Ivana Lučić

This paper describes the collection and compilation of the OneStopEnglish corpus of texts written at three reading levels, and demonstrates its usefulness through two applications: automatic readability assessment and automatic text simplification. The corpus consists of 189 texts, each in three versions (567 in total). The corpus is now freely available under a CC BY-SA 4.0 license, and we hope that it will foster further research on readability assessment and text simplification.


Datasets


Introduced in the Paper:

OneStopEnglish
Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Text Classification | OneStopEnglish (Readability Assessment) | SMO (Sequential Minimal Optimization) | Accuracy (5-fold) | 0.781 | # 4
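
For readers unfamiliar with the "Accuracy (5-fold)" metric, the following is a minimal sketch of how a k-fold accuracy figure can be computed. It is not the authors' evaluation code: the helper names `k_fold_accuracy` and `majority_baseline` are hypothetical, the folds are plain contiguous slices (a real evaluation would typically shuffle or stratify by reading level), and a trivial majority-class predictor stands in for the SMO classifier.

```python
from collections import Counter
from typing import Callable, Sequence


def k_fold_accuracy(
    features: Sequence,
    labels: Sequence,
    train_and_predict: Callable[[list, list, list], list],
    k: int = 5,
) -> float:
    """Mean accuracy over k contiguous, non-overlapping held-out folds."""
    n = len(labels)
    if n < k:
        raise ValueError("need at least k examples")
    # Fold i covers indices [bounds[i], bounds[i + 1]).
    bounds = [round(i * n / k) for i in range(k + 1)]
    scores = []
    for i in range(k):
        lo, hi = bounds[i], bounds[i + 1]
        # Train on everything outside the held-out fold.
        train_x = list(features[:lo]) + list(features[hi:])
        train_y = list(labels[:lo]) + list(labels[hi:])
        test_x = list(features[lo:hi])
        test_y = list(labels[lo:hi])
        predicted = train_and_predict(train_x, train_y, test_x)
        correct = sum(p == t for p, t in zip(predicted, test_y))
        scores.append(correct / len(test_y))
    return sum(scores) / k


def majority_baseline(train_x: list, train_y: list, test_x: list) -> list:
    """Placeholder classifier (an assumption): always predict the most frequent training label."""
    most_common = Counter(train_y).most_common(1)[0][0]
    return [most_common] * len(test_x)


if __name__ == "__main__":
    # Toy labels only; these are not corpus data.
    toy_labels = ["a"] * 8 + ["b"] * 2
    toy_features = list(range(len(toy_labels)))
    print(k_fold_accuracy(toy_features, toy_labels, majority_baseline))
```

Any function with the same three-argument shape can replace `majority_baseline`, so a real classifier could be evaluated through the same loop.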
