Everybody likes short sentences - A Data Analysis for the Text Complexity DE Challenge 2022
The German Text Complexity Assessment Shared Task at KONVENS 2022 explores how to predict a complexity score for example sentences from the perspective of language learners. Our modeling approach for this shared task uses off-the-shelf NLP tools for feature engineering and a Random Forest regression model. We identified text length, or more precisely the logarithm of a sentence's string length, as the most important feature for predicting the complexity score. Further analysis showed that the Pearson correlation between text length and complexity score is ρ ≈ 0.777. A sensitivity analysis on the loss function revealed that semantic SBert features affect the complexity score as well.
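To make the central feature concrete, the following minimal sketch computes the logarithm of each sentence's string length and its Pearson correlation with a complexity score. The sentences and scores are invented toy values, not data from the shared task, and the helper names `log_length` and `pearson` are hypothetical. The sketch uses the natural logarithm and the character count as assumptions about how the feature is defined, and it is not the authors' implementation.

```python
import math


def log_length(sentence: str) -> float:
    """Natural logarithm of the sentence's string length (character count)."""
    return math.log(len(sentence))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: German sentences with made-up complexity scores.
sentences = [
    "Das ist gut.",
    "Der Hund schläft im Garten.",
    "Obwohl es regnete, gingen die Kinder nach der Schule in den Park.",
    "Die Verordnung regelt die Zuständigkeiten der Behörden, sofern keine "
    "abweichenden landesrechtlichen Bestimmungen bestehen.",
]
scores = [1.2, 1.8, 3.1, 5.4]

features = [log_length(s) for s in sentences]
rho = pearson(features, scores)
print(f"Pearson correlation (toy data): {rho:.3f}")
```

On real data, the same feature column could be passed to a regression model alongside other features. The value printed here reflects only the toy data and says nothing about the reported ρ ≈ 0.777.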