FHAC at GermEval 2021: Identifying German toxic, engaging, and fact-claiming comments with ensemble learning

The availability of language representations learned by large pretrained neural network models (such as BERT and ELECTRA) has led to improvements in many downstream Natural Language Processing tasks in recent years. Pretrained models usually differ in their pretraining objectives, architectures, and the datasets they are trained on, all of which can affect downstream performance. In this contribution, we fine-tuned German BERT and German ELECTRA models to identify toxic (subtask 1), engaging (subtask 2), and fact-claiming comments (subtask 3) in Facebook data provided by the GermEval 2021 competition. We created ensembles of these models and investigated whether and how classification performance depends on the number of ensemble members and their composition. On out-of-sample data, our best ensemble achieved a macro-F1 score of 0.73 (across all subtasks), and F1 scores of 0.72, 0.70, and 0.76 for subtasks 1, 2, and 3, respectively.
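The abstract does not spell out the ensembling scheme, so the following is only a hedged sketch of one common approach, probability averaging (soft voting): each fine-tuned member model outputs a probability that a comment belongs to the positive class, the per-comment probabilities are averaged, and the average is thresholded. All names and numbers here are illustrative, not taken from the paper.

```python
# Hypothetical soft-voting ensemble sketch (not the paper's exact method):
# each member is assumed to emit a probability that a comment is, e.g., toxic.
from statistics import mean

def ensemble_predict(member_probs, threshold=0.5):
    """Average each comment's probabilities across members, then threshold.

    member_probs: list of per-member probability lists, one entry per comment.
    Returns a 0/1 label per comment.
    """
    labels = []
    for probs in zip(*member_probs):  # iterate over comments
        labels.append(1 if mean(probs) >= threshold else 0)
    return labels

# Three hypothetical ensemble members scoring the same two comments:
members = [
    [0.9, 0.2],  # member 1
    [0.7, 0.4],  # member 2
    [0.6, 0.1],  # member 3
]
print(ensemble_predict(members))  # → [1, 0]
```

Varying how many member lists are passed in mirrors the paper's question of how performance depends on ensemble size and composition.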


Datasets


| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Classification of toxic, engaging, fact-claiming comments | GermEval | GBERT/GELECTRA Ensemble | Macro-F1 | 72.7 | #1 |
| Engaging Comment Classification | GermEval 2021 - Engaging Comments test set | GBERT/GELECTRA Ensemble | F1 | 69.9 | #1 |
| Fact-Claiming Comment Classification | GermEval 2021 - Fact-Claiming Comments test set | GBERT/GELECTRA Ensemble | F1 | 76.8 | #1 |
| Toxic Comment Classification | GermEval 2021 - Toxic Comments test set | GBERT/GELECTRA Ensemble | F1 | 71.8 | #1 |
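For readers unfamiliar with the reported metric: macro-F1 is the unweighted mean of per-class F1 scores, where each F1 is the harmonic mean of precision and recall. The sketch below shows the standard definition with made-up confusion counts; the exact GermEval 2021 evaluation script may aggregate differently (e.g. per-instance over all three labels), so these numbers are purely illustrative.

```python
# Standard macro-F1 definition with hypothetical confusion counts
# (tp = true positives, fp = false positives, fn = false negatives).

def f1(tp, fp, fn):
    """F1 = harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(per_class_counts):
    """Unweighted mean of per-class F1 scores."""
    return sum(f1(*c) for c in per_class_counts) / len(per_class_counts)

# Illustrative counts for three binary labels:
counts = [(8, 2, 2), (6, 4, 2), (9, 1, 3)]
print(round(macro_f1(counts), 3))  # → 0.762
```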
