Hitachi at SemEval-2020 Task 11: An Empirical Study of Pre-Trained Transformer Family for Propaganda Detection

SEMEVAL 2020 · Gaku Morio, Terufumi Morishita, Hiroaki Ozaki, Toshinori Miyoshi ·

In this paper, we show our system for SemEval-2020 task 11, where we tackle propaganda span identification (SI) and technique classification (TC). We investigate heterogeneous pre-trained language models (PLMs) such as BERT, GPT-2, XLNet, XLM, RoBERTa, and XLM-RoBERTa for SI and TC fine-tuning, respectively. In large-scale experiments, we found that each of the language models has a characteristic property, and using an ensemble model with them is promising. Finally, the ensemble model was ranked 1st amongst 35 teams for SI and 3rd amongst 31 teams for TC.

PDF Abstract