SSN\_NLP\_MLRG at SemEval-2020 Task 12: Offensive Language Identification in English, Danish, Greek Using BERT and Machine Learning Approach

SEMEVAL 2020 · A Kalaivani, Thenmozhi D. ·

Offensive language identification is to detect the hurtful tweets, derogatory comments, swear words on social media. As an emerging growth of social media communication, offensive language detection has received more attention in the last years; we focus to perform the task on English, Danish and Greek. We have investigated which can be effect more on pre-trained models BERT (Bidirectional Encoder Representation from Transformer) and Machine Learning Approaches. Our investigation shows the difference performance between the three languages and to identify the best performance is evaluated by the classification algorithms. In the shared task SemEval-2020, our team SSN{\_}NLP{\_}MLRG submitted for three languages that are Subtasks A, B, C in English, Subtask A in Danish and Subtask A in Greek. Our team SSN{\_}NLP{\_}MLRG obtained the F1 Scores as 0.90, 0.61, 0.52 for the Subtasks A, B, C in English, 0.56 for the Subtask A in Danish and 0.67 for the Subtask A in Greek respectively.

PDF Abstract