No Army, No Navy: BERT Semi-Supervised Learning of Arabic Dialects

WS 2019 · Chiyu Zhang, Muhammad Abdul-Mageed

We present our deep learning system submitted to MADAR Shared Task 2, which focuses on Twitter user dialect identification. We develop tweet-level identification models based on GRUs and BERT in supervised and semi-supervised settings. We then introduce a simple yet effective method for porting tweet-level labels to the level of users. Our system ranks first in the competition, with a macro F1 score of 71.70% and an accuracy of 77.40%.
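The abstract does not spell out how tweet-level labels are ported to users. As a minimal sketch of one plausible approach, the snippet below assumes a simple majority vote over each user's predicted tweet labels. The function name `user_level_labels`, the input format, and the example labels are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter, defaultdict


def user_level_labels(tweet_predictions):
    """Aggregate tweet-level dialect predictions into one label per user.

    Assumption: majority vote. `tweet_predictions` is an iterable of
    (user_id, predicted_label) pairs, one pair per tweet.
    """
    labels_by_user = defaultdict(list)
    for user_id, label in tweet_predictions:
        labels_by_user[user_id].append(label)

    # Pick the most frequent label for each user; ties go to the label
    # that appears first among that user's tweets.
    return {
        user_id: Counter(labels).most_common(1)[0][0]
        for user_id, labels in labels_by_user.items()
    }


# Hypothetical tweet-level predictions for two users.
predictions = [
    ("u1", "Egypt"),
    ("u1", "Egypt"),
    ("u1", "Iraq"),
    ("u2", "Oman"),
]
user_labels = user_level_labels(predictions)
```

Under this assumption, a user whose tweets are mostly predicted as one dialect receives that dialect as the user-level label, which smooths over individual tweet-level errors.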
