MICHAEL: Mining Character-level Patterns for Arabic Dialect Identification (MADAR Challenge)
We present MICHAEL, a simple, lightweight method for automatic Arabic Dialect Identification (DID) on the MADAR travel-domain task. MICHAEL uses simple character-level features to classify sentences without any pre-processing. More precisely, character n-grams extracted from the original sentences are used to train a Multinomial Naive Bayes classifier. The system achieved an official score (accuracy) of 53.25% with 1 <= N <= 3, but showed a much better result with character 4-grams (62.17% accuracy).
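To make the pipeline concrete, here is a minimal sketch of character n-gram extraction feeding a Multinomial Naive Bayes classifier, written in plain Python. It is not the authors' implementation. The names `char_ngrams` and `CharNgramNB`, the use of raw counts, and add-one smoothing (`alpha=1.0`) are assumptions, as the abstract does not state these details. The training strings and the labels `A` and `B` are toy placeholders, not MADAR data.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Return all character n-grams of text for n_min <= n <= n_max."""
    return [
        text[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(text) - n + 1)
    ]


class CharNgramNB:
    """Multinomial Naive Bayes over raw character n-gram counts (illustrative)."""

    def __init__(self, n_min=1, n_max=3, alpha=1.0):
        # alpha is the additive smoothing constant (an assumption, not from the paper)
        self.n_min, self.n_max, self.alpha = n_min, n_max, alpha

    def fit(self, sentences, labels):
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.docs = Counter(labels)         # label -> number of training sentences
        for sentence, label in zip(sentences, labels):
            self.counts[label].update(char_ngrams(sentence, self.n_min, self.n_max))
        self.vocab = {g for c in self.counts.values() for g in c}
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}
        return self

    def predict(self, sentence):
        # N-grams never seen in training are ignored
        grams = [g for g in char_ngrams(sentence, self.n_min, self.n_max)
                 if g in self.vocab]
        n_docs = sum(self.docs.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in self.docs:
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.docs[label] / n_docs)
            denom = self.totals[label] + self.alpha * vocab_size
            for g in grams:
                score += math.log((self.counts[label][g] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy usage with placeholder strings; no pre-processing is applied to the input.
model = CharNgramNB(n_min=1, n_max=3).fit(["aaab aab", "bbba bba"], ["A", "B"])
print(model.predict("aab"))
```

Under these assumptions, switching to the 4-gram configuration mentioned above would amount to constructing the model with `n_min=4, n_max=4`.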