MICHAEL: Mining Character-level Patterns for Arabic Dialect Identification (MADAR Challenge)

WS 2019  ·  Dhaou Ghoul, Gaël Lejeune

We present MICHAEL, a simple, lightweight method for automatic Arabic dialect identification on the MADAR travel-domain Dialect Identification (DID) task. MICHAEL uses simple character-level features to perform classification without any pre-processing. More precisely, character n-grams extracted from the original sentences are used to train a Multinomial Naive Bayes classifier. The system achieved an official score (accuracy) of 53.25% with 1 ≤ N ≤ 3, but obtained a much better result with character 4-grams (62.17% accuracy).
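The pipeline described in the abstract (raw sentences, character n-grams, Multinomial Naive Bayes) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the class name `CharNgramNB`, the helper `char_ngrams`, the Laplace smoothing value `alpha=1.0`, the labels `EGY` and `LEV`, and the two toy training sentences are all assumptions made for illustration, and none of the data comes from the MADAR corpus.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Return all character n-grams of the raw text for n_min <= n <= n_max."""
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


class CharNgramNB:
    """Multinomial Naive Bayes over character n-gram counts.

    Hypothetical illustration class; Laplace smoothing (alpha) is an assumption.
    """

    def __init__(self, n_min=1, n_max=3, alpha=1.0):
        self.n_min, self.n_max, self.alpha = n_min, n_max, alpha

    def fit(self, sentences, labels):
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.docs = Counter(labels)         # label -> number of sentences
        for sentence, label in zip(sentences, labels):
            # No pre-processing: n-grams come straight from the original string.
            self.counts[label].update(
                char_ngrams(sentence, self.n_min, self.n_max))
        self.vocab = {g for c in self.counts.values() for g in c}
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}
        return self

    def predict(self, sentence):
        # n-grams never seen in training are ignored.
        grams = [g for g in char_ngrams(sentence, self.n_min, self.n_max)
                 if g in self.vocab]
        n_docs = sum(self.docs.values())
        best_label, best_score = None, -math.inf
        for label in self.docs:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.docs[label] / n_docs)
            denom = self.totals[label] + self.alpha * len(self.vocab)
            for g in grams:
                score += math.log((self.counts[label][g] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data for illustration only (not taken from MADAR).
train_sentences = ["إزيك عامل إيه", "كيفك شو الأخبار"]
train_labels = ["EGY", "LEV"]

model = CharNgramNB(n_min=1, n_max=3).fit(train_sentences, train_labels)
print(model.predict("إزيك"))
```

Under the same assumptions, a 4-gram-only configuration would be `CharNgramNB(n_min=4, n_max=4)`; the abstract does not state whether the 62.17% result used 4-grams alone or together with shorter n-grams.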
