Simple But Not Na\"\ive: Fine-Grained Arabic Dialect Identification Using Only N-Grams

WS 2019  ·  Sohaila Eltanbouly, May Bashendy, Tamer Elsayed ·

This paper presents the participation of Qatar University team in MADAR shared task, which addresses the problem of sentence-level fine-grained Arabic Dialect Identification over 25 different Arabic dialects in addition to the Modern Standard Arabic. Arabic Dialect Identification is not a trivial task since different dialects share some features, e.g., utilizing the same character set and some vocabularies. We opted to adopt a very simple approach in terms of extracted features and classification models; we only utilize word and character n-grams as features, and Na ̈{\i}ve Bayes models as classifiers. Surprisingly, the simple approach achieved non-na ̈{\i}ve performance. The official results, reported on a held-out testing set, show that the dialect of a given sentence can be identified at an accuracy of 64.58{\%} by our best submitted run.

PDF Abstract
No code implementations yet. Submit your code now

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here