Fine-Grained Arabic Dialect Identification

Previous work on the problem of Arabic Dialect Identification typically targeted coarse-grained five dialect classes plus Standard Arabic (6-way classification). This paper presents the first results on a fine-grained dialect classification task covering 25 specific cities from across the Arab World, in addition to Standard Arabic – a very challenging task. We build several classification systems and explore a large space of features. Our results show that we can identify the exact city of a speaker at an accuracy of 67.9% for sentences with an average length of 7 words (a 9% relative error reduction over the state-of-the-art technique for Arabic dialect identification) and reach more than 90% when we consider 16 words. We also report on additional insights from a data analysis of similarity and difference across Arabic dialects.
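The abstract states the improvement as a relative error reduction rather than as an accuracy difference, so the comparison point is not given directly. The sketch below shows one way to back out the baseline accuracy this implies. The function name `baseline_accuracy` is hypothetical. The result is a derived approximation, not a figure reported in the abstract, and it assumes the rounded 9% applies to the 67.9% accuracy setting.

```python
def baseline_accuracy(new_accuracy: float, relative_error_reduction: float) -> float:
    """Back out the baseline accuracy implied by a relative error reduction.

    Relative error reduction is (baseline_error - new_error) / baseline_error,
    so baseline_error = new_error / (1 - relative_error_reduction).
    """
    new_error = 1.0 - new_accuracy
    baseline_error = new_error / (1.0 - relative_error_reduction)
    return 1.0 - baseline_error


# Figures taken from the abstract: 67.9% accuracy, 9% relative error reduction.
# The implied baseline is an inference, not a number stated by the authors.
implied = baseline_accuracy(0.679, 0.09)
print(f"Implied baseline accuracy: {implied:.3f}")
```

Because the 9% figure is rounded, the implied baseline should be read as approximate: roughly 64 to 65% accuracy, a gain of about three absolute points.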

COLING 2018
