16 papers with code • 0 benchmarks • 14 datasets
Dialectal Arabic Identification
These leaderboards are used to track progress in Dialect Identification
- 1,002 Hours - Changsha Dialect Speech Data by Mobile Phone
- 249 Hours - Hangzhou Dialect Speech Data by Mobile Phone
- 662 Hours - Mandarin Heavy Accent Speech Data by Mobile Phone
- 67 Hours - Northeast Dialect Speech Data by Mobile Phone
- 794 Hours - Sichuan Dialect Speech Data by Mobile Phone
- 997 Hours - Wuhan Dialect Speech Data by Mobile Phone
- 800 Hours - Sichuan Dialect Conversational Speech Data by Mobile Phone
- 505 Hours - Uyghur Colloquial Video Speech Data
- Cantonese Dialect Speech Data
- 500 Hours - Kazakh Colloquial Video Speech Data
This paper releases "AraCOVID19-MFH", a manually annotated multi-label Arabic COVID-19 fake news and hate speech detection dataset.
We used these features in a binary classifier to discriminate between Modern Standard Arabic (MSA) and Dialectal Arabic, with an accuracy of 100%.
With this research, we provide the embeddings themselves, the relation evaluation task benchmark for use in further research, and demonstrate how the benchmarked embeddings prove a useful unsupervised linguistic resource, effectively used in a downstream task.
Discriminating between closely-related language varieties is considered a challenging and important task.
We present CAMeL Tools, a collection of open-source tools for Arabic natural language processing in Python.
Our winning solution itself came in the form of an ensemble of different training iterations of our pre-trained BERT model, which achieved a micro-averaged F1-score of 26.78% on the subtask at hand.
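The two ideas in the excerpt above, combining several training runs into an ensemble and scoring with micro-averaged F1, can be illustrated with a minimal sketch. It is not the authors' implementation: the excerpt does not say how the runs were combined, so plain averaging of class probabilities (soft voting) is assumed here, and the labels `EG` and `JO`, the probability values, and the function names `ensemble_predict` and `micro_f1` are all hypothetical.

```python
def ensemble_predict(prob_runs):
    """Soft voting (an assumption): average class probabilities across runs.

    prob_runs is a list of runs; each run is a list with one dict per
    example, mapping a label to that run's predicted probability.
    """
    n_runs = len(prob_runs)
    predictions = []
    for i in range(len(prob_runs[0])):
        labels = prob_runs[0][i].keys()
        averaged = {
            label: sum(run[i][label] for run in prob_runs) / n_runs
            for label in labels
        }
        # Pick the label with the highest averaged probability.
        predictions.append(max(averaged, key=averaged.get))
    return predictions


def micro_f1(gold, pred):
    """Micro-averaged F1: pool TP, FP and FN over all labels, then compute F1."""
    tp = fp = fn = 0
    for label in set(gold) | set(pred):
        for g, p in zip(gold, pred):
            if p == label and g == label:
                tp += 1
            elif p == label:
                fp += 1
            elif g == label:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical outputs of two training runs on three utterances.
run_a = [{"EG": 0.6, "JO": 0.4}, {"EG": 0.3, "JO": 0.7}, {"EG": 0.55, "JO": 0.45}]
run_b = [{"EG": 0.7, "JO": 0.3}, {"EG": 0.6, "JO": 0.4}, {"EG": 0.2, "JO": 0.8}]
gold = ["EG", "JO", "EG"]

pred = ensemble_predict([run_a, run_b])
score = micro_f1(gold, pred)
```

In a single-label setting like this one, each wrong prediction counts once as a false positive and once as a false negative, so micro-averaged F1 coincides with plain accuracy.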
The Unreasonable Effectiveness of Machine Learning in Moldavian versus Romanian Dialect Identification
We conduct a subjective evaluation by human annotators, showing that humans attain much lower accuracy rates compared to machine learning (ML) models.
Although the prediction of dialects is an important language processing task, with a wide range of applications, existing work is largely limited to coarse-grained varieties.
Adapting MARBERT for Improved Arabic Dialect Identification: Submission to the NADI 2021 Shared Task
The tasks are to identify the geographic origin of short Dialectal Arabic (DA) and Modern Standard Arabic (MSA) utterances at the levels of both country and province.