Dialect Identification
34 papers with code • 0 benchmarks • 3 datasets
Dialectal Arabic Identification
Benchmarks
These leaderboards are used to track progress in Dialect Identification.
Most implemented papers
GlotLID: Language Identification for Low-Resource Languages
Several recent papers have published good solutions for language identification (LID) for about 300 high-resource and medium-resource languages.
AraCOVID19-MFH: Arabic COVID-19 Multi-label Fake News and Hate Speech Detection Dataset
This paper releases "AraCOVID19-MFH", a manually annotated multi-label Arabic COVID-19 fake news and hate speech detection dataset.
Automatic Dialect Detection in Arabic Broadcast Speech
We used these features in a binary classifier to discriminate between Modern Standard Arabic (MSA) and Dialectal Arabic, with an accuracy of 100%.
Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource
With this research, we provide the embeddings themselves, the relation evaluation task benchmark for use in further research, and demonstrate how the benchmarked embeddings prove a useful unsupervised linguistic resource, effectively used in a downstream task.
A Character-level Convolutional Neural Network for Distinguishing Similar Languages and Dialects
Discriminating between closely-related language varieties is considered a challenging and important task.
Speech Recognition Challenge in the Wild: Arabic MGB-3
Two hours of audio per dialect were released for development and a further two hours were used for evaluation.
CAMeL Tools: An Open Source Python Toolkit for Arabic Natural Language Processing
We present CAMeL Tools, a collection of open-source tools for Arabic natural language processing in Python.
Multi-Dialect Arabic BERT for Country-Level Dialect Identification
Our winning solution itself came in the form of an ensemble of different training iterations of our pre-trained BERT model, which achieved a micro-averaged F1-score of 26.78% on the subtask at hand.
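For readers unfamiliar with the metric named in this excerpt, here is a minimal sketch of how a micro-averaged F1-score can be computed for single-label dialect predictions. It is not the paper's or the shared task's scoring script, which this listing does not describe. The function name `micro_f1` and the country-code labels are hypothetical, chosen only for illustration.

```python
from collections import Counter


def micro_f1(gold, pred):
    """Micro-averaged F1 for single-label classification.

    Counts true positives, false positives, and false negatives per
    label, pools them across all labels, and computes F1 from the totals.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for label g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was the true label but was missed

    tp_total = sum(tp.values())
    fp_total = sum(fp.values())
    fn_total = sum(fn.values())

    denom = 2 * tp_total + fp_total + fn_total
    return 2 * tp_total / denom if denom else 0.0


# Hypothetical country-level labels, for illustration only.
gold = ["EG", "EG", "MA", "IQ"]
pred = ["EG", "MA", "MA", "IQ"]
score = micro_f1(gold, pred)
print(f"micro-F1 = {score:.2%}")
```

When every example has exactly one gold label and one predicted label, each error counts once as a false positive and once as a false negative, so the micro-averaged F1 equals plain accuracy.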
The Unreasonable Effectiveness of Machine Learning in Moldavian versus Romanian Dialect Identification
We conduct a subjective evaluation by human annotators, showing that humans attain much lower accuracy rates compared to machine learning (ML) models.
Toward Micro-Dialect Identification in Diaglossic and Code-Switched Environments
Although the prediction of dialects is an important language processing task, with a wide range of applications, existing work is largely limited to coarse-grained varieties.