Natural Language Processing

Dialect Identification

32 papers with code • 0 benchmarks • 3 datasets

Dialectal Arabic Identification

Benchmarks

These leaderboards are used to track progress in Dialect Identification.

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Datasets

Most implemented papers

GlotLID: Language Identification for Low-Resource Languages

cisnlp/glotlid • 24 Oct 2023

Several recent papers have published good solutions for language identification (LID) for about 300 high-resource and medium-resource languages.

AraCOVID19-MFH: Arabic COVID-19 Multi-label Fake News and Hate Speech Detection Dataset

MohamedHadjAmeur/AraCOVID19-MFH • 7 May 2021

This paper releases "AraCOVID19-MFH", a manually annotated multi-label Arabic COVID-19 fake news and hate speech detection dataset.

Automatic Dialect Detection in Arabic Broadcast Speech

Qatar-Computing-Research-Institute/dialectID • 23 Sep 2015

We used these features in a binary classifier to discriminate between Modern Standard Arabic (MSA) and Dialectal Arabic, with an accuracy of 100%.

Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource

clips/dutchembeddings • LREC 2016

With this research, we provide the embeddings themselves, the relation evaluation task benchmark for use in further research, and demonstrate how the benchmarked embeddings prove a useful unsupervised linguistic resource, effectively used in a downstream task.

A Character-level Convolutional Neural Network for Distinguishing Similar Languages and Dialects

boknilev/dsl-char-cnn • WS 2016

Discriminating between closely-related language varieties is considered a challenging and important task.

Speech Recognition Challenge in the Wild: Arabic MGB-3

qcri/dialectID • 21 Sep 2017

Two hours of audio per dialect were released for development and a further two hours were used for evaluation.

CAMeL Tools: An Open Source Python Toolkit for Arabic Natural Language Processing

CAMeL-Lab/camel_tools • LREC 2020

We present CAMeL Tools, a collection of open-source tools for Arabic natural language processing in Python.

Multi-Dialect Arabic BERT for Country-Level Dialect Identification

mawdoo3/Multi-dialect-Arabic-BERT • COLING (WANLP) 2020

Our winning solution itself came in the form of an ensemble of different training iterations of our pre-trained BERT model, which achieved a micro-averaged F1-score of 26.78% on the subtask at hand.
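
For readers unfamiliar with the metric quoted in this excerpt, the following is a minimal sketch of how a micro-averaged F1-score can be computed when each text receives exactly one dialect label. It is not the authors' evaluation script: the function name `micro_f1` and the country codes in the usage example are hypothetical and chosen only for illustration.

```python
from collections import Counter


def micro_f1(gold, pred):
    """Micro-averaged F1 for single-label classification (illustrative sketch).

    Counts true positives, false positives, and false negatives per label,
    pools them across all labels, and computes F1 from the pooled counts.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for label g
        else:
            fp[p] += 1   # label p was predicted but is wrong
            fn[g] += 1   # label g was missed

    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    total_fn = sum(fn.values())

    precision = total_tp / (total_tp + total_fp) if total_tp + total_fp else 0.0
    recall = total_tp / (total_tp + total_fn) if total_tp + total_fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical country-level labels, for illustration only.
gold = ["EG", "JO", "SA", "EG", "MA"]
pred = ["EG", "SA", "SA", "JO", "MA"]
score = micro_f1(gold, pred)
print(f"micro-F1 = {score:.2f}")
```

In this single-label setting every wrong prediction counts as one false positive and one false negative, so the pooled precision and recall are equal and the micro-averaged F1 coincides with plain accuracy.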

The Unreasonable Effectiveness of Machine Learning in Moldavian versus Romanian Dialect Identification

raduionescu/MOROCO-Tweets • 30 Jul 2020

We conduct a subjective evaluation by human annotators, showing that humans attain much lower accuracy rates compared to machine learning (ML) models.

Toward Micro-Dialect Identification in Diaglossic and Code-Switched Environments

UBC-NLP/microdialects • EMNLP 2020

Although the prediction of dialects is an important language processing task, with a wide range of applications, existing work is largely limited to coarse-grained varieties.
