Dialect Identification
32 papers with code • 0 benchmarks • 3 datasets
Dialectal Arabic Identification
Latest papers
Sebastian, Basti, Wastl?! Recognizing Named Entities in Bavarian Dialectal Data
Named Entity Recognition (NER) is a fundamental task to extract key information from texts, but annotated resources are scarce for dialects.
ArTST: Arabic Text and Speech Transformer
We present ArTST, a pre-trained Arabic text and speech transformer for supporting open-source speech technologies for the Arabic language.
GlotLID: Language Identification for Low-Resource Languages
Several recent papers have published good solutions for language identification (LID) for about 300 high-resource and medium-resource languages.
Arabic Dialect Identification under Scrutiny: Limitations of Single-label Classification
Automatic Arabic Dialect Identification (ADI) of text has gained great popularity since it was introduced in the early 2010s.
ALDi: Quantifying the Arabic Level of Dialectness of Text
Transcribed speech and user-generated text in Arabic typically contain a mixture of Modern Standard Arabic (MSA), the standardized language taught in schools, and Dialectal Arabic (DA), used in daily communications.
RoDia: A New Dataset for Romanian Dialect Identification from Speech
We introduce RoDia, the first dataset for Romanian dialect identification from speech.
DADA: Dialect Adaptation via Dynamic Aggregation of Linguistic Rules
We show that DADA is effective for both single task and instruction finetuned language models, offering an extensible and interpretable framework for adapting existing LLMs to different English dialects.
North Sámi Dialect Identification with Self-supervised Speech Models
The North Sámi (NS) language encapsulates four primary dialectal variants that are related but that also have differences in their phonology, morphology, and vocabulary.
A Parameter-Efficient Learning Approach to Arabic Dialect Identification with Pre-Trained General-Purpose Speech Model
In this work, we explore Parameter-Efficient-Learning (PEL) techniques to repurpose a General-Purpose-Speech (GSM) model for Arabic dialect identification (ADI).
Two-stage Pipeline for Multilingual Dialect Detection
Our proposed approach consists of a two-stage system and outperforms other participants' systems and previous works in this domain.