Dialect Identification

32 papers with code • 0 benchmarks • 3 datasets

Dialectal Arabic Identification

Latest papers with no code

USTHB at NADI 2023 shared task: Exploring Preprocessing and Feature Engineering Strategies for Arabic Dialect Identification

no code yet • 16 Dec 2023

In this paper, we conduct an in-depth analysis of several key factors influencing the performance of Arabic Dialect Identification in the NADI 2023 shared task, with a specific focus on the first subtask, country-level dialect identification.

Self-supervised Adaptive Pre-training of Multilingual Speech Models for Language and Dialect Identification

no code yet • 12 Dec 2023

To address this challenge, we propose self-supervised adaptive pre-training (SAPT) to adapt the pre-trained model to the target domain and languages of the downstream task.

Mavericks at NADI 2023 Shared Task: Unravelling Regional Nuances through Dialect Identification using Transformer-based Approach

no code yet • 30 Nov 2023

We fine-tune these state-of-the-art models on the provided dataset.

NADI 2023: The Fourth Nuanced Arabic Dialect Identification Shared Task

no code yet • 24 Oct 2023

We describe the findings of the fourth Nuanced Arabic Dialect Identification Shared Task (NADI 2023).

Yet Another Model for Arabic Dialect Identification

no code yet • 20 Oct 2023

We explore two architectural variations, ResNet and ECAPA-TDNN, coupled with two types of acoustic features: MFCCs and features extracted from the pre-trained self-supervised model UniSpeech-SAT Large, as well as a fusion of all four variants.
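The snippet does not say how the four variants (two architectures times two feature types) are fused. As a hedged illustration only, the sketch below assumes simple score-level fusion: each system outputs a per-dialect score matrix, and the fused score is a weighted average (equal weights by default). The function name `fuse_scores` and the weighting scheme are hypothetical, not the authors' method.

```python
import numpy as np

def fuse_scores(score_lists, weights=None):
    """Hypothetical score-level fusion of several dialect-ID systems.

    score_lists: list of arrays, each of shape (n_utterances, n_dialects),
    e.g. posteriors from ResNet/ECAPA-TDNN on MFCC or SSL features.
    weights: optional per-system weights; equal weights are assumed by default.
    Returns the fused score matrix and the predicted dialect index per utterance.
    """
    # Stack into (n_systems, n_utterances, n_dialects).
    scores = np.stack([np.asarray(s, dtype=float) for s in score_lists])
    if weights is None:
        weights = np.full(len(score_lists), 1.0 / len(score_lists))
    weights = np.asarray(weights, dtype=float)
    # Weighted sum over the system axis -> (n_utterances, n_dialects).
    fused = np.tensordot(weights, scores, axes=1)
    return fused, fused.argmax(axis=1)

# Usage: four systems, two utterances, three dialects.
s1 = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
s2 = [[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]
s3 = [[0.5, 0.4, 0.1], [0.1, 0.6, 0.3]]
s4 = [[0.2, 0.7, 0.1], [0.3, 0.3, 0.4]]
fused, pred = fuse_scores([s1, s2, s3, s4])
print(pred)  # [0 1]
```

Averaging is the simplest fusion rule; a learned fusion (e.g. logistic regression over stacked scores) is a common alternative when a development set is available.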

VoxArabica: A Robust Dialect-Aware Arabic Speech Recognition System

no code yet • 17 Oct 2023

We train a wide range of models such as HuBERT (DID), Whisper, and XLS-R (ASR) in a supervised setting for Arabic DID and ASR tasks.

Advanced accent/dialect identification and accentedness assessment with multi-embedding models and automatic speech recognition

no code yet • 17 Oct 2023

In this study, embeddings from advanced pre-trained language identification (LID) and speaker identification (SID) models are leveraged to improve the accuracy of accent classification and non-native accentedness assessment.
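To make "leveraging embeddings from LID and SID models" concrete, here is a minimal sketch under stated assumptions: per-utterance LID and SID embeddings are already extracted, each is L2-normalised and the two are concatenated, and a nearest-centroid (cosine) classifier stands in for whatever back-end the paper actually uses. All function names are hypothetical, and the classifier choice is an assumption, not the study's method.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so embeddings with different norms are comparable."""
    x = np.asarray(x, dtype=float)
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def fuse_embeddings(lid_emb, sid_emb):
    """Hypothetical multi-embedding fusion: concatenate normalised LID and SID vectors."""
    return np.concatenate([l2_normalize(lid_emb), l2_normalize(sid_emb)], axis=-1)

def nearest_centroid_fit(X, y):
    """Compute one (normalised) centroid per accent class."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, l2_normalize(centroids)

def nearest_centroid_predict(X, classes, centroids):
    """Assign each utterance to the accent whose centroid has the highest cosine similarity."""
    sims = l2_normalize(X) @ centroids.T
    return classes[sims.argmax(axis=1)]

# Usage with toy 2-d "LID" and 3-d "SID" embeddings for two accents.
lid = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
sid = [[1.0, 0.0, 0.0], [1.0, 0.1, 0.0], [0.0, 0.0, 1.0], [0.0, 0.1, 1.0]]
y = np.array(["A", "A", "B", "B"])
X = fuse_embeddings(lid, sid)
classes, centroids = nearest_centroid_fit(X, y)
pred = nearest_centroid_predict(X, classes, centroids)
```

Normalising before concatenation keeps one model's embedding from dominating purely because of its scale.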

Unsupervised Out-of-Distribution Dialect Detection with Mahalanobis Distance

no code yet • 9 Aug 2023

Dialect classification is used in a variety of applications, such as machine translation and speech recognition, to improve the overall performance of the system.
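The title names Mahalanobis distance as the out-of-distribution (OOD) detector. A common formulation, shown here as a sketch that may differ from the paper's exact setup, fits one mean per known dialect plus a shared covariance on in-distribution embeddings, then scores a new utterance by its minimum squared Mahalanobis distance to any dialect mean. The embedding source, the ridge term `reg`, and the function names are assumptions.

```python
import numpy as np

def fit_mahalanobis(X, y, reg=1e-6):
    """Fit per-dialect means and a shared (tied) covariance on in-distribution embeddings."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)  # sorted, so searchsorted below maps labels to indices
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    centered = X - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / len(X)
    cov += reg * np.eye(X.shape[1])  # small ridge keeps the covariance invertible
    return means, np.linalg.inv(cov)

def ood_score(x, means, precision):
    """Minimum squared Mahalanobis distance to any dialect mean; larger means more OOD."""
    diffs = np.asarray(x, dtype=float)[:, None, :] - means[None, :, :]  # (n, k, d)
    d2 = np.einsum("nkd,de,nke->nk", diffs, precision, diffs)
    return d2.min(axis=1)

# Usage: two toy dialect clusters in 2-d embedding space.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [10, 10], [11, 10], [10, 11], [11, 11]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
means, precision = fit_mahalanobis(X, y)
scores = ood_score([[0.5, 0.5], [5.5, 5.5]], means, precision)
```

An utterance would then be flagged as OOD when its score exceeds a threshold; choosing that threshold (for instance, a high percentile of in-distribution scores) is an assumption left to the application.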

Towards spoken dialect identification of Irish

no code yet • 14 Jul 2023

Motivated by this, the present experiments investigate spoken dialect identification of Irish, with a view to incorporating such a system into the speech recognition pipeline.

On the Robustness of Arabic Speech Dialect Identification

no code yet • 1 Jun 2023

As these pipelines require application of ADI tools to potentially out-of-domain data, we aim to investigate how vulnerable the tools may be to this domain shift.