TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Spoken language identification	Untranscribed mixed-speech dataset	SVM	ACC	45.2%	# 1
Spoken language identification	Untranscribed mixed-speech dataset	SVM	PRC	44.8%	# 1
Spoken language identification	Untranscribed mixed-speech dataset	SVM	RCL	45.4%	# 2
Spoken language identification	Untranscribed mixed-speech dataset	Max Ent	ACC	40%	# 3
Spoken language identification	Untranscribed mixed-speech dataset	Max Ent	PRC	40%	# 3
Spoken language identification	Untranscribed mixed-speech dataset	Max Ent	RCL	40.6%	# 4
Spoken language identification	Untranscribed mixed-speech dataset	Naive Bayes	ACC	37.9%	# 4
Spoken language identification	Untranscribed mixed-speech dataset	Naive Bayes	PRC	37.5%	# 4
Spoken language identification	Untranscribed mixed-speech dataset	Naive Bayes	RCL	50.2%	# 1
Spoken language identification	Untranscribed mixed-speech dataset	n-gram Language Model	ACC	40.4%	# 2
Spoken language identification	Untranscribed mixed-speech dataset	n-gram Language Model	PRC	40.2%	# 2
Spoken language identification	Untranscribed mixed-speech dataset	n-gram Language Model	RCL	41.3%	# 3
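
The ACC, PRC, and RCL columns above are most plausibly accuracy, precision, and recall. The listing does not say how precision and recall were averaged over classes, so the sketch below assumes macro-averaging (an unweighted mean over classes). The function name `macro_metrics` and the dialect labels in the example are illustrative and are not taken from the paper or its released code.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall) for two label lists.

    Assumption: precision and recall are macro-averaged, i.e. computed per
    class and then averaged with equal weight for every class.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    accuracy = sum(tp.values()) / len(y_true)
    # A class that is never predicted (or never occurs) contributes 0.0.
    precisions = [
        tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0 for c in labels
    ]
    recalls = [
        tp[c] / (tp[c] + fn[c]) if (tp[c] + fn[c]) else 0.0 for c in labels
    ]
    return accuracy, sum(precisions) / len(labels), sum(recalls) / len(labels)


# Toy example with hypothetical dialect labels.
y_true = ["EGY", "EGY", "GLF", "MSA"]
y_pred = ["EGY", "GLF", "GLF", "MSA"]
acc, prc, rcl = macro_metrics(y_true, y_pred)
```

Under macro-averaging, recall can exceed accuracy when a classifier does well on small classes. That is one possible reading of the Naive Bayes row, where RCL is well above ACC, though the listing itself gives no explanation.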

Automatic Dialect Detection in Arabic Broadcast Speech

23 Sep 2015 · Ahmed Ali, Najim Dehak, Patrick Cardinal, Sameer Khurana, Sree Harsha Yella, James Glass, Peter Bell, Steve Renals

We investigate different approaches for dialect identification in Arabic broadcast speech, using phonetic and lexical features obtained from a speech recognition system, and acoustic features using the i-vector framework. We studied both generative and discriminative classifiers, and we combined these features using a multi-class Support Vector Machine (SVM). We validated our results on an Arabic/English language identification task, with an accuracy of 100%. We used these features in a binary classifier to discriminate between Modern Standard Arabic (MSA) and Dialectal Arabic, with an accuracy of 100%. We further report results using the proposed method to discriminate between the five most widely used dialects of Arabic, namely Egyptian, Gulf, Levantine, North African, and MSA, with an accuracy of 52%. We discuss dialect identification errors in the context of dialect code-switching between Dialectal Arabic and MSA, and compare the error pattern between manually labeled data and the output from our classifier. We also release the train and test data as a standard corpus for dialect identification.
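
The abstract does not say how the phonetic, lexical, and i-vector features were combined before the multi-class SVM. One common option is feature-level fusion: normalize each feature block and concatenate the blocks into a single vector per utterance. The sketch below illustrates only that assumed step; `l2_normalize`, `fuse_features`, and the toy vectors are hypothetical and do not come from the paper or its code.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length; all-zero vectors are unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def fuse_features(*blocks):
    """Concatenate per-utterance feature blocks after normalizing each one.

    Normalizing per block keeps a high-dimensional or large-valued block
    (for example, lexical counts) from dominating the others in the
    fused vector.
    """
    fused = []
    for block in blocks:
        fused.extend(l2_normalize(block))
    return fused


# Toy stand-ins for one utterance (made-up values, not real features).
phonetic = [3.0, 4.0]
lexical = [0.0, 0.0]
ivector = [2.0]
fused = fuse_features(phonetic, lexical, ivector)
```

The fused vector would then be passed to whatever multi-class classifier is used. An alternative, equally consistent with the abstract's wording, is to train one classifier per feature type and combine their scores.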

Code

Qatar-Computing-Research-Institute/… official

Tasks

Dialect Identification

Language Identification

Speech Recognition

Spoken language identification

Results from the Paper

Ranked #1 on Spoken language identification on Untranscribed mixed-speech dataset

