MediaSpeech: Multilanguage ASR Benchmark and Dataset

The performance of automated speech recognition (ASR) systems is well known to vary across application domains. At the same time, vendors and research groups typically report ASR quality either on limited, simplistic domains (audiobooks, TED talks) or on proprietary datasets. To fill this gap, we provide NTR MediaSpeech, an open-source 10-hour ASR evaluation dataset for four languages: Spanish, French, Turkish, and Arabic. The dataset was collected from the official YouTube channels of media outlets in the respective languages and manually transcribed. We estimate that the word error rate (WER) of the dataset's transcriptions is under 5%. We benchmark a range of commercially and freely available ASR systems and report the results. We also open-source baseline QuartzNet models for each language.
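All results below are reported as word error rate (WER). As a reference for how such numbers are computed, here is a minimal sketch of WER via word-level Levenshtein distance; the function name and the tokenization choices (whitespace splitting, lowercasing) are illustrative assumptions, not details taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with word-level Levenshtein (edit) distance."""
    ref = reference.lower().split()  # naive whitespace tokenization (assumption)
    hyp = hypothesis.lower().split()

    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


print(word_error_rate("bonjour à tous", "bonjour a tous"))  # 1 error / 3 words ≈ 0.333
```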


Datasets


Introduced in the Paper:

MediaSpeech

Used in the Paper:

Common Voice

Results from the Paper


All results are for the Speech Recognition task on the MediaSpeech dataset; the metric is word error rate (WER), lower is better.

Model        Metric            Metric Value   Global Rank
QuartzNet    WER for Arabic    0.1300         # 1
QuartzNet    WER for French    0.1915         # 3
QuartzNet    WER for Turkish   0.1422         # 2
QuartzNet    WER for Spanish   0.1826         # 3
wav2vec      WER for Arabic    0.9596         # 6
wav2vec      WER for French    0.3113         # 6
wav2vec      WER for Turkish   0.5812         # 6
wav2vec      WER for Spanish   0.2469         # 6
DeepSpeech   WER for French    0.4741         # 7
DeepSpeech   WER for Spanish   0.4236         # 8
Wit          WER for Arabic    0.2333         # 2
Wit          WER for French    0.1759         # 2
Wit          WER for Turkish   0.0768         # 1
Wit          WER for Spanish   0.0879         # 1
Silero       WER for Spanish   0.3070         # 7
VOSK         WER for Arabic    0.3085         # 4
VOSK         WER for French    0.2111         # 4
VOSK         WER for Turkish   0.3050         # 5
VOSK         WER for Spanish   0.1970         # 4
Google       WER for Arabic    0.4464         # 5
Google       WER for French    0.2385         # 5
Google       WER for Turkish   0.2707         # 4
Google       WER for Spanish   0.2176         # 5
Azure        WER for Arabic    0.3016         # 3
Azure        WER for French    0.1683         # 1
Azure        WER for Turkish   0.2296         # 3
Azure        WER for Spanish   0.1296         # 2
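
The wav2vec rows above use off-the-shelf multilingual checkpoints. As an illustration of how such a system can be scored on MediaSpeech, here is a sketch of transcribing one clip with a Hugging Face wav2vec 2.0 model and scoring it with the word_error_rate helper from the sketch above; the checkpoint id and the file names are assumptions for illustration, not the exact setup used in the paper.

```python
# Sketch: transcribe a MediaSpeech clip with a Hugging Face wav2vec 2.0
# checkpoint and score it against the reference transcript.
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

MODEL_ID = "facebook/wav2vec2-large-xlsr-53-french"  # example checkpoint (assumption)

processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID).eval()

# Load a clip and resample to the 16 kHz mono input the model expects.
waveform, sample_rate = torchaudio.load("clip_0001.flac")  # hypothetical file name
waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000).mean(dim=0)

inputs = processor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits
hypothesis = processor.batch_decode(torch.argmax(logits, dim=-1))[0]

reference = open("clip_0001.txt", encoding="utf-8").read().strip()  # hypothetical file name
print(hypothesis)
print("WER:", word_error_rate(reference, hypothesis))  # helper defined earlier
```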

Methods


No methods listed for this paper.