MediaSpeech: Multilanguage ASR Benchmark and Dataset

The performance of automated speech recognition (ASR) systems is well known to vary across application domains. At the same time, vendors and research groups typically report ASR quality either on limited, simplistic domains (audiobooks, TED talks) or on proprietary datasets. To fill this gap, we provide NTR MediaSpeech, an open-source 10-hour ASR evaluation dataset for four languages: Spanish, French, Turkish, and Arabic. The dataset was collected from the official YouTube channels of media outlets in the respective languages and manually transcribed. We estimate that the word error rate (WER) of the dataset's transcriptions is under 5%. We benchmark a range of commercially and freely available ASR systems and report the results. We also open-source baseline QuartzNet models for each language.
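All results below are reported as word error rate (WER). As a reference for how such numbers are computed, here is a minimal sketch of WER via word-level Levenshtein distance; the function name and the tokenization choices (whitespace splitting, lowercasing) are illustrative assumptions, not details taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with word-level Levenshtein (edit) distance."""
    ref = reference.lower().split()  # naive whitespace tokenization (assumption)
    hyp = hypothesis.lower().split()

    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


print(word_error_rate("bonjour à tous", "bonjour a tous"))  # 1 error / 3 words ≈ 0.333
```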


Datasets


Introduced in the Paper:

MediaSpeech

Used in the Paper:

Common Voice

Results from the Paper


All results are for the Speech Recognition task on the MediaSpeech dataset; the metric is word error rate (WER), lower is better.

Model        Metric            Metric Value   Global Rank
QuartzNet    WER for Arabic    0.1300         # 1
QuartzNet    WER for French    0.1915         # 3
QuartzNet    WER for Turkish   0.1422         # 2
QuartzNet    WER for Spanish   0.1826         # 3
wav2vec      WER for Arabic    0.9596         # 6
wav2vec      WER for French    0.3113         # 6
wav2vec      WER for Turkish   0.5812         # 6
wav2vec      WER for Spanish   0.2469         # 6
DeepSpeech   WER for French    0.4741         # 7
DeepSpeech   WER for Spanish   0.4236         # 8
Wit          WER for Arabic    0.2333         # 2
Wit          WER for French    0.1759         # 2
Wit          WER for Turkish   0.0768         # 1
Wit          WER for Spanish   0.0879         # 1
Silero       WER for Spanish   0.3070         # 7
VOSK         WER for Arabic    0.3085         # 4
VOSK         WER for French    0.2111         # 4
VOSK         WER for Turkish   0.3050         # 5
VOSK         WER for Spanish   0.1970         # 4
Google       WER for Arabic    0.4464         # 5
Google       WER for French    0.2385         # 5
Google       WER for Turkish   0.2707         # 4
Google       WER for Spanish   0.2176         # 5
Azure        WER for Arabic    0.3016         # 3
Azure        WER for French    0.1683         # 1
Azure        WER for Turkish   0.2296         # 3
Azure        WER for Spanish   0.1296         # 2
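
The wav2vec rows above use off-the-shelf multilingual checkpoints. As an illustration of how such a system can be scored on MediaSpeech, here is a sketch of transcribing one clip with a Hugging Face wav2vec 2.0 model and scoring it with the word_error_rate helper from the sketch above; the checkpoint id and the file names are assumptions for illustration, not the exact setup used in the paper.

```python
# Sketch: transcribe a MediaSpeech clip with a Hugging Face wav2vec 2.0
# checkpoint and score it against the reference transcript.
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

MODEL_ID = "facebook/wav2vec2-large-xlsr-53-french"  # example checkpoint (assumption)

processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID).eval()

# Load a clip and resample to the 16 kHz mono input the model expects.
waveform, sample_rate = torchaudio.load("clip_0001.flac")  # hypothetical file name
waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000).mean(dim=0)

inputs = processor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits
hypothesis = processor.batch_decode(torch.argmax(logits, dim=-1))[0]

reference = open("clip_0001.txt", encoding="utf-8").read().strip()  # hypothetical file name
print(hypothesis)
print("WER:", word_error_rate(reference, hypothesis))  # helper defined earlier
```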

Methods


No methods listed for this paper.