First Automatic Fongbe Continuous Speech Recognition System: Development of Acoustic Models and Language Models
This paper reports our efforts toward an ASR system for a new under-resourced language, Fongbe. The aim of this work is to build acoustic models and language models for continuous speech decoding in Fongbe. The main obstacle is that Fongbe, an African language spoken mainly in Benin, Togo, and Nigeria, has no existing language resources for building an ASR system. As part of this work, we therefore first collected Fongbe text and speech corpora, which are described in the following sections. Acoustic modeling was carried out at the grapheme level, and two language models were built for performance comparison. We also simplified vowels by removing tone diacritics in order to investigate their impact on the language models.
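
The vowel simplification mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' preprocessing script: the function name `strip_tones` and the set of combining marks treated as tone diacritics are assumptions made for illustration, since the abstract does not list which marks were removed.

```python
import unicodedata

# Assumed set of combining marks used as tone diacritics.
# The abstract does not enumerate them; adjust to the corpus orthography.
TONE_MARKS = {
    "\u0300",  # combining grave accent
    "\u0301",  # combining acute accent
    "\u0302",  # combining circumflex accent
    "\u0304",  # combining macron
    "\u030C",  # combining caron
}


def strip_tones(text: str) -> str:
    """Return text with the assumed tone diacritics removed.

    The text is decomposed (NFD) so that each diacritic becomes a separate
    combining character, the tone marks are dropped, and the result is
    recomposed (NFC). Base letters such as ɛ, ɔ and ɖ are left untouched.
    """
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)
```

Applying such a function to the text corpus before language-model training yields a tone-free variant, so the same training and evaluation pipeline can be run on both versions for comparison.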
Datasets
Introduced in the Paper: Fongbe audio

Task | Dataset | Model | Metric Name | Metric Value | Global Rank | Benchmark |
---|---|---|---|---|---|---|
Speech Recognition | Fongbe audio | Triphone (39 features) + LDA and MLLT + SGMM | Word Error Rate (WER) | 16.57 | # 1 | |
Speech Recognition | Fongbe audio | Triphone (13 MFCC + delta + delta2) | Word Error Rate (WER) | 26.75 | # 3 | |
Speech Recognition | Fongbe audio | Triphone (39 features) + LDA and MLLT + SAT and FMLLR | Word Error Rate (WER) | 17.77 | # 2 | |
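
For readers unfamiliar with the metric in the table, Word Error Rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below computes it as a percentage, which is presumably the unit of the values above. The function name `word_error_rate` and the whitespace tokenization are assumptions; the scoring tool actually used in the paper is not stated here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the WER in percent between a reference and a hypothesis.

    Words are obtained by whitespace splitting (an assumption), and the
    word-level Levenshtein distance is computed row by row.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions align i reference words to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr

    return 100.0 * prev[-1] / len(ref)
```

Because insertions are counted against the reference length, the value can exceed 100 when the hypothesis is much longer than the reference.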