MCE: Mixed Cantonese and English Audio Dataset

27 Oct 2023 · Peng Xie, Zihao Xin, Yang Wang, Shengjun Huang, Tsz Wai Chan, Kani Chen

Whisper has recently approached human-level robustness and accuracy in English speech recognition, but minor-language and mixed-language speech recognition still need considerable improvement. In this work, we present Whisper-MCE, our fine-tuned Whisper model, which was trained on our self-collected Mixed Cantonese and English (MCE) audio dataset. Whisper-MCE achieved a Mix Error Rate (MER) of 14.28%, which is 35.13% lower than that of the original model. It also achieved a Character Error Rate (CER) of 12.61% on Common Voice zh-HK, positioning it as state-of-the-art. However, MER and CER are of limited use for evaluating effectiveness in mixed-language and minor-language contexts. We therefore propose a novel evaluation metric called FAL, which assesses an Automatic Speech Recognition (ASR) system based on fidelity to the original audio, accuracy, and latency. Whisper-MCE outperformed other models on this metric, achieving a FAL score of 90.91. The MCE dataset and code can be found at MCE.
