no code implementations • 29 Oct 2022 • Ebbie Awino, Lilian Wanzare, Lawrence Muchemi, Barack Wanjawa, Edward Ombui, Florence Indede, Owen McOnyango, Benard Okal
Building automatic speech recognition (ASR) systems is a challenging task, especially for under-resourced languages that need to construct corpora nearly from scratch and lack sufficient training data.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
no code implementations • 25 Aug 2022 • Barack Wanjawa, Lilian Wanzare, Florence Indede, Owen McOnyango, Edward Ombui, Lawrence Muchemi
The Kencorpus dataset is a text and speech corpus for three languages predominantly spoken in Kenya: Swahili, Dholuo and Luhya.
no code implementations • 4 May 2022 • Barack W. Wanjawa, Lilian D. A. Wanzare, Florence Indede, Owen McOnyango, Lawrence Muchemi, Edward Ombui
The need for Question Answering datasets in low resource languages is the motivation of this research, leading to the development of Kencorpus Swahili Question Answering Dataset, KenSwQuAD.