no code implementations • 29 Oct 2022 • Ebbie Awino, Lilian Wanzare, Lawrence Muchemi, Barack Wanjawa, Edward Ombui, Florence Indede, Owen McOnyango, Benard Okal
Building automatic speech recognition (ASR) systems is a challenging task, especially for under-resourced languages, for which corpora must be constructed nearly from scratch and training data is scarce.
no code implementations • 25 Aug 2022 • Barack Wanjawa, Lilian Wanzare, Florence Indede, Owen McOnyango, Edward Ombui, Lawrence Muchemi
The Kencorpus dataset is a text and speech corpus for three languages predominantly spoken in Kenya: Swahili, Dholuo and Luhya.
no code implementations • 4 May 2022 • Barack W. Wanjawa, Lilian D. A. Wanzare, Florence Indede, Owen McOnyango, Lawrence Muchemi, Edward Ombui
The need for Question Answering datasets in low-resource languages is the motivation for this research, which led to the development of the Kencorpus Swahili Question Answering Dataset (KenSwQuAD).
no code implementations • 25 Sep 2019 • Edward Ombui, Lawrence Muchemi, Peter Wagacha
How well can the concept of hate speech be abstracted to inform automatic classification in code-switched texts by machine learning classifiers?
no code implementations • 8 Jun 2016 • Gregory Grefenstette, Lawrence Muchemi
Data captured by lifelogging devices will increasingly include speech and text, potentially useful in analysis of intellectual activities.
no code implementations • 31 May 2016 • Gregory Grefenstette, Lawrence Muchemi
Terms which are found significantly more often in the specialized corpus than in the background corpus are candidates for the characteristic vocabulary of the domain.
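The comparison described in the last excerpt can be illustrated with a minimal sketch. It is not the authors' method: it assumes pre-tokenized input, uses a smoothed relative-frequency ratio in place of a formal significance test, and the function name `characteristic_terms` and the `min_ratio` and `min_count` thresholds are hypothetical choices made for illustration.

```python
from collections import Counter


def characteristic_terms(specialized_tokens, background_tokens,
                         min_ratio=2.0, min_count=2):
    """Return (term, ratio) pairs for terms over-represented in the specialized corpus.

    Assumption: a simple add-one-smoothed frequency ratio stands in for
    "significantly more often"; a real system might use a statistical test.
    """
    spec = Counter(specialized_tokens)
    back = Counter(background_tokens)
    spec_total = sum(spec.values())
    back_total = sum(back.values())
    # Shared vocabulary size, used for add-one smoothing of both rates.
    vocab_size = len(set(spec) | set(back))

    candidates = []
    for term, count in spec.items():
        if count < min_count:
            continue  # skip terms too rare to judge
        spec_rate = (count + 1) / (spec_total + vocab_size)
        back_rate = (back[term] + 1) / (back_total + vocab_size)
        ratio = spec_rate / back_rate
        if ratio >= min_ratio:
            candidates.append((term, ratio))

    # Highest ratio first; ties broken alphabetically for stable output.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates


specialized = "the corpus term corpus term extraction the".split()
background = "the cat sat on the mat the dog".split()
for term, ratio in characteristic_terms(specialized, background):
    print(term, round(ratio, 2))
```

In this toy input, "the" is frequent in both corpora and is filtered out, while terms absent from the background corpus pass the ratio threshold.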