Search Results for author: Lawrence Muchemi

Found 6 papers, 0 papers with code

Phonemic Representation and Transcription for Speech to Text Applications for Under-resourced Indigenous African Languages: The Case of Kiswahili

no code implementations • 29 Oct 2022 • Ebbie Awino, Lilian Wanzare, Lawrence Muchemi, Barack Wanjawa, Edward Ombui, Florence Indede, Owen McOnyango, Benard Okal

Building automatic speech recognition (ASR) systems is a challenging task, especially for under-resourced languages, which lack sufficient training data and for which corpora must be constructed nearly from scratch.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1


Kencorpus: A Kenyan Language Corpus of Swahili, Dholuo and Luhya for Natural Language Processing Tasks

no code implementations • 25 Aug 2022 • Barack Wanjawa, Lilian Wanzare, Florence Indede, Owen McOnyango, Edward Ombui, Lawrence Muchemi

The Kencorpus dataset is a text and speech corpus for three languages predominantly spoken in Kenya: Swahili, Dholuo and Luhya.

Machine Translation, Part-Of-Speech Tagging, +3


KenSwQuAD -- A Question Answering Dataset for Swahili Low Resource Language

no code implementations • 4 May 2022 • Barack W. Wanjawa, Lilian D. A. Wanzare, Florence Indede, Owen McOnyango, Lawrence Muchemi, Edward Ombui

This research is motivated by the need for Question Answering datasets in low resource languages and led to the development of the Kencorpus Swahili Question Answering Dataset, KenSwQuAD.

BIG-bench Machine Learning, Question Answering, +1


Best feature performance in codeswitched hate speech texts

no code implementations • 25 Sep 2019 • Edward Ombui, Lawrence Muchemi, Peter Wagacha

How well can the concept of hate speech be abstracted to inform automatic classification of codeswitched texts by machine learning classifiers?

Topic Models


On the Place of Text Data in Lifelogs, and Text Analysis via Semantic Facets

no code implementations • 8 Jun 2016 • Gregory Grefenstette, Lawrence Muchemi

Data captured by lifelogging devices will increasingly include speech and text, potentially useful in analysis of intellectual activities.


Determining the Characteristic Vocabulary for a Specialized Dictionary using Word2vec and a Directed Crawler

no code implementations • 31 May 2016 • Gregory Grefenstette, Lawrence Muchemi

Terms which are found significantly more often in the specialized corpus than in the background corpus are candidates for the characteristic vocabulary of the domain.

