no code implementations • DCLRL (LREC) 2022 • Salomon Kabongo Kabenamualu, Vukosi Marivate, Herman Kamper
In recent years there has been great interest in addressing the data scarcity of African languages and providing baseline models for different Natural Language Processing tasks (Orife et al., 2020).
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+3
1 code implementation • EACL (GWC) 2021 • Tshephisho Joseph Sefara, Tumisho Billson Mokgonyane, Vukosi Marivate
This paper proposes the implementation of WordNets for five South African languages, namely, Sepedi, Setswana, Tshivenda, isiZulu and isiXhosa to be added to open multilingual WordNets (OMW) on natural language toolkit (NLTK).
2 code implementations • 7 Mar 2023 • Richard Lastrucci, Isheanesu Dzingirai, Jenalea Rajab, Andani Madodonga, Matimba Shingange, Daniel Njini, Vukosi Marivate
This paper introduces two multilingual government themed corpora in various South African languages.
no code implementations • 13 Nov 2022 • Nicolle Garber, Vukosi Marivate
The subject of conversational mining has become of great interest recently due to the explosion of social and other online media.
no code implementations • 1 Nov 2022 • Herkulaas Combrink, Vukosi Marivate, Benjamin Rosman
Advances in reinforcement learning research have demonstrated the ways in which different agent-based models can learn how to optimally perform a task within a given environment.
no code implementations • 22 Oct 2022 • David Ifeoluwa Adelani, Graham Neubig, Sebastian Ruder, Shruti Rijhwani, Michael Beukman, Chester Palen-Michel, Constantine Lignos, Jesujoba O. Alabi, Shamsuddeen H. Muhammad, Peter Nabende, Cheikh M. Bamba Dione, Andiswa Bukula, Rooweither Mabuya, Bonaventure F. P. Dossou, Blessing Sibanda, Happy Buzaaba, Jonathan Mukiibi, Godson Kalipe, Derguene Mbaye, Amelia Taylor, Fatoumata Kabore, Chris Chinenye Emezue, Anuoluwapo Aremu, Perez Ogayo, Catherine Gitau, Edwin Munkoh-Buabeng, Victoire M. Koagne, Allahsera Auguste Tapo, Tebogo Macucwa, Vukosi Marivate, Elvis Mboning, Tajuddeen Gwadabe, Tosin Adewumi, Orevaoghene Ahia, Joyce Nakatumba-Nabende, Neo L. Mokono, Ignatius Ezeani, Chiamaka Chukwuneke, Mofetoluwa Adeyemi, Gilles Q. Hacheme, Idris Abdulmumin, Odunayo Ogundepo, Oreen Yousuf, Tatiana Moteu Ngoli, Dietrich Klakow
African languages are spoken by over a billion people, but are underrepresented in NLP research and development.
no code implementations • 16 Oct 2022 • Herkulaas MvE Combrink, Vukosi Marivate, Benjamin Rosman
While much effort and detail has gone into the expansion of explaining algorithmic decision making in this context, there is still a need to develop data collection strategies Therefore, the purpose of this paper is to outline a data collection framework specific to recommender systems within this context in order to reduce collection biases, understand student characteristics, and find an ideal way to infer optimal influences on the student journey.
no code implementations • 16 Oct 2022 • Herkulaas MvE Combrink, Vukosi Marivate, Benjamin Rosman
The ability to generate synthetic data has a variety of use cases across different domains.
no code implementations • 4 May 2022 • Mashadi Ledwaba, Vukosi Marivate
This study aims to understand the South African political context by analysing the sentiments shared on Twitter during the local government elections.
2 code implementations • 6 Dec 2021 • Kaustubh D. Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Shrivastava, Samson Tan, Tongshuang Wu, Jascha Sohl-Dickstein, Jinho D. Choi, Eduard Hovy, Ondrej Dusek, Sebastian Ruder, Sajant Anand, Nagender Aneja, Rabin Banjade, Lisa Barthe, Hanna Behnke, Ian Berlot-Attwell, Connor Boyle, Caroline Brun, Marco Antonio Sobrevilla Cabezudo, Samuel Cahyawijaya, Emile Chapuis, Wanxiang Che, Mukund Choudhary, Christian Clauss, Pierre Colombo, Filip Cornell, Gautier Dagan, Mayukh Das, Tanay Dixit, Thomas Dopierre, Paul-Alexis Dray, Suchitra Dubey, Tatiana Ekeinhor, Marco Di Giovanni, Tanya Goyal, Rishabh Gupta, Louanes Hamla, Sang Han, Fabrice Harel-Canada, Antoine Honore, Ishan Jindal, Przemyslaw K. Joniak, Denis Kleyko, Venelin Kovatchev, Kalpesh Krishna, Ashutosh Kumar, Stefan Langer, Seungjae Ryan Lee, Corey James Levinson, Hualou Liang, Kaizhao Liang, Zhexiong Liu, Andrey Lukyanenko, Vukosi Marivate, Gerard de Melo, Simon Meoni, Maxime Meyer, Afnan Mir, Nafise Sadat Moosavi, Niklas Muennighoff, Timothy Sum Hon Mun, Kenton Murray, Marcin Namysl, Maria Obedkova, Priti Oli, Nivranshu Pasricha, Jan Pfister, Richard Plant, Vinay Prabhu, Vasile Pais, Libo Qin, Shahab Raji, Pawan Kumar Rajpoot, Vikas Raunak, Roy Rinberg, Nicolas Roberts, Juan Diego Rodriguez, Claude Roux, Vasconcellos P. H. S., Ananya B. Sai, Robin M. Schmidt, Thomas Scialom, Tshephisho Sefara, Saqib N. Shamsi, Xudong Shen, Haoyue Shi, Yiwen Shi, Anna Shvets, Nick Siegel, Damien Sileo, Jamie Simon, Chandan Singh, Roman Sitelew, Priyank Soni, Taylor Sorensen, William Soto, Aman Srivastava, KV Aditya Srivatsa, Tony Sun, Mukund Varma T, A Tabassum, Fiona Anting Tan, Ryan Teehan, Mo Tiwari, Marie Tolkiehn, Athena Wang, Zijian Wang, Gloria Wang, Zijie J. Wang, Fuxuan Wei, Bryan Wilie, Genta Indra Winata, Xinyi Wu, Witold Wydmański, Tianbao Xie, Usama Yaseen, Michael A. Yee, Jing Zhang, Yue Zhang
Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on.
1 code implementation • 11 Nov 2021 • Mack Makgatho, Vukosi Marivate, Tshephisho Sefara, Valencia Wagner
This paper trains Setswana and Sepedi monolingual word vectors and uses VecMap to create cross-lingual embeddings for Setswana-Sepedi in order to do a cross-lingual transfer.
no code implementations • 10 Aug 2021 • David Behr, Ciira wa Maina, Vukosi Marivate
This paper is an investigation into aspects of an audio classification pipeline that will be appropriate for the monitoring of bird species on edges devices.
1 code implementation • 6 Aug 2021 • Harm de Wet, Vukosi Marivate
In this work we investigate fake news detection on South African websites.
no code implementations • 18 Jun 2021 • Michelle Terblanche, Vukosi Marivate
Sentiment analysis as a sub-field of natural language processing has received increased attention in the past decade enabling organisations to more effectively manage their reputation through online media monitoring.
1 code implementation • 6 Apr 2021 • Kathleen Siminyu, Godson Kalipe, Davor Orlic, Jade Abbott, Vukosi Marivate, Sackey Freshia, Prateek Sibal, Bhanu Neupane, David I. Adelani, Amelia Taylor, Jamiil Toure Ali, Kevin Degila, Momboladji Balogoun, Thierno Ibrahima DIOP, Davis David, Chayma Fourati, Hatem Haddad, Malek Naski
Advances in speech and language technologies enable tools such as voice-search, text-to-speech, speech recognition and machine translation.
no code implementations • 20 Nov 2020 • Kathleen Siminyu, Laura Martinus, Vukosi Marivate
Proceedings of the 1st AfricaNLP Workshop held on 26th April alongside ICLR 2020, Virtual Conference, Formerly Addis Ababa Ethiopia.
4 code implementations • Findings of the Association for Computational Linguistics 2020 • Wilhelmina Nekoto, Vukosi Marivate, Tshinondiwa Matsila, Timi Fasubaa, Tajudeen Kolawole, Taiwo Fagbohungbe, Solomon Oluwole Akinola, Shamsuddeen Hassan Muhammad, Salomon Kabongo, Salomey Osei, Sackey Freshia, Rubungo Andre Niyongabo, Ricky Macharm, Perez Ogayo, Orevaoghene Ahia, Musie Meressa, Mofe Adeyemi, Masabata Mokgesi-Selinga, Lawrence Okegbemi, Laura Jane Martinus, Kolawole Tajudeen, Kevin Degila, Kelechi Ogueji, Kathleen Siminyu, Julia Kreutzer, Jason Webster, Jamiil Toure Ali, Jade Abbott, Iroro Orife, Ignatius Ezeani, Idris Abdulkabir Dangana, Herman Kamper, Hady Elsahar, Goodness Duru, Ghollah Kioko, Espoir Murhabazi, Elan van Biljon, Daniel Whitenack, Christopher Onyefuluchi, Chris Emezue, Bonaventure Dossou, Blessing Sibanda, Blessing Itoro Bassey, Ayodele Olabiyi, Arshath Ramkilowan, Alp Öktem, Adewale Akinfaderin, Abdallah Bashir
Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved.
no code implementations • 23 Jul 2020 • Kathleen Siminyu, Sackey Freshia, Jade Abbott, Vukosi Marivate
This work details the organisation of the AI4D - African Language Dataset Challenge, an effort to incentivize the creation, organization and discovery of African language datasets through a competitive challenge.
1 code implementation • 26 Jun 2020 • Nompumelelo Mtsweni, Herkulaas MvE Combrink, Vukosi Marivate
When the COVID-19 disease pandemic infiltrated the world, there was an immediate need for accurate information.
Computers and Society
no code implementations • 11 Jun 2020 • Vukosi Marivate, Avashlin Moodley, Athandiwe Saba
Social Media can be used to extract discussion topics during a disaster.
no code implementations • 22 Apr 2020 • Henry Wandera, Vukosi Marivate, David Sengeh
Available or adequate information to inform decision making for resource allocation in support of school improvement is a critical issue globally.
1 code implementation • 2 Apr 2020 • Vukosi Marivate, Herkulaas MvE Combrink
These announcements narrate the confirmed COVID-19 cases and include the age, gender, and travel history of people who have tested positive for the disease.
Computers and Society Applications
no code implementations • 30 Mar 2020 • Vukosi Marivate, Tshephisho Sefara, Vongani Chabalala, Keamogetswe Makhaya, Tumisho Mokgonyane, Rethabile Mokoena, Abiodun Modupe
The recent advances in Natural Language Processing have only been a boon for well represented languages, negating research in lesser known global languages.
2 code implementations • 13 Mar 2020 • Iroro Orife, Julia Kreutzer, Blessing Sibanda, Daniel Whitenack, Kathleen Siminyu, Laura Martinus, Jamiil Toure Ali, Jade Abbott, Vukosi Marivate, Salomon Kabongo, Musie Meressa, Espoir Murhabazi, Orevaoghene Ahia, Elan van Biljon, Arshath Ramkilowan, Adewale Akinfaderin, Alp Öktem, Wole Akin, Ghollah Kioko, Kevin Degila, Herman Kamper, Bonaventure Dossou, Chris Emezue, Kelechi Ogueji, Abdallah Bashir
Africa has over 2000 languages.
no code implementations • LREC 2020 • Vukosi Marivate, Tshephisho Sefara, Vongani Chabalala, Keamogetswe Makhaya, Tumisho Mokgonyane, Rethabile Mokoena, Abiodun Modupe
The recent advances in Natural Language Processing have been a boon for well-represented languages in terms of available curated data and research resources.
1 code implementation • 7 Jul 2019 • Vukosi Marivate, Tshephisho Sefara
We study the effect of different approaches to text augmentation.