Search Results for author: Kelechi Ogueji

We take a step towards addressing the under-representation of the African continent in NLP research by creating the first large publicly available high-quality dataset for named entity recognition (NER) in ten African languages, bringing together a variety of stakeholders.

Named Entity Recognition +2

Paper
Code

Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

no code implementations • 22 Mar 2021 • Julia Kreutzer, Isaac Caswell, Lisa Wang, Ahsan Wahab, Daan van Esch, Nasanbayar Ulzii-Orshikh, Allahsera Tapo, Nishant Subramani, Artem Sokolov, Claytone Sikasote, Monang Setyawan, Supheakmungkol Sarin, Sokhar Samb, Benoît Sagot, Clara Rivera, Annette Rios, Isabel Papadimitriou, Salomey Osei, Pedro Ortiz Suarez, Iroro Orife, Kelechi Ogueji, Andre Niyongabo Rubungo, Toan Q. Nguyen, Mathias Müller, André Müller, Shamsuddeen Hassan Muhammad, Nanda Muhammad, Ayanda Mnyakeni, Jamshidbek Mirzakhalov, Tapiwanashe Matangira, Colin Leong, Nze Lawson, Sneha Kudugunta, Yacine Jernite, Mathias Jenny, Orhan Firat, Bonaventure F. P. Dossou, Sakhile Dlamini, Nisansa de Silva, Sakine Çabuk Ballı, Stella Biderman, Alessia Battisti, Ahmed Baruwa, Ankur Bapna, Pallavi Baljekar, Israel Abebe Azime, Ayodele Awokoya, Duygu Ataman, Orevaoghene Ahia, Oghenefego Ahia, Sweta Agrawal, Mofetoluwa Adeyemi

With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, web-mined text datasets covering hundreds of languages.

Paper

Towards Best Practices for Training Multilingual Dense Retrieval Models

no code implementations • 5 Apr 2022 • Xinyu Zhang, Kelechi Ogueji, Xueguang Ma, Jimmy Lin

Dense retrieval models using a transformer-based bi-encoder design have emerged as an active area of research.

Cross-Lingual Transfer Retrieval

Paper

What a Creole Wants, What a Creole Needs

no code implementations • LREC 2022 • Heather Lent, Kelechi Ogueji, Miryam de Lhoneux, Orevaoghene Ahia, Anders Søgaard

We demonstrate, through conversations with Creole experts and surveys of Creole-speaking communities, how the things needed from language technology can change dramatically from one language to another, even when the languages are considered to be very similar to each other, as with Creoles.

Paper

Intriguing Properties of Compression on Multilingual Models

no code implementations • 4 Nov 2022 • Kelechi Ogueji, Orevaoghene Ahia, Gbemileke Onilude, Sebastian Gehrmann, Sara Hooker, Julia Kreutzer

Multilingual models are often particularly dependent on scaling to generalize to a growing number of languages.

Named Entity Recognition +1

Paper

How Good are Commercial Large Language Models on African Languages?

no code implementations • 11 May 2023 • Jessica Ojo, Kelechi Ogueji

We present a preliminary analysis of commercial large language models on two tasks (machine translation and text classification) across eight African languages, spanning different language families and geographical areas.

In-Context Learning Language Modelling +4

Paper

How good are Large Language Models on African Languages?

no code implementations • 14 Nov 2023 • Jessica Ojo, Kelechi Ogueji, Pontus Stenetorp, David I. Adelani

Our results suggest that all LLMs produce below-par performance on African languages, and that there is a large performance gap compared to high-resource languages like English on most tasks.

In-Context Learning Language Modelling +8

Paper

Curry-DPO: Enhancing Alignment using Curriculum Learning & Ranked Preferences

no code implementations • 12 Mar 2024 • Pulkit Pattnaik, Rishabh Maheshwary, Kelechi Ogueji, Vikas Yadav, Sathwik Tejaswi Madhusudhan

With the availability of such quality ratings for multiple responses, we propose using these responses to create multiple preference pairs for a given prompt.
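The pair-construction idea in the snippet above can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `build_preference_pairs`, the numeric rating scale, skipping tied ratings, and ordering pairs by rating gap (largest first, as a stand-in for an easy-to-hard curriculum) are all hypothetical choices made for this sketch.

```python
from itertools import combinations
from typing import List, Tuple

# (prompt, chosen, rejected, rating gap)
Pair = Tuple[str, str, str, float]


def build_preference_pairs(prompt: str, responses: List[str], scores: List[float]) -> List[Pair]:
    """Hypothetical helper: turn several rated responses to one prompt into
    (chosen, rejected) preference pairs, one per pair of distinct ratings."""
    if len(responses) != len(scores):
        raise ValueError("responses and scores must have the same length")
    pairs: List[Pair] = []
    for i, j in combinations(range(len(responses)), 2):
        if scores[i] == scores[j]:
            continue  # tied ratings carry no preference signal, so skip them
        hi, lo = (i, j) if scores[i] > scores[j] else (j, i)
        pairs.append((prompt, responses[hi], responses[lo], scores[hi] - scores[lo]))
    # Assumed curriculum: a larger rating gap is treated as an "easier" pair and
    # placed first; Python's sort is stable, so ties keep their generation order.
    pairs.sort(key=lambda p: p[3], reverse=True)
    return pairs


if __name__ == "__main__":
    for pair in build_preference_pairs("prompt", ["a", "b", "c"], [9, 5, 7]):
        print(pair)
```

With three rated responses this yields three pairs instead of the single (best, worst) pair a standard one-pair setup would use; how the paper actually orders or batches such pairs during training is not specified here.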

Paper

Small Data? No Problem! Exploring the Viability of Pretrained Multilingual Language Models for Low-resourced Languages

1 code implementation • EMNLP (MRL) 2021 • Kelechi Ogueji, Yuxin Zhu, Jimmy Lin

In this work, we challenge this assumption and present the first attempt at training a multilingual language model on only low-resource languages.

Language Modelling Named Entity Recognition +5

Paper
Code