1 code implementation • SIGUL (LREC) 2022 • Tadesse Destaw, Seid Muhie Yimam, Abinew Ayele, Chris Biemann
Questions are posted in Amharic, English, or Amharic but in a Latin script.
no code implementations • LREC 2022 • Meriem Beloucif, Seid Muhie Yimam, Steffen Stahlhacke, Chris Biemann
Comparative Question Answering (cQA) is the task of providing concrete and accurate responses to queries such as: “Is Lyft cheaper than a regular taxi?” or “What makes a mortgage different from a regular loan?”.
1 code implementation • 14 Jan 2025 • Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Abinew Ali Ayele, David Ifeoluwa Adelani, Ibrahim Said Ahmad, Saminu Mohammad Aliyu, Nelson Odhiambo Onyango, Lilian D. A. Wanzare, Samuel Rutunda, Lukman Jibril Aliyu, Esubalew Alemneh, Oumaima Hourrane, Hagos Tesfahun Gebremichael, Elyas Abdi Ismail, Meriem Beloucif, Ebrahim Chekol Jibril, Andiswa Bukula, Rooweither Mabuya, Salomey Osei, Abigail Oppong, Tadesse Destaw Belay, Tadesse Kebede Guge, Tesfa Tegegne Asfaw, Chiamaka Ijeoma Chukwuneke, Paul Röttger, Seid Muhie Yimam, Nedjma Ousidhoum
These limitations are mainly due to the lack of high-quality data in the local languages and the failure to include local communities in the collection, annotation, and moderation processes.
no code implementations • 17 Dec 2024 • Tadesse Destaw Belay, Israel Abebe Azime, Abinew Ali Ayele, Grigori Sidorov, Dietrich Klakow, Philipp Slusallek, Olga Kolesnikova, Seid Muhie Yimam
The results show that accurate multi-label emotion classification is still insufficient even for high-resource languages such as English, and there is a large gap between the performance of high-resource and low-resource languages.
1 code implementation • 16 Dec 2024 • Daryna Dementieva, Nikolay Babakov, Amit Ronen, Abinew Ali Ayele, Naquee Rizwan, Florian Schneider, Xintong Wang, Seid Muhie Yimam, Daniil Moskovskiy, Elisei Stakovskii, Eran Kaufman, Ashraf Elnagar, Animesh Mukherjee, Alexander Panchenko
Even with various regulations in place across countries and social media platforms (Government of India, 2021; European Parliament and Council of the European Union, 2022, digital abusive speech remains a significant issue.
no code implementations • 7 Nov 2024 • Rohan Kumar Yadav, Bimal Bhattarai, Abhik Jana, Lei Jiao, Seid Muhie Yimam
Designing an explainable model becomes crucial now for Natural Language Processing(NLP) since most of the state-of-the-art machine learning models provide a limited explanation for the prediction.
no code implementations • 7 Aug 2024 • Samuel Minale Gashe, Seid Muhie Yimam, Yaregal Assabie
To address this gap, we develop Amharic hate speech data and SBi-LSTM deep learning model that can detect and classify text into four categories of hate speech: racial, religious, gender, and non-hate speech.
no code implementations • 27 Jun 2024 • Seid Muhie Yimam, Daryna Dementieva, Tim Fischer, Daniil Moskovskiy, Naquee Rizwan, Punyajoy Saha, Sarthak Roy, Martin Semmann, Alexander Panchenko, Chris Biemann, Animesh Mukherjee
Despite regulations imposed by nations and social media platforms, such as recent EU regulations targeting digital violence, abusive content persists as a significant challenge.
1 code implementation • 14 Jun 2024 • Junho Myung, Nayeon Lee, Yi Zhou, Jiho Jin, Rifki Afina Putri, Dimosthenis Antypas, Hsuvas Borkakoty, Eunsu Kim, Carla Perez-Almendros, Abinew Ali Ayele, Víctor Gutiérrez-Basulto, Yazmín Ibáñez-García, Hwaran Lee, Shamsuddeen Hassan Muhammad, Kiwoong Park, Anar Sabuhi Rzayev, Nina White, Seid Muhie Yimam, Mohammad Taher Pilehvar, Nedjma Ousidhoum, Jose Camacho-Collados, Alice Oh
To address this issue, we introduce BLEnD, a hand-crafted benchmark designed to evaluate LLMs' everyday knowledge across diverse cultures and languages.
1 code implementation • 18 Apr 2024 • Abinew Ali Ayele, Esubalew Alemneh Jalew, Adem Chanie Ali, Seid Muhie Yimam, Chris Biemann
The prevalence of digital media and evolving sociopolitical dynamics have significantly amplified the dissemination of hateful content.
1 code implementation • 27 Mar 2024 • Nedjma Ousidhoum, Shamsuddeen Hassan Muhammad, Mohamed Abdalla, Idris Abdulmumin, Ibrahim Said Ahmad, Sanchit Ahuja, Alham Fikri Aji, Vladimir Araujo, Meriem Beloucif, Christine de Kock, Oumaima Hourrane, Manish Shrivastava, Thamar Solorio, Nirmal Surange, Krishnapriya Vishnubhotla, Seid Muhie Yimam, Saif M. Mohammad
We present the first shared task on Semantic Textual Relatedness (STR).
no code implementations • 20 Mar 2024 • Atnafu Lambebo Tonja, Israel Abebe Azime, Tadesse Destaw Belay, Mesay Gemeda Yigezu, Moges Ahmed Mehamed, Abinew Ali Ayele, Ebrahim Chekol Jibril, Michael Melese Woldeyohannis, Olga Kolesnikova, Philipp Slusallek, Dietrich Klakow, Shengwu Xiong, Seid Muhie Yimam
We open-source our multilingual language models, new benchmark datasets for various downstream tasks, and task-specific fine-tuned language models and discuss the performance of the models.
2 code implementations • 13 Feb 2024 • Nedjma Ousidhoum, Shamsuddeen Hassan Muhammad, Mohamed Abdalla, Idris Abdulmumin, Ibrahim Said Ahmad, Sanchit Ahuja, Alham Fikri Aji, Vladimir Araujo, Abinew Ali Ayele, Pavan Baswani, Meriem Beloucif, Chris Biemann, Sofia Bourhim, Christine de Kock, Genet Shanko Dekebo, Oumaima Hourrane, Gopichand Kanumolu, Lokesh Madasu, Samuel Rutunda, Manish Shrivastava, Thamar Solorio, Nirmal Surange, Hailegnaw Getaneh Tilaye, Krishnapriya Vishnubhotla, Genta Winata, Seid Muhie Yimam, Saif M. Mohammad
Exploring and quantifying semantic relatedness is central to representing language and holds significant implications across various NLP tasks.
no code implementations • 12 Feb 2024 • Israel Abebe Azime, Atnafu Lambebo Tonja, Tadesse Destaw Belay, Mitiku Yohannes Fuge, Aman Kassahun Wassie, Eyasu Shiferaw Jada, Yonas Chanie, Walelign Tewabe Sewunetie, Seid Muhie Yimam
We compile an Amharic instruction fine-tuning dataset and fine-tuned LLaMA-2-Amharic model.
1 code implementation • 13 Apr 2023 • Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Seid Muhie Yimam, David Ifeoluwa Adelani, Ibrahim Sa'id Ahmad, Nedjma Ousidhoum, Abinew Ayele, Saif M. Mohammad, Meriem Beloucif, Sebastian Ruder
We present the first Africentric SemEval Shared task, Sentiment Analysis for African Languages (AfriSenti-SemEval) - The dataset is available at https://github. com/afrisenti-semeval/afrisent-semeval-2023.
1 code implementation • 25 Mar 2023 • Atnafu Lambebo Tonja, Tadesse Destaw Belay, Israel Abebe Azime, Abinew Ali Ayele, Moges Ahmed Mehamed, Olga Kolesnikova, Seid Muhie Yimam
This survey delves into the current state of natural language processing (NLP) for four Ethiopian languages: Amharic, Afaan Oromo, Tigrinya, and Wolaytta.
3 code implementations • 17 Feb 2023 • Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Abinew Ali Ayele, Nedjma Ousidhoum, David Ifeoluwa Adelani, Seid Muhie Yimam, Ibrahim Sa'id Ahmad, Meriem Beloucif, Saif M. Mohammad, Sebastian Ruder, Oumaima Hourrane, Pavel Brazdil, Felermino Dário Mário António Ali, Davis David, Salomey Osei, Bello Shehu Bello, Falalu Ibrahim, Tajuddeen Gwadabe, Samuel Rutunda, Tadesse Belay, Wendimu Baye Messelle, Hailu Beshada Balcha, Sisay Adugna Chala, Hagos Tesfahun Gebremichael, Bernard Opoku, Steven Arthur
These include 75 languages with at least one million speakers each.
no code implementations • 25 Jan 2023 • Debayan Banerjee, Seid Muhie Yimam, Sushil Awale, Chris Biemann
In this work, we present ARDIAS, a web-based application that aims to provide researchers with a full suite of discovery and collaboration tools.
2 code implementations • 27 Oct 2022 • Tadesse Destaw Belay, Atnafu Lambebo Tonja, Olga Kolesnikova, Seid Muhie Yimam, Abinew Ali Ayele, Silesh Bogale Haile, Grigori Sidorov, Alexander Gelbukh
Machine translation (MT) is one of the main tasks in natural language processing whose objective is to translate texts automatically from one natural language to another.
no code implementations • EACL 2021 • Christian Haase, Saba Anwar, Seid Muhie Yimam, Alexander Friedrich, Chris Biemann
There are two main approaches to the exploration of dynamic networks: the discrete one compares a series of clustered graphs from separate points in time.
1 code implementation • KONVENS (WS) 2021 • Niklas von Boguszewski, Sana Moin, Anirban Bhowmick, Seid Muhie Yimam, Chris Biemann
Hence, we show that transfer learning from the social media domain is efficacious in classifying hate and offensive speech in movies through subtitles.
no code implementations • NAACL 2021 • Max Wiechmann, Seid Muhie Yimam, Chris Biemann
ActiveAnno is built with extensible design and easy deployment in mind, all to enable users to perform annotation tasks with high efficiency and high-quality annotation results.
no code implementations • NAACL 2021 • Sian Gooding, Ekaterina Kochmar, Seid Muhie Yimam, Chris Biemann
Lexical complexity is a highly subjective notion, yet this factor is often neglected in lexical simplification and readability systems which use a {''}one-size-fits-all{''} approach.
2 code implementations • 22 Mar 2021 • David Ifeoluwa Adelani, Jade Abbott, Graham Neubig, Daniel D'souza, Julia Kreutzer, Constantine Lignos, Chester Palen-Michel, Happy Buzaaba, Shruti Rijhwani, Sebastian Ruder, Stephen Mayhew, Israel Abebe Azime, Shamsuddeen Muhammad, Chris Chinenye Emezue, Joyce Nakatumba-Nabende, Perez Ogayo, Anuoluwapo Aremu, Catherine Gitau, Derguene Mbaye, Jesujoba Alabi, Seid Muhie Yimam, Tajuddeen Gwadabe, Ignatius Ezeani, Rubungo Andre Niyongabo, Jonathan Mukiibi, Verrah Otiende, Iroro Orife, Davis David, Samba Ngom, Tosin Adewumi, Paul Rayson, Mofetoluwa Adeyemi, Gerald Muriuki, Emmanuel Anebi, Chiamaka Chukwuneke, Nkiruka Odu, Eric Peter Wairagala, Samuel Oyerinde, Clemencia Siro, Tobius Saul Bateesa, Temilola Oloyede, Yvonne Wambui, Victor Akinode, Deborah Nabagereka, Maurice Katusiime, Ayodele Awokoya, Mouhamadane MBOUP, Dibora Gebreyohannes, Henok Tilaye, Kelechi Nwaike, Degaga Wolde, Abdoulaye Faye, Blessing Sibanda, Orevaoghene Ahia, Bonaventure F. P. Dossou, Kelechi Ogueji, Thierno Ibrahima DIOP, Abdoulaye Diallo, Adewale Akinfaderin, Tendai Marengereke, Salomey Osei
We take a step towards addressing the under-representation of the African continent in NLP research by creating the first large publicly available high-quality dataset for named entity recognition (NER) in ten African languages, bringing together a variety of stakeholders.
6 code implementations • 18 Dec 2020 • Binny Mathew, Punyajoy Saha, Seid Muhie Yimam, Chris Biemann, Pawan Goyal, Animesh Mukherjee
We also observe that models, which utilize the human rationales for training, perform better in reducing unintended bias towards target communities.
Ranked #3 on Hate Speech Detection on HateXplain
no code implementations • COLING 2020 • Seid Muhie Yimam, Hizkiel Mitiku Alemayehu, Abinew Ayele, Chris Biemann
To advance the sentiment analysis research in Amharic and other related low-resource languages, we release the dataset, the annotation tool, source code, and models publicly under a permissive.
1 code implementation • 2 Nov 2020 • Seid Muhie Yimam, Abinew Ali Ayele, Gopalakrishnan Venkatesh, Chris Biemann
We find that newly trained models perform better than pre-trained multilingual models.
no code implementations • SEMEVAL 2020 • Gregor Wiedemann, Seid Muhie Yimam, Chris Biemann
Fine-tuning of pre-trained transformer networks such as BERT yield state-of-the-art results for text classification tasks.
1 code implementation • LREC 2020 • Seid Muhie Yimam, Gopalakrishnan Venkatesh, John Sie Yuen Lee, Chris Biemann
The aim is to build a writing aid system that automatically edits a text so that it better adheres to the academic style of writing.
no code implementations • 9 Dec 2019 • Seid Muhie Yimam, Abinew Ali Ayele, Chris Biemann
Since several languages can be written using the Fidel script, we have used the existing Amharic, Tigrinya and Ge'ez corpora to retain only the Amharic tweets.
no code implementations • EMNLP 2018 • Gregor Wiedemann, Seid Muhie Yimam, Chris Biemann
We introduce an advanced information extraction pipeline to automatically process very large collections of unstructured textual data for the purpose of investigative journalism.
no code implementations • EMNLP 2018 • Seid Muhie Yimam, Chris Biemann
In this paper, we present Par4Sem, a semantic writing aid tool based on adaptive paraphrasing.
no code implementations • 13 Jul 2018 • Gregor Wiedemann, Seid Muhie Yimam, Chris Biemann
Investigative journalism in recent years is confronted with two major challenges: 1) vast amounts of unstructured data originating from large text collections such as leaks or answers to Freedom of Information requests, and 2) multi-lingual data due to intensified global cooperation and communication in politics, business and civil society.
no code implementations • COLING 2018 • Seid Muhie Yimam, Chris Biemann
Learning from a real-world data stream and continuously updating the model without explicit supervision is a new challenge for NLP applications with machine learning components.
no code implementations • WS 2018 • Seid Muhie Yimam, Chris Biemann, Shervin Malmasi, Gustavo H. Paetzold, Lucia Specia, Sanja Štajner, Anaïs Tack, Marcos Zampieri
We report the findings of the second Complex Word Identification (CWI) shared task organized as part of the BEA workshop co-located with NAACL-HLT'2018.
no code implementations • IJCNLP 2017 • Seid Muhie Yimam, Sanja {\v{S}}tajner, Martin Riedl, Chris Biemann
Complex word identification (CWI) is an important task in text accessibility.
no code implementations • RANLP 2017 • Seid Muhie Yimam, Sanja {\v{S}}tajner, Martin Riedl, Chris Biemann
Complex Word Identification (CWI) is an important task in lexical simplification and text accessibility.
no code implementations • RANLP 2017 • Seid Muhie Yimam, Steffen Remus, Alex Panchenko, er, Andreas Holzinger, Chris Biemann
In this paper, we describe the concept of entity-centric information access for the biomedical domain.
1 code implementation • SEMEVAL 2017 • N, Titas i, Chris Biemann, Seid Muhie Yimam, Deepak Gupta, Sarah Kohail, Asif Ekbal, Pushpak Bhattacharyya
In this paper we present the system for Answer Selection and Ranking in Community Question Answering, which we build as part of our participation in SemEval-2017 Task 3.
no code implementations • WS 2016 • Richard Eckart de Castilho, {\'E}va M{\'u}jdricza-Maydt, Seid Muhie Yimam, Silvana Hartmann, Iryna Gurevych, Anette Frank, Chris Biemann
We introduce the third major release of WebAnno, a generic web-based annotation tool for distributed teams.
no code implementations • ACL 2016 • Seid Muhie Yimam, Heiner Ulrich, von L, Tatiana esberger, Marcel Rosenbach, Michaela Regneri, Alex Panchenko, er, Franziska Lehmann, Uli Fahrer, Chris Biemann, Kathrin Ballweg