Search Results for author: Seid Muhie Yimam

Found 39 papers, 14 papers with code

A Report on the Complex Word Identification Shared Task 2018

no code implementations • WS 2018 • Seid Muhie Yimam, Chris Biemann, Shervin Malmasi, Gustavo H. Paetzold, Lucia Specia, Sanja Štajner, Anaïs Tack, Marcos Zampieri

We report the findings of the second Complex Word Identification (CWI) shared task organized as part of the BEA workshop co-located with NAACL-HLT'2018.

Binary Classification Classification +2

Paper

Par4Sim -- Adaptive Paraphrasing for Text Simplification

no code implementations • COLING 2018 • Seid Muhie Yimam, Chris Biemann

Learning from a real-world data stream and continuously updating the model without explicit supervision is a new challenge for NLP applications with machine learning components.

Learning-To-Rank Text Simplification

Paper

New/s/leak 2.0 - Multilingual Information Extraction and Visualization for Investigative Journalism

no code implementations • 13 Jul 2018 • Gregor Wiedemann, Seid Muhie Yimam, Chris Biemann

Investigative journalism in recent years is confronted with two major challenges: 1) vast amounts of unstructured data originating from large text collections such as leaks or answers to Freedom of Information requests, and 2) multi-lingual data due to intensified global cooperation and communication in politics, business and civil society.

Efficient Exploration

Paper

Demonstrating PAR4SEM - A Semantic Writing Aid with Adaptive Paraphrasing

no code implementations • EMNLP 2018 • Seid Muhie Yimam, Chris Biemann

In this paper, we present Par4Sem, a semantic writing aid tool based on adaptive paraphrasing.

BIG-bench Machine Learning Text Simplification

Paper

A Multilingual Information Extraction Pipeline for Investigative Journalism

no code implementations • EMNLP 2018 • Gregor Wiedemann, Seid Muhie Yimam, Chris Biemann

We introduce an advanced information extraction pipeline to automatically process very large collections of unstructured textual data for the purpose of investigative journalism.

Entity Extraction using GAN

Paper

new/s/leak -- Information Extraction and Visualization for Investigative Data Journalists

no code implementations • ACL 2016 • Seid Muhie Yimam, Heiner Ulrich, Tatiana von Landesberger, Marcel Rosenbach, Michaela Regneri, Alexander Panchenko, Franziska Lehmann, Uli Fahrer, Chris Biemann, Kathrin Ballweg

Paper

Entity-Centric Information Access with Human in the Loop for the Biomedical Domain

no code implementations • RANLP 2017 • Seid Muhie Yimam, Steffen Remus, Alexander Panchenko, Andreas Holzinger, Chris Biemann

In this paper, we describe the concept of entity-centric information access for the biomedical domain.

Management

Paper

Learning Paraphrasing for Multiword Expressions

no code implementations • WS 2016 • Seid Muhie Yimam, Héctor Martínez Alonso, Martin Riedl, Chris Biemann

Learning-To-Rank Machine Translation +3

Paper

A Web-based Tool for the Integrated Annotation of Semantic and Syntactic Structures

no code implementations • WS 2016 • Richard Eckart de Castilho, Éva Mújdricza-Maydt, Seid Muhie Yimam, Silvana Hartmann, Iryna Gurevych, Anette Frank, Chris Biemann

We introduce the third major release of WebAnno, a generic web-based annotation tool for distributed teams.

Relation Extraction Slot Filling

Paper

CWIG3G2 - Complex Word Identification Task across Three Text Genres and Two User Groups

no code implementations • IJCNLP 2017 • Seid Muhie Yimam, Sanja Štajner, Martin Riedl, Chris Biemann

Complex word identification (CWI) is an important task in text accessibility.

Complex Word Identification Lexical Simplification +1

Paper

Multilingual and Cross-Lingual Complex Word Identification

no code implementations • RANLP 2017 • Seid Muhie Yimam, Sanja Štajner, Martin Riedl, Chris Biemann

Complex Word Identification (CWI) is an important task in lexical simplification and text accessibility.

Complex Word Identification Lexical Simplification

Paper

Automatic Annotation Suggestions and Custom Annotation Layers in WebAnno

no code implementations • ACL 2014 • Seid Muhie Yimam, Chris Biemann, Richard Eckart de Castilho, Iryna Gurevych

Active Learning

Paper

WebAnno: A Flexible, Web-based and Visually Supported System for Distributed Annotations

no code implementations • ACL 2013 • Seid Muhie Yimam, Iryna Gurevych, Richard Eckart de Castilho, Chris Biemann

Dependency Parsing

Paper

Narrowing the Loop: Integration of Resources and Linguistic Dataset Development with Interactive Machine Learning

no code implementations • NAACL 2015 • Seid Muhie Yimam

Active Learning BIG-bench Machine Learning +5

Paper

Analysis of the Ethiopic Twitter Dataset for Abusive Speech in Amharic

no code implementations • 9 Dec 2019 • Seid Muhie Yimam, Abinew Ali Ayele, Chris Biemann

Since several languages can be written using the Fidel script, we have used the existing Amharic, Tigrinya and Ge'ez corpora to retain only the Amharic tweets.

Paper

UHH-LT at SemEval-2020 Task 12: Fine-Tuning of Pre-Trained Transformer Networks for Offensive Language Detection

no code implementations • SEMEVAL 2020 • Gregor Wiedemann, Seid Muhie Yimam, Chris Biemann

Fine-tuning of pre-trained transformer networks such as BERT yields state-of-the-art results for text classification tasks.

Domain Adaptation General Classification +4

Paper

Word Complexity is in the Eye of the Beholder

no code implementations • NAACL 2021 • Sian Gooding, Ekaterina Kochmar, Seid Muhie Yimam, Chris Biemann

Lexical complexity is a highly subjective notion, yet this factor is often neglected in lexical simplification and readability systems which use a "one-size-fits-all" approach.

Lexical Simplification

Paper

ActiveAnno: General-Purpose Document-Level Annotation Tool with Active Learning Integration

no code implementations • NAACL 2021 • Max Wiechmann, Seid Muhie Yimam, Chris Biemann

ActiveAnno is built with extensible design and easy deployment in mind, all to enable users to perform annotation tasks with high efficiency and high-quality annotation results.

Active Learning

Paper

Exploring Amharic Sentiment Analysis from Social Media Texts: Building Annotation Tools and Classification Models

no code implementations • COLING 2020 • Seid Muhie Yimam, Hizkiel Mitiku Alemayehu, Abinew Ayele, Chris Biemann

To advance the sentiment analysis research in Amharic and other related low-resource languages, we release the dataset, the annotation tool, source code, and models publicly under a permissive.

Decision Making Sentiment Analysis +1

Paper

SCoT: Sense Clustering over Time: a tool for the analysis of lexical change

no code implementations • EACL 2021 • Christian Haase, Saba Anwar, Seid Muhie Yimam, Alexander Friedrich, Chris Biemann

There are two main approaches to the exploration of dynamic networks: the discrete one compares a series of clustered graphs from separate points in time.

Clustering

Paper

More Like This: Semantic Retrieval with Linguistic Information

no code implementations • KONVENS (WS) 2022 • Steffen Remus, Gregor Wiedemann, Saba Anwar, Fynn Petersen-Frey, Seid Muhie Yimam, Chris Biemann

Retrieval Semantic Retrieval

Paper

Elvis vs. M. Jackson: Who has More Albums? Classification and Identification of Elements in Comparative Questions

no code implementations • LREC 2022 • Meriem Beloucif, Seid Muhie Yimam, Steffen Stahlhacke, Chris Biemann

Comparative Question Answering (cQA) is the task of providing concrete and accurate responses to queries such as: “Is Lyft cheaper than a regular taxi?” or “What makes a mortgage different from a regular loan?”.

Binary Classification Question Answering

Paper

ARDIAS: AI-Enhanced Research Management, Discovery, and Advisory System

no code implementations • 25 Jan 2023 • Debayan Banerjee, Seid Muhie Yimam, Sushil Awale, Chris Biemann

In this work, we present ARDIAS, a web-based application that aims to provide researchers with a full suite of discovery and collaboration tools.

Management

Paper

Walia-LLM: Enhancing Amharic-LLaMA by Integrating Task-Specific and Generative Datasets

no code implementations • 12 Feb 2024 • Israel Abebe Azime, Atnafu Lambebo Tonja, Tadesse Destaw Belay, Mitiku Yohannes Fuge, Aman Kassahun Wassie, Eyasu Shiferaw Jada, Yonas Chanie, Walelign Tewabe Sewunetie, Seid Muhie Yimam

We compile an Amharic instruction fine-tuning dataset and fine-tune the LLaMA-2-Amharic model.

Language Modelling

Paper

EthioLLM: Multilingual Large Language Models for Ethiopian Languages with Task Evaluation

no code implementations • 20 Mar 2024 • Atnafu Lambebo Tonja, Israel Abebe Azime, Tadesse Destaw Belay, Mesay Gemeda Yigezu, Moges Ahmed Mehamed, Abinew Ali Ayele, Ebrahim Chekol Jibril, Michael Melese Woldeyohannis, Olga Kolesnikova, Philipp Slusallek, Dietrich Klakow, Shengwu Xiong, Seid Muhie Yimam

We open-source our multilingual language models, new benchmark datasets for various downstream tasks, and task-specific fine-tuned language models and discuss the performance of the models.

Paper

IIT-UHH at SemEval-2017 Task 3: Exploring Multiple Features for Community Question Answering and Implicit Dialogue Identification

1 code implementation • SEMEVAL 2017 • Titas Nandi, Chris Biemann, Seid Muhie Yimam, Deepak Gupta, Sarah Kohail, Asif Ekbal, Pushpak Bhattacharyya

In this paper we present the system for Answer Selection and Ranking in Community Question Answering, which we build as part of our participation in SemEval-2017 Task 3.

Answer Selection Community Question Answering +1

Paper
Code

Natural Language Processing in Ethiopian Languages: Current State, Challenges, and Opportunities

1 code implementation • 25 Mar 2023 • Atnafu Lambebo Tonja, Tadesse Destaw Belay, Israel Abebe Azime, Abinew Ali Ayele, Moges Ahmed Mehamed, Olga Kolesnikova, Seid Muhie Yimam

This survey delves into the current state of natural language processing (NLP) for four Ethiopian languages: Amharic, Afaan Oromo, Tigrinya, and Wolaytta.

Paper
Code

Exploring Boundaries and Intensities in Offensive and Hate Speech: Unveiling the Complex Spectrum of Social Media Discourse

1 code implementation • 18 Apr 2024 • Abinew Ali Ayele, Esubalew Alemneh Jalew, Adem Chanie Ali, Seid Muhie Yimam, Chris Biemann

The prevalence of digital media and evolving sociopolitical dynamics have significantly amplified the dissemination of hateful content.

Binary Classification regression

Paper
Code

How Hateful are Movies? A Study and Prediction on Movie Subtitles

1 code implementation • KONVENS (WS) 2021 • Niklas von Boguszewski, Sana Moin, Anirban Bhowmick, Seid Muhie Yimam, Chris Biemann

Hence, we show that transfer learning from the social media domain is efficacious in classifying hate and offensive speech in movies through subtitles.

Domain Adaptation Transfer Learning

Paper
Code

Introducing various Semantic Models for Amharic: Experimentation and Evaluation with multiple Tasks and Datasets

1 code implementation • 2 Nov 2020 • Seid Muhie Yimam, Abinew Ali Ayele, Gopalakrishnan Venkatesh, Chris Biemann

We find that newly trained models perform better than pre-trained multilingual models.

Network Embedding

Paper
Code

Question Answering Classification for Amharic Social Media Community Based Questions

1 code implementation • SIGUL (LREC) 2022 • Tadesse Destaw, Seid Muhie Yimam, Abinew Ayele, Chris Biemann

Questions are posted in Amharic, English, or Amharic but in a Latin script.

8k Question Answering +1

Paper
Code

The Effect of Normalization for Bi-directional Amharic-English Neural Machine Translation

2 code implementations • 27 Oct 2022 • Tadesse Destaw Belay, Atnafu Lambebo Tonja, Olga Kolesnikova, Seid Muhie Yimam, Abinew Ali Ayele, Silesh Bogale Haile, Grigori Sidorov, Alexander Gelbukh

Machine translation (MT) is one of the main tasks in natural language processing whose objective is to translate texts automatically from one natural language to another.

Machine Translation Sentence +1

Paper
Code

Automatic Compilation of Resources for Academic Writing and Evaluating with Informal Word Identification and Paraphrasing System

1 code implementation • LREC 2020 • Seid Muhie Yimam, Gopalakrishnan Venkatesh, John Sie Yuen Lee, Chris Biemann

The aim is to build a writing aid system that automatically edits a text so that it better adheres to the academic style of writing.

Paraphrase Generation

Paper
Code

SemRel2024: A Collection of Semantic Textual Relatedness Datasets for 14 Languages

2 code implementations • 13 Feb 2024 • Nedjma Ousidhoum, Shamsuddeen Hassan Muhammad, Mohamed Abdalla, Idris Abdulmumin, Ibrahim Said Ahmad, Sanchit Ahuja, Alham Fikri Aji, Vladimir Araujo, Abinew Ali Ayele, Pavan Baswani, Meriem Beloucif, Chris Biemann, Sofia Bourhim, Christine de Kock, Genet Shanko Dekebo, Oumaima Hourrane, Gopichand Kanumolu, Lokesh Madasu, Samuel Rutunda, Manish Shrivastava, Thamar Solorio, Nirmal Surange, Hailegnaw Getaneh Tilaye, Krishnapriya Vishnubhotla, Genta Winata, Seid Muhie Yimam, Saif M. Mohammad

Exploring and quantifying semantic relatedness is central to representing language.

Paper
Code

SemEval-2024 Task 1: Semantic Textual Relatedness for African and Asian Languages

1 code implementation • 27 Mar 2024 • Nedjma Ousidhoum, Shamsuddeen Hassan Muhammad, Mohamed Abdalla, Idris Abdulmumin, Ibrahim Said Ahmad, Sanchit Ahuja, Alham Fikri Aji, Vladimir Araujo, Meriem Beloucif, Christine de Kock, Oumaima Hourrane, Manish Shrivastava, Thamar Solorio, Nirmal Surange, Krishnapriya Vishnubhotla, Seid Muhie Yimam, Saif M. Mohammad

We present the first shared task on Semantic Textual Relatedness (STR).

Paper
Code

SemEval-2023 Task 12: Sentiment Analysis for African Languages (AfriSenti-SemEval)

1 code implementation • 13 Apr 2023 • Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Seid Muhie Yimam, David Ifeoluwa Adelani, Ibrahim Sa'id Ahmad, Nedjma Ousidhoum, Abinew Ayele, Saif M. Mohammad, Meriem Beloucif, Sebastian Ruder

We present the first Africentric SemEval Shared task, Sentiment Analysis for African Languages (AfriSenti-SemEval) - The dataset is available at https://github.com/afrisenti-semeval/afrisent-semeval-2023.

Classification Sentiment Analysis +2

Paper
Code

MasakhaNER: Named Entity Recognition for African Languages

2 code implementations • 22 Mar 2021 • David Ifeoluwa Adelani, Jade Abbott, Graham Neubig, Daniel D'souza, Julia Kreutzer, Constantine Lignos, Chester Palen-Michel, Happy Buzaaba, Shruti Rijhwani, Sebastian Ruder, Stephen Mayhew, Israel Abebe Azime, Shamsuddeen Muhammad, Chris Chinenye Emezue, Joyce Nakatumba-Nabende, Perez Ogayo, Anuoluwapo Aremu, Catherine Gitau, Derguene Mbaye, Jesujoba Alabi, Seid Muhie Yimam, Tajuddeen Gwadabe, Ignatius Ezeani, Rubungo Andre Niyongabo, Jonathan Mukiibi, Verrah Otiende, Iroro Orife, Davis David, Samba Ngom, Tosin Adewumi, Paul Rayson, Mofetoluwa Adeyemi, Gerald Muriuki, Emmanuel Anebi, Chiamaka Chukwuneke, Nkiruka Odu, Eric Peter Wairagala, Samuel Oyerinde, Clemencia Siro, Tobius Saul Bateesa, Temilola Oloyede, Yvonne Wambui, Victor Akinode, Deborah Nabagereka, Maurice Katusiime, Ayodele Awokoya, Mouhamadane MBOUP, Dibora Gebreyohannes, Henok Tilaye, Kelechi Nwaike, Degaga Wolde, Abdoulaye Faye, Blessing Sibanda, Orevaoghene Ahia, Bonaventure F. P. Dossou, Kelechi Ogueji, Thierno Ibrahima DIOP, Abdoulaye Diallo, Adewale Akinfaderin, Tendai Marengereke, Salomey Osei

We take a step towards addressing the under-representation of the African continent in NLP research by creating the first large publicly available high-quality dataset for named entity recognition (NER) in ten African languages, bringing together a variety of stakeholders.

named-entity-recognition Named Entity Recognition +2