Search Results for author: Constantine Lignos

Found 22 papers, 11 papers with code

If You Build Your Own NER Scorer, Non-replicable Results Will Come

no code implementations • EMNLP (insights) 2020 • Constantine Lignos, Marjan Kamyab

We propose best practices to increase the replicability of NER evaluations by increasing transparency regarding the handling of improper label sequences.

Named Entity Recognition


Effective Architectures for Low Resource Multilingual Named Entity Transliteration

no code implementations • loresmt (AACL) 2020 • Molly Moran, Constantine Lignos

In this paper, we evaluate LSTM, biLSTM, GRU, and Transformer architectures for the task of name transliteration in a many-to-one multilingual paradigm, transliterating from 590 languages to English.

Decoder Transliteration


CoNLL#: Fine-grained Error Analysis and a Corrected Test Set for CoNLL-03 English

no code implementations • 20 May 2024 • Andrew Rueda, Elena Álvarez Mellado, Constantine Lignos

Modern named entity recognition systems have steadily improved performance in the age of larger and more powerful neural models.

Named Entity Recognition


QueryNER: Segmentation of E-commerce Queries

1 code implementation • 15 May 2024 • Chester Palen-Michel, Lizzie Liang, Zhe Wu, Constantine Lignos

We present QueryNER, a manually-annotated dataset and accompanying model for e-commerce query segmentation.

Data Augmentation


ParaNames 1.0: Creating an Entity Name Corpus for 400+ Languages using Wikidata

1 code implementation • 15 May 2024 • Jonne Sälevä, Constantine Lignos

Names are provided for 16.8 million entities, and each entity is mapped from a complex type hierarchy to a standard type (PER/LOC/ORG).

Multilingual Named Entity Recognition named-entity-recognition +3


What changes when you randomly choose BPE merge operations? Not much

no code implementations • 4 May 2023 • Jonne Sälevä, Constantine Lignos

We introduce three simple randomized variants of byte pair encoding (BPE) and explore whether randomizing the selection of merge operations substantially affects a downstream machine translation task.

Machine Translation Translation


LR-Sum: Summarization for Less-Resourced Languages

no code implementations • 19 Dec 2022 • Chester Palen-Michel, Constantine Lignos

This preprint describes work in progress on LR-Sum, a new permissively-licensed dataset created with the goal of enabling further research in automatic summarization for less-resourced languages.


MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition

1 code implementation • 22 Oct 2022 • David Ifeoluwa Adelani, Graham Neubig, Sebastian Ruder, Shruti Rijhwani, Michael Beukman, Chester Palen-Michel, Constantine Lignos, Jesujoba O. Alabi, Shamsuddeen H. Muhammad, Peter Nabende, Cheikh M. Bamba Dione, Andiswa Bukula, Rooweither Mabuya, Bonaventure F. P. Dossou, Blessing Sibanda, Happy Buzaaba, Jonathan Mukiibi, Godson Kalipe, Derguene Mbaye, Amelia Taylor, Fatoumata Kabore, Chris Chinenye Emezue, Anuoluwapo Aremu, Perez Ogayo, Catherine Gitau, Edwin Munkoh-Buabeng, Victoire M. Koagne, Allahsera Auguste Tapo, Tebogo Macucwa, Vukosi Marivate, Elvis Mboning, Tajuddeen Gwadabe, Tosin Adewumi, Orevaoghene Ahia, Joyce Nakatumba-Nabende, Neo L. Mokono, Ignatius Ezeani, Chiamaka Chukwuneke, Mofetoluwa Adeyemi, Gilles Q. Hacheme, Idris Abdulmumin, Odunayo Ogundepo, Oreen Yousuf, Tatiana Moteu Ngoli, Dietrich Klakow

African languages are spoken by over a billion people, but are underrepresented in NLP research and development.

Cross-Lingual Transfer named-entity-recognition +3


Borrowing or Codeswitching? Annotating for Finer-Grained Distinctions in Language Mixing

1 code implementation • LREC 2022 • Elena Alvarez Mellado, Constantine Lignos

We present a new corpus of Twitter data annotated for codeswitching and borrowing between Spanish and English.


Detecting Unassimilated Borrowings in Spanish: An Annotated Corpus and Approaches to Modeling

1 code implementation • ACL 2022 • Elena Álvarez-Mellado, Constantine Lignos

This work presents a new resource for borrowing identification and analyzes the performance and errors of several models on this task.

Word Embeddings


ParaNames: A Massively Multilingual Entity Name Corpus

1 code implementation • NAACL (SIGTYP) 2022 • Jonne Sälevä, Constantine Lignos

We demonstrate an application of ParaNames by training a multilingual model for canonical name translation to and from English.

Named Entity Recognition


Toward More Meaningful Resources for Lower-resourced Languages

no code implementations • Findings (ACL) 2022 • Constantine Lignos, Nolan Holley, Chester Palen-Michel, Jonne Sälevä

We then discuss the importance of creating annotation for lower-resourced languages in a thoughtful and ethical way that includes the languages' speakers as part of the development process.

Position


Multilingual Open Text Release 1: Public Domain News in 44 Languages

3 code implementations • LREC 2022 • Chester Palen-Michel, June Kim, Constantine Lignos

We present Multilingual Open Text (MOT), a new multilingual corpus containing text in 44 languages, many of which have limited existing text resources for natural language processing.


Overview of ADoBo 2021: Automatic Detection of Unassimilated Borrowings in the Spanish Press

no code implementations • 29 Oct 2021 • Elena Álvarez Mellado, Luis Espinosa Anke, Julio Gonzalo Arroyo, Constantine Lignos, Jordi Porta Zamorano

This paper summarizes the main findings of the ADoBo 2021 shared task, proposed in the context of IberLef 2021.


SeqScore: Addressing Barriers to Reproducible Named Entity Recognition Evaluation

1 code implementation • EMNLP (Eval4NLP) 2021 • Chester Palen-Michel, Nolan Holley, Constantine Lignos

To address a looming crisis of unreproducible evaluation for named entity recognition, we propose guidelines and introduce SeqScore, a software package to improve reproducibility.

Named Entity Recognition


Macro-Average: Rare Types Are Important Too

1 code implementation • NAACL 2021 • Thamme Gowda, Weiqiu You, Constantine Lignos, Jonathan May

While traditional corpus-level evaluation metrics for machine translation (MT) correlate well with fluency, they struggle to reflect adequacy.

Cross-Lingual Information Retrieval Machine Translation +2


Mining Wikidata for Name Resources for African Languages

1 code implementation • 1 Apr 2021 • Jonne Sälevä, Constantine Lignos

This work supports further development of language technology for the languages of Africa by providing a Wikidata-derived resource of name lists corresponding to common entity types (person, location, and organization).


TMR: Evaluating NER Recall on Tough Mentions

no code implementations • EACL 2021 • Jingxuan Tu, Constantine Lignos

We propose the Tough Mentions Recall (TMR) metrics to supplement traditional named entity recognition (NER) evaluation by examining recall on specific subsets of "tough" mentions: unseen mentions, those whose tokens or token/type combination were not observed in training, and type-confusable mentions, token sequences with multiple entity types in the test data.

Named Entity Recognition


MasakhaNER: Named Entity Recognition for African Languages

2 code implementations • 22 Mar 2021 • David Ifeoluwa Adelani, Jade Abbott, Graham Neubig, Daniel D'souza, Julia Kreutzer, Constantine Lignos, Chester Palen-Michel, Happy Buzaaba, Shruti Rijhwani, Sebastian Ruder, Stephen Mayhew, Israel Abebe Azime, Shamsuddeen Muhammad, Chris Chinenye Emezue, Joyce Nakatumba-Nabende, Perez Ogayo, Anuoluwapo Aremu, Catherine Gitau, Derguene Mbaye, Jesujoba Alabi, Seid Muhie Yimam, Tajuddeen Gwadabe, Ignatius Ezeani, Rubungo Andre Niyongabo, Jonathan Mukiibi, Verrah Otiende, Iroro Orife, Davis David, Samba Ngom, Tosin Adewumi, Paul Rayson, Mofetoluwa Adeyemi, Gerald Muriuki, Emmanuel Anebi, Chiamaka Chukwuneke, Nkiruka Odu, Eric Peter Wairagala, Samuel Oyerinde, Clemencia Siro, Tobius Saul Bateesa, Temilola Oloyede, Yvonne Wambui, Victor Akinode, Deborah Nabagereka, Maurice Katusiime, Ayodele Awokoya, Mouhamadane MBOUP, Dibora Gebreyohannes, Henok Tilaye, Kelechi Nwaike, Degaga Wolde, Abdoulaye Faye, Blessing Sibanda, Orevaoghene Ahia, Bonaventure F. P. Dossou, Kelechi Ogueji, Thierno Ibrahima DIOP, Abdoulaye Diallo, Adewale Akinfaderin, Tendai Marengereke, Salomey Osei

We take a step towards addressing the under-representation of the African continent in NLP research by creating the first large publicly available high-quality dataset for named entity recognition (NER) in ten African languages, bringing together a variety of stakeholders.

Named Entity Recognition


The Effectiveness of Morphology-aware Segmentation in Low-Resource Neural Machine Translation

no code implementations • EACL 2021 • Jonne Sälevä, Constantine Lignos

This paper evaluates the performance of several modern subword segmentation methods in a low-resource neural machine translation setting.

Low-Resource Neural Machine Translation Segmentation +2


The Challenges of Optimizing Machine Translation for Low Resource Cross-Language Information Retrieval

no code implementations • IJCNLP 2019 • Constantine Lignos, Daniel Cohen, Yen-Chieh Lien, Pratik Mehta, W. Bruce Croft, Scott Miller

When performing cross-language information retrieval (CLIR) for lower-resourced languages, a common approach is to retrieve over the output of machine translation (MT).

Information Retrieval Machine Translation +2


SARAL: A Low-Resource Cross-Lingual Domain-Focused Information Retrieval System for Effective Rapid Document Triage

no code implementations • ACL 2019 • Elizabeth Boschee, Joel Barry, Jayadev Billa, Marjorie Freedman, Thamme Gowda, Constantine Lignos, Chester Palen-Michel, Michael Pust, Banriskhem Kayang Khonglah, Srikanth Madikeri, Jonathan May, Scott Miller

In this paper we present an end-to-end cross-lingual information retrieval (CLIR) and summarization system for low-resource languages that 1) enables English speakers to search foreign language repositories of text and audio using English queries, 2) summarizes the retrieved documents in English with respect to a particular information need, and 3) provides complete transcriptions and translations as needed.

Cross-Lingual Information Retrieval Machine Translation +2

