no code implementations • EMNLP (insights) 2020 • Constantine Lignos, Marjan Kamyab
We propose best practices to increase the replicability of NER evaluations by increasing transparency regarding the handling of improper label sequences.
no code implementations • LoResMT (AACL) 2020 • Molly Moran, Constantine Lignos
In this paper, we evaluate LSTM, biLSTM, GRU, and Transformer architectures for the task of name transliteration in a many-to-one multilingual paradigm, transliterating from 590 languages to English.
no code implementations • ACL 2022 • Elena Álvarez-Mellado, Constantine Lignos
This work presents a new resource for borrowing identification and analyzes the performance and errors of several models on this task.
1 code implementation • 28 Feb 2022 • Jonne Sälevä, Constantine Lignos
This preprint describes work in progress on ParaNames, a multilingual parallel name resource consisting of names for approximately 14 million entities.
no code implementations • Findings (ACL) 2022 • Constantine Lignos, Nolan Holley, Chester Palen-Michel, Jonne Sälevä
We then discuss the importance of creating annotation for lower-resourced languages in a thoughtful and ethical way that includes the languages' speakers as part of the development process.
2 code implementations • 14 Jan 2022 • Chester Palen-Michel, June Kim, Constantine Lignos
We present a new multilingual corpus containing text in 44 languages, many of which have relatively few existing resources for natural language processing.
no code implementations • 29 Oct 2021 • Elena Álvarez Mellado, Luis Espinosa Anke, Julio Gonzalo Arroyo, Constantine Lignos, Jordi Porta Zamorano
This paper summarizes the main findings of the ADoBo 2021 shared task, proposed in the context of IberLEF 2021.
1 code implementation • EMNLP (Eval4NLP) 2021 • Chester Palen-Michel, Nolan Holley, Constantine Lignos
To address a looming crisis of unreproducible evaluation for named entity recognition, we propose guidelines and introduce SeqScore, a software package to improve reproducibility.
1 code implementation • NAACL 2021 • Thamme Gowda, Weiqiu You, Constantine Lignos, Jonathan May
While traditional corpus-level evaluation metrics for machine translation (MT) correlate well with fluency, they struggle to reflect adequacy.
1 code implementation • 1 Apr 2021 • Jonne Sälevä, Constantine Lignos
This work supports further development of language technology for the languages of Africa by providing a Wikidata-derived resource of name lists corresponding to common entity types (person, location, and organization).
no code implementations • EACL 2021 • Jingxuan Tu, Constantine Lignos
We propose the Tough Mentions Recall (TMR) metrics to supplement traditional named entity recognition (NER) evaluation by examining recall on specific subsets of "tough" mentions: unseen mentions, those whose tokens or token/type combination were not observed in training, and type-confusable mentions, token sequences with multiple entity types in the test data.
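The subsets named in this abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the `Mention` record, the function names, and the exact subset definitions below are assumptions made for illustration, based only on the abstract's wording (a mention counts as correct only if the prediction matches its span and type exactly).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class Mention(NamedTuple):
    """Hypothetical mention record: position in the corpus plus tokens and type."""
    sent_id: int
    start: int
    end: int
    tokens: Tuple[str, ...]
    type: str


def tough_subsets(train: Iterable[Mention], test_gold: List[Mention]) -> Dict[str, List[Mention]]:
    """Split gold test mentions into 'tough' subsets (assumed definitions)."""
    train = list(train)
    train_tokens = {m.tokens for m in train}
    train_token_type = {(m.tokens, m.type) for m in train}

    # Collect every entity type each token sequence takes in the test data.
    types_by_tokens = defaultdict(set)
    for m in test_gold:
        types_by_tokens[m.tokens].add(m.type)

    return {
        # Token sequence never appeared as a mention in training.
        "unseen_tokens": [m for m in test_gold if m.tokens not in train_tokens],
        # Token sequence never appeared with this type in training.
        "unseen_token_type": [
            m for m in test_gold if (m.tokens, m.type) not in train_token_type
        ],
        # Token sequence appears with more than one type in the test data.
        "type_confusable": [
            m for m in test_gold if len(types_by_tokens[m.tokens]) > 1
        ],
    }


def recall(gold_subset: List[Mention], predicted: Iterable[Mention]) -> Optional[float]:
    """Exact-match recall over one subset; None when the subset is empty."""
    if not gold_subset:
        return None
    predicted_set = set(predicted)
    hits = sum(1 for m in gold_subset if m in predicted_set)
    return hits / len(gold_subset)
```

Reporting `recall` separately for each subset returned by `tough_subsets` supplements the usual overall precision, recall, and F1 rather than replacing them.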
1 code implementation • 22 Mar 2021 • David Ifeoluwa Adelani, Jade Abbott, Graham Neubig, Daniel D'souza, Julia Kreutzer, Constantine Lignos, Chester Palen-Michel, Happy Buzaaba, Shruti Rijhwani, Sebastian Ruder, Stephen Mayhew, Israel Abebe Azime, Shamsuddeen Muhammad, Chris Chinenye Emezue, Joyce Nakatumba-Nabende, Perez Ogayo, Anuoluwapo Aremu, Catherine Gitau, Derguene Mbaye, Jesujoba Alabi, Seid Muhie Yimam, Tajuddeen Gwadabe, Ignatius Ezeani, Rubungo Andre Niyongabo, Jonathan Mukiibi, Verrah Otiende, Iroro Orife, Davis David, Samba Ngom, Tosin Adewumi, Paul Rayson, Mofetoluwa Adeyemi, Gerald Muriuki, Emmanuel Anebi, Chiamaka Chukwuneke, Nkiruka Odu, Eric Peter Wairagala, Samuel Oyerinde, Clemencia Siro, Tobius Saul Bateesa, Temilola Oloyede, Yvonne Wambui, Victor Akinode, Deborah Nabagereka, Maurice Katusiime, Ayodele Awokoya, Mouhamadane MBOUP, Dibora Gebreyohannes, Henok Tilaye, Kelechi Nwaike, Degaga Wolde, Abdoulaye Faye, Blessing Sibanda, Orevaoghene Ahia, Bonaventure F. P. Dossou, Kelechi Ogueji, Thierno Ibrahima DIOP, Abdoulaye Diallo, Adewale Akinfaderin, Tendai Marengereke, Salomey Osei
We take a step towards addressing the under-representation of the African continent in NLP research by creating the first large publicly available high-quality dataset for named entity recognition (NER) in ten African languages, bringing together a variety of stakeholders.
no code implementations • EACL 2021 • Jonne Sälevä, Constantine Lignos
This paper evaluates the performance of several modern subword segmentation methods in a low-resource neural machine translation setting.
no code implementations • IJCNLP 2019 • Constantine Lignos, Daniel Cohen, Yen-Chieh Lien, Pratik Mehta, W. Bruce Croft, Scott Miller
When performing cross-language information retrieval (CLIR) for lower-resourced languages, a common approach is to retrieve over the output of machine translation (MT).
no code implementations • ACL 2019 • Elizabeth Boschee, Joel Barry, Jayadev Billa, Marjorie Freedman, Thamme Gowda, Constantine Lignos, Chester Palen-Michel, Michael Pust, Banriskhem Kayang Khonglah, Srikanth Madikeri, Jonathan May, Scott Miller
In this paper, we present an end-to-end cross-lingual information retrieval (CLIR) and summarization system for low-resource languages that 1) enables English speakers to search foreign-language repositories of text and audio using English queries, 2) summarizes the retrieved documents in English with respect to a particular information need, and 3) provides complete transcriptions and translations as needed.