no code implementations • NAACL (DaSH) 2021 • Tatiana Tsygankova, Francesca Marini, Stephen Mayhew, Dan Roth
In low-resource natural language processing (NLP), the key problems are a lack of target-language training data and a lack of native speakers to create it.
Low Resource Named Entity Recognition
named-entity-recognition
no code implementations • 5 Jun 2024 • Ali Malik, Stephen Mayhew, Chris Piech, Klinton Bicknell
We study the problem of controlling the difficulty level of text generated by Large Language Models (LLMs) for contexts where end-users are not fully proficient, such as language learners.
2 code implementations • arXiv 2023 • Stephen Mayhew, Terra Blevins, Shuheng Liu, Marek Šuppa, Hila Gonen, Joseph Marvin Imperial, Börje F. Karlsson, Peiqin Lin, Nikola Ljubešić, LJ Miranda, Barbara Plank, Arij Riabi, Yuval Pinter
We introduce Universal NER (UNER), an open, community-driven project to develop gold-standard NER benchmarks in many languages.
Ranked #1 on Named Entity Recognition (NER) on UNER v1 (Danish)
2 code implementations • 22 Mar 2021 • David Ifeoluwa Adelani, Jade Abbott, Graham Neubig, Daniel D'souza, Julia Kreutzer, Constantine Lignos, Chester Palen-Michel, Happy Buzaaba, Shruti Rijhwani, Sebastian Ruder, Stephen Mayhew, Israel Abebe Azime, Shamsuddeen Muhammad, Chris Chinenye Emezue, Joyce Nakatumba-Nabende, Perez Ogayo, Anuoluwapo Aremu, Catherine Gitau, Derguene Mbaye, Jesujoba Alabi, Seid Muhie Yimam, Tajuddeen Gwadabe, Ignatius Ezeani, Rubungo Andre Niyongabo, Jonathan Mukiibi, Verrah Otiende, Iroro Orife, Davis David, Samba Ngom, Tosin Adewumi, Paul Rayson, Mofetoluwa Adeyemi, Gerald Muriuki, Emmanuel Anebi, Chiamaka Chukwuneke, Nkiruka Odu, Eric Peter Wairagala, Samuel Oyerinde, Clemencia Siro, Tobius Saul Bateesa, Temilola Oloyede, Yvonne Wambui, Victor Akinode, Deborah Nabagereka, Maurice Katusiime, Ayodele Awokoya, Mouhamadane MBOUP, Dibora Gebreyohannes, Henok Tilaye, Kelechi Nwaike, Degaga Wolde, Abdoulaye Faye, Blessing Sibanda, Orevaoghene Ahia, Bonaventure F. P. Dossou, Kelechi Ogueji, Thierno Ibrahima DIOP, Abdoulaye Diallo, Adewale Akinfaderin, Tendai Marengereke, Salomey Osei
We take a step towards addressing the under-representation of the African continent in NLP research by creating the first large publicly available high-quality dataset for named entity recognition (NER) in ten African languages, bringing together a variety of stakeholders.
1 code implementation • WS 2020 • Stephen Mayhew, Klinton Bicknell, Chris Brust, Bill McDowell, Will Monroe, Burr Settles
We present the task of Simultaneous Translation and Paraphrasing for Language Education (STAPLE).
no code implementations • 17 Jun 2020 • Tatiana Tsygankova, Francesca Marini, Stephen Mayhew, Dan Roth
In low-resource natural language processing (NLP), the key problems are a lack of target-language training data and a lack of native speakers to create it.
Low Resource Named Entity Recognition
named-entity-recognition
no code implementations • Findings of the Association for Computational Linguistics 2020 • Zihan Wang, Karthikeyan K, Stephen Mayhew, Dan Roth
Multilingual BERT (M-BERT) has been a huge success in both supervised and zero-shot cross-lingual transfer learning.
no code implementations • ICLR 2020 • Karthikeyan K, Zihan Wang, Stephen Mayhew, Dan Roth
Recent work has exhibited the surprising cross-lingual abilities of multilingual BERT (M-BERT) -- surprising since it is trained without any cross-lingual objective and with no aligned data.
no code implementations • 15 Dec 2019 • Stephen Mayhew, Nitish Gupta, Dan Roth
Although modern named entity recognition (NER) systems show impressive performance on standard datasets, they perform poorly when presented with noisy data.
Ranked #12 on Named Entity Recognition (NER) on WNUT 2017
no code implementations • CONLL 2019 • Stephen Mayhew, Snigdha Chaturvedi, Chen-Tse Tsai, Dan Roth
Supervised machine learning assumes the availability of fully-labeled data, but in many cases, such as low-resource languages, the only data available is partially annotated.
no code implementations • WS 2019 • Tatiana Tsygankova, Stephen Mayhew, Dan Roth
This paper describes the Cognitive Computation (CogComp) Group's submissions to the multilingual named entity recognition shared task at the Balto-Slavic Natural Language Processing (BSNLP) Workshop.
Multilingual Named Entity Recognition
named-entity-recognition
1 code implementation • WS 2019 • Robert Shaffer, Stephen Mayhew
This paper describes a dataset and baseline systems for linking paragraphs from court cases to clauses or amendments in the US Constitution.
no code implementations • IJCNLP 2019 • Stephen Mayhew, Tatiana Tsygankova, Dan Roth
While prior work and first impressions might suggest training a caseless model, or using a truecaser at test time, we show that the most effective strategy is a concatenation of cased and lowercased training data, producing a single model with high performance on both cased and uncased text.
no code implementations • EMNLP 2018 • Xiaodong Yu, Stephen Mayhew, Mark Sammons, Dan Roth
Character-level patterns have been widely used as features in English Named Entity Recognition (NER) systems.
Multilingual Named Entity Recognition
named-entity-recognition
1 code implementation • ACL 2018 • Stephen Mayhew, Dan Roth
We present a new web-based interface, TALEN, designed for named entity annotation in low-resource settings where the annotators do not speak the language.
no code implementations • WS 2018 • Devanshu Jain, Maria Kustikova, Mayank Darbari, Rishabh Gupta, Stephen Mayhew
In this work, we address the problem of Named Entity Recognition (NER) in code-switched tweets as a part of the Workshop on Computational Approaches to Linguistic Code-switching (CALCS) at ACL'18.
1 code implementation • LREC 2018 • Daniel Khashabi, Mark Sammons, Ben Zhou, Tom Redman, Christos Christodoulopoulos, Vivek Srikumar, Nicholas Rizzolo, Lev Ratinov, Guanheng Luo, Quang Do, Chen-Tse Tsai, Subhro Roy, Stephen Mayhew, Zhili Feng, John Wieting, Xiaodong Yu, Yangqiu Song, Shashank Gupta, Shyam Upadhyay, Naveen Arivazhagan, Qiang Ning, Shaoshi Ling, Dan Roth
no code implementations • EMNLP 2017 • Stephen Mayhew, Chen-Tse Tsai, Dan Roth
Recent work in NLP has attempted to deal with low-resource languages but still assumed a resource level that is not present for most languages, e.g., the availability of Wikipedia in the target language.
no code implementations • 13 Nov 2016 • Yangqiu Song, Stephen Mayhew, Dan Roth
We use a word-level dictionary to convert documents in a small-Wikipedia language (SWL) to a large-Wikipedia language (LWL), and then perform cross-lingual dataless document classification (CLDDC) based on the LWL's Wikipedia.
no code implementations • 14 Sep 2016 • Stephen Mayhew, Christos Christodoulopoulos, Dan Roth
We introduce a method for transliteration generation that can produce transliterations in every language.
no code implementations • LREC 2014 • Hao Wu, Zhiye Fei, Aaron Dai, Mark Sammons, Dan Roth, Stephen Mayhew
Natural Language Processing (NLP) continues to grow in popularity in a range of research and commercial applications.