no code implementations • EMNLP (insights) 2021 • Ling Liu, Mans Hulden
Backtranslation is a common technique for leveraging unlabeled data in low-resource scenarios in machine translation.
no code implementations • EMNLP 2020 • Sarah Moeller, Ling Liu, Changbing Yang, Katharina Kann, Mans Hulden
An intermediate step in the linguistic analysis of an under-documented language is to find and organize inflected forms that are attested in natural speech.
no code implementations • LREC 2022 • Daniel Chen, Mans Hulden
Adpositions and case markers contain a high degree of polysemy and participate in unique semantic role configurations.
no code implementations • ACL (SIGMORPHON) 2021 • Tiago Pimentel, Maria Ryskina, Sabrina J. Mielke, Shijie Wu, Eleanor Chodroff, Brian Leonard, Garrett Nicolai, Yustinus Ghanggo Ate, Salam Khalifa, Nizar Habash, Charbel El-Khaissi, Omer Goldman, Michael Gasser, William Lane, Matt Coler, Arturo Oncevay, Jaime Rafael Montoya Samame, Gema Celeste Silva Villegas, Adam Ek, Jean-Philippe Bernardy, Andrey Shcherbakov, Aziyana Bayyr-ool, Karina Sheifer, Sofya Ganieva, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Andrew Krizhanovsky, Natalia Krizhanovsky, Clara Vania, Sardana Ivanova, Aelita Salchak, Christopher Straughn, Zoey Liu, Jonathan North Washington, Duygu Ataman, Witold Kieraś, Marcin Woliński, Totok Suhardijanto, Niklas Stoehr, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Richard J. Hatcher, Emily Prud'hommeaux, Ritesh Kumar, Mans Hulden, Botond Barta, Dorina Lakatos, Gábor Szolnok, Judit Ács, Mohit Raj, David Yarowsky, Ryan Cotterell, Ben Ambridge, Ekaterina Vylomova
This year's iteration of the SIGMORPHON Shared Task on morphological reinflection focuses on typological diversity and cross-lingual variation of morphosyntactic features.
no code implementations • ACL (SIGMORPHON) 2021 • Adam Wiemerslage, Arya D. McCarthy, Alexander Erdmann, Garrett Nicolai, Manex Agirrezabal, Miikka Silfverberg, Mans Hulden, Katharina Kann
We describe the second SIGMORPHON shared task on unsupervised morphology: the goal of the SIGMORPHON 2021 Shared Task on Unsupervised Morphological Paradigm Clustering is to cluster word types from a raw text corpus into paradigms.
no code implementations • ACL 2022 • Ling Liu, Mans Hulden
Annotation errors that stem from various sources are usually unavoidable when performing large-scale annotation of linguistic data.
no code implementations • 27 Jun 2024 • Michael Ginn, Mans Hulden, Alexis Palmer
We explore whether LLMs can be effective at the task of interlinear glossing with in-context learning, without any traditional training.
no code implementations • 27 Jun 2024 • Michael Ginn, Mans Hulden
Dynamic topic models have been proposed as a tool for historical analysis, but traditional approaches have had limited usefulness, being difficult to configure, interpret, and evaluate.
1 code implementation • Findings of the Association for Computational Linguistics 2023 • Saliha Muradoglu, Mans Hulden
Neural sequence-to-sequence models have been very successful at tasks in phonology and morphology that seemingly require a capacity for intricate linguistic generalisations.
1 code implementation • 26 Oct 2022 • Saliha Muradoglu, Mans Hulden
In this paper, we explore four sampling strategies for the task of morphological inflection using a Transformer model: a pair of oracle experiments where data is chosen based on whether the model already can or cannot inflect the test forms correctly, strategies based on high/low model confidence and on entropy, and random selection.
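The confidence-, entropy-, and random-based selection mentioned above can be illustrated with a small sketch. It is not the authors' implementation: the `pool` dictionary, its probability values, and the strategy names are hypothetical, and each item is assumed to come with a model distribution over a few candidate outputs rather than a full sequence-level score.

```python
import math
import random

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select(pool, k, strategy, seed=0):
    """Pick k items from a pool of {item: candidate probabilities}."""
    if strategy == "random":
        # Seeded so that the random baseline is reproducible.
        return random.Random(seed).sample(sorted(pool), k)
    if strategy == "low_confidence":
        key = lambda item: max(pool[item])        # least certain first
    elif strategy == "high_confidence":
        key = lambda item: -max(pool[item])       # most certain first
    elif strategy == "high_entropy":
        key = lambda item: -entropy(pool[item])   # flattest distribution first
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return sorted(pool, key=key)[:k]

# Hypothetical model outputs: probabilities over three candidate inflections.
pool = {
    "walk+PST": [0.90, 0.05, 0.05],
    "sing+PST": [0.40, 0.35, 0.25],
    "go+PST":   [0.60, 0.30, 0.10],
}

print(select(pool, 1, "low_confidence"))
print(select(pool, 2, "high_entropy"))
```

The oracle strategies described in the abstract would additionally need gold forms to check whether the model inflects an item correctly, so they are left out of this sketch.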
no code implementations • LREC 2022 • Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova
The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.
no code implementations • ACL 2021 • Sarah Moeller, Ling Liu, Mans Hulden
However, the importance and usefulness of POS tags need to be examined as NLP expands to low-resource languages because linguists who provide many annotated resources do not place priority on early identification and tagging of POS.
no code implementations • NAACL 2021 • Miikka Silfverberg, Francis Tyers, Garrett Nicolai, Mans Hulden
Sequence-to-sequence models have delivered impressive results in word formation tasks such as morphological inflection, often learning to model subtle morphophonological details with limited training data.
no code implementations • ACL 2022 • Ling Liu, Mans Hulden
Deep learning sequence models have been successfully applied to the task of morphological inflection.
no code implementations • 1 Apr 2021 • Miikka Silfverberg, Francis Tyers, Garrett Nicolai, Mans Hulden
Sequence-to-sequence models have delivered impressive results in word formation tasks such as morphological inflection, often learning to model subtle morphophonological details with limited training data.
1 code implementation • COLING 2020 • Ling Liu, Mans Hulden
Analogy is assumed to be the cognitive mechanism speakers resort to in order to inflect an unknown form of a lexeme based on knowledge of other words in a language.
no code implementations • WS 2020 • Sarah Beemer, Zak Boston, April Bukoski, Daniel Chen, Princess Dickens, Andrew Gerlach, Torin Hopkins, Parth Anand Jawale, Chris Koski, Akanksha Malhotra, Piyush Mishra, Saliha Muradoglu, Lan Sang, Tyler Short, Sagarika Shreevastava, Elizabeth Spaulding, Testumichi Umada, Beilei Xiang, Changbing Yang, Mans Hulden
Sequence-to-sequence models have proven to be highly successful in learning morphological inflection from examples, as the series of SIGMORPHON/CoNLL shared tasks has shown.
no code implementations • WS 2020 • Ling Liu, Mans Hulden
This paper presents the submission by the CU Ling team from the University of Colorado to SIGMORPHON 2020 shared task 0 on morphological inflection.
no code implementations • WS 2020 • Zach Ryan, Mans Hulden
The Transformer model has been shown to outperform other neural seq2seq models in several character-level tasks.
1 code implementation • WS 2020 • Ekaterina Vylomova, Jennifer White, Elizabeth Salesky, Sabrina J. Mielke, Shijie Wu, Edoardo Ponti, Rowan Hall Maudslay, Ran Zmigrod, Josef Valvoda, Svetlana Toldova, Francis Tyers, Elena Klyachko, Ilya Yegorov, Natalia Krizhanovsky, Paula Czarnowska, Irene Nikkarinen, Andrew Krizhanovsky, Tiago Pimentel, Lucas Torroba Hennigen, Christo Kirov, Garrett Nicolai, Adina Williams, Antonios Anastasopoulos, Hilaria Cruz, Eleanor Chodroff, Ryan Cotterell, Miikka Silfverberg, Mans Hulden
Systems were developed using data from 45 languages and just 5 language families, fine-tuned with data from an additional 45 languages and 10 language families (13 in total), and evaluated on all 90 languages.
no code implementations • WS 2020 • Katharina Kann, Arya McCarthy, Garrett Nicolai, Mans Hulden
In this paper, we describe the findings of the SIGMORPHON 2020 shared task on unsupervised morphological paradigm completion (SIGMORPHON 2020 Task 2), a novel task in the field of inflectional morphology.
3 code implementations • EACL 2021 • Shijie Wu, Ryan Cotterell, Mans Hulden
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
no code implementations • LREC 2020 • Arya D. McCarthy, Christo Kirov, Matteo Grella, Amrit Nidhi, Patrick Xia, Kyle Gorman, Ekaterina Vylomova, Sabrina J. Mielke, Garrett Nicolai, Miikka Silfverberg, Timofey Arkhangelskiy, Nataly Krizhanovsky, Andrew Krizhanovsky, Elena Klyachko, Alexey Sorokin, John Mansfield, Valts Ernštreits, Yuval Pinter, Cassandra L. Jacobs, Ryan Cotterell, Mans Hulden, David Yarowsky
The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.
no code implementations • WS 2019 • Arya D. McCarthy, Ekaterina Vylomova, Shijie Wu, Chaitanya Malaviya, Lawrence Wolf-Sonkin, Garrett Nicolai, Christo Kirov, Miikka Silfverberg, Sabrina J. Mielke, Jeffrey Heinz, Ryan Cotterell, Mans Hulden
The SIGMORPHON 2019 shared task on cross-lingual transfer and contextual analysis in morphology examined transfer learning of inflection between 100 language pairs, as well as contextual lemmatization and morphosyntactic description in 66 languages.
3 code implementations • LREC 2018 • Christo Kirov, Ryan Cotterell, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Patrick Xia, Manaal Faruqui, Sabrina J. Mielke, Arya D. McCarthy, Sandra Kübler, David Yarowsky, Jason Eisner, Mans Hulden
The Universal Morphology (UniMorph) project is a collaborative effort to improve how NLP handles complex morphology across the world's languages.
no code implementations • CONLL 2018 • Ryan Cotterell, Christo Kirov, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Arya D. McCarthy, Katharina Kann, Sabrina J. Mielke, Garrett Nicolai, Miikka Silfverberg, David Yarowsky, Jason Eisner, Mans Hulden
Apart from extending the number of languages involved in earlier supervised tasks of generating inflected forms, this year the shared task also featured a new second task which asked participants to inflect words in sentential context, similar to a cloze task.
no code implementations • WS 2018 • Arya D. McCarthy, Miikka Silfverberg, Ryan Cotterell, Mans Hulden, David Yarowsky
The Universal Dependencies (UD) and Universal Morphology (UniMorph) projects each present schemata for annotating the morphosyntactic details of language.
1 code implementation • EMNLP 2018 • Miikka Silfverberg, Mans Hulden
The Paradigm Cell Filling Problem in morphology asks to complete word inflection tables from partial ones.
no code implementations • WS 2018 • Adam Wiemerslage, Miikka Silfverberg, Mans Hulden
Modeling morphological inflection is an important task in Natural Language Processing.
no code implementations • COLING 2018 • Sarah Moeller, Ghazaleh Kazeminejad, Andrew Cowell, Mans Hulden
We experiment with training an encoder-decoder neural model to mimic the behavior of an existing hand-written finite-state morphological grammar for verbs in Arapaho, a polysynthetic language with a highly complex verbal inflection system.
no code implementations • COLING 2018 • Miikka Silfverberg, Ling Liu, Mans Hulden
In supervised learning of morphological patterns, the strategy of generalizing inflectional tables into more abstract paradigms through alignment of the longest common subsequence found in an inflection table has been proposed as an efficient method to deduce the inflectional behavior of unseen word forms.
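As a rough illustration of the longest-common-subsequence idea mentioned above, the sketch below extracts the material shared by all forms in a small inflection table. It is a simplification under stated assumptions, not the method from the paper: the function names and the toy English table are hypothetical, and folding a pairwise LCS across the forms is a heuristic that is not guaranteed to find the true LCS of more than two strings.

```python
from functools import reduce

def lcs(a: str, b: str) -> str:
    """Longest common subsequence of two strings via dynamic programming."""
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to recover one optimal subsequence.
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)

def shared_material(forms):
    """Fold pairwise LCS over all forms of a table (a heuristic for > 2 forms)."""
    return reduce(lcs, forms)

# Hypothetical toy table: the shared consonants stay, the vowel varies.
print(shared_material(["ring", "rang", "rung"]))  # rng
```

In a paradigm-generalization setting, the shared subsequence would play the role of the variable part that differs between lexemes, while the remaining characters of each form describe the inflectional pattern.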
no code implementations • COLING 2018 • Sarah Moeller, Mans Hulden
Morphological analysis of morphologically rich and low-resource languages is important to both descriptive linguistics and natural language processing.
no code implementations • TACL 2019 • Ryan Cotterell, Christo Kirov, Mans Hulden, Jason Eisner
We quantify the linguistic complexity of different languages' morphological systems.
no code implementations • NAACL 2018 • Hubie Chen, Mans Hulden
We find that the natural class decision problem is tractable (i.e., is in P), while the minimization problem is not; the decision version of the problem, which determines whether a natural class can be defined with $k$ features or fewer, is NP-complete.
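One simple way to see why the decision problem can be solved in polynomial time is sketched below. This is an illustrative check, not necessarily the algorithm from the paper: it assumes fully specified binary features, and the `inventory` and the function name are hypothetical. The idea is to collect every feature value that all target segments share and then test whether that description picks out exactly the target set.

```python
def is_natural_class(inventory, target):
    """Return True if some conjunction of feature values selects exactly `target`."""
    target = set(target)
    if not target:
        raise ValueError("target set must be non-empty")
    members = [inventory[s] for s in target]
    # The most specific description: feature values shared by every target segment.
    shared = {
        feat: val
        for feat, val in members[0].items()
        if all(m.get(feat) == val for m in members)
    }
    # Segments of the whole inventory that match the shared description.
    picked = {
        seg
        for seg, feats in inventory.items()
        if all(feats.get(feat) == val for feat, val in shared.items())
    }
    return picked == target

# Hypothetical toy inventory with three binary features.
inventory = {
    "p": {"voice": "-", "nasal": "-", "labial": "+"},
    "b": {"voice": "+", "nasal": "-", "labial": "+"},
    "m": {"voice": "+", "nasal": "+", "labial": "+"},
    "t": {"voice": "-", "nasal": "-", "labial": "-"},
    "d": {"voice": "+", "nasal": "-", "labial": "-"},
    "n": {"voice": "+", "nasal": "+", "labial": "-"},
}

print(is_natural_class(inventory, {"m", "n"}))  # True: [+voice, +nasal]
print(is_natural_class(inventory, {"p", "n"}))  # False: no shared feature values
```

The check only answers whether a description exists. Finding the smallest such description is the minimization problem, which the abstract reports to be hard.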
no code implementations • 23 Apr 2018 • Ryan Cotterell, Christo Kirov, Mans Hulden, Jason Eisner
Many languages' inflectional morphological systems are replete with irregulars, i.e., words that do not seem to follow standard inflectional rules.
no code implementations • RANLP 2017 • Manex Agirrezabal, Iñaki Alegria, Mans Hulden
Automatic analysis of poetic rhythm is a challenging task that involves linguistics, literature, and computer science.
no code implementations • WS 2017 • Miikka Silfverberg, Mans Hulden
Most NLP resources that offer annotations at the word segment level provide morphological annotation that includes features indicating tense, aspect, modality, gender, case, and other inflectional information.
1 code implementation • CONLL 2017 • Mans Hulden
This paper explores a divisive hierarchical clustering algorithm based on the well-known Obligatory Contour Principle in phonology.
no code implementations • CONLL 2017 • Ryan Cotterell, Christo Kirov, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Patrick Xia, Manaal Faruqui, Sandra Kübler, David Yarowsky, Jason Eisner, Mans Hulden
In sub-task 2, systems were given a lemma and some of its specific inflected forms, and asked to complete the inflectional paradigm by predicting all of the remaining inflected forms.
no code implementations • COLING 2016 • Manex Agirrezabal, Iñaki Alegria, Mans Hulden
In this work we tackle the challenge of identifying rhythmic patterns in poetry written in English.
no code implementations • COLING 2016 • Lingshuang Mao, Mans Hulden
The modifications that foreign loanwords undergo when adapted into Japanese have been the subject of much study in linguistics.
no code implementations • LREC 2016 • Markus Forsberg, Mans Hulden
This paper presents a semi-automatic method to derive morphological analyzers from a limited number of example inflections suitable for languages with alphabetic writing systems.
no code implementations • LREC 2016 • Daniel Smith, Mans Hulden
We report on the implementation of a morphological analyzer for the Sahidic dialect of Coptic, a now-extinct Afro-Asiatic language.
no code implementations • LREC 2016 • Izaskun Etxeberria, Iñaki Alegria, Larraitz Uria, Mans Hulden
This paper presents a method for the normalization of historical texts using a combination of weighted finite-state transducers and language models.
no code implementations • LREC 2014 • Jerid Francom, Mans Hulden, Adam Ussishkin
Corpus resources for Spanish have proved invaluable for a number of applications in a wide variety of fields.
no code implementations • LREC 2014 • Yvonne Adesam, Malin Ahlberg, Peter Andersson, Gerlof Bouma, Markus Forsberg, Mans Hulden
In this paper we describe and evaluate a tool for paradigm induction and lexicon extraction that has been applied to Old Swedish.
no code implementations • LREC 2012 • Mans Hulden, Jerid Francom
We report on several experiments on combining a rule-based tagger and a trigram tagger for Spanish.