no code implementations • ComputEL (ACL) 2022 • Farhan Samir, Miikka Silfverberg
Data augmentation strategies are increasingly important in NLP pipelines for low-resourced and endangered languages, and in neural morphological inflection, augmentation by so-called data hallucination is a popular technique.
no code implementations • Findings (ACL) 2022 • Clarissa Forbes, Farhan Samir, Bruce Oliver, Changbing Yang, Edith Coates, Garrett Nicolai, Miikka Silfverberg
With this paper, we make the case that IGT data can be leveraged successfully provided that target language expertise is available.
no code implementations • COLING 2022 • Garrett Nicolai, Changbing Yang, Miikka Silfverberg
Experiments on very low-resourced Indigenous North American languages show that an initially deficient multilingual translator can improve by 4.9 BLEU through mBART pre-training, and 5.5 BLEU points with the strategic addition of monolingual data, and that a divergence penalty leads to further increases of 0.4 BLEU.
no code implementations • Findings (ACL) 2022 • Adam Wiemerslage, Miikka Silfverberg, Changbing Yang, Arya McCarthy, Garrett Nicolai, Eliana Colunga, Katharina Kann
Automatic morphological processing can aid downstream natural language processing applications, especially for low-resource languages, and assist language documentation efforts for endangered languages.
no code implementations • WS (NoDaLiDa) 2019 • Ilmari Kylliäinen, Miikka Silfverberg
We investigate different ensemble learning techniques for neural morphological inflection using bidirectional LSTM encoder-decoder models with attention.
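The snippet does not say which ensembling schemes are compared. A common baseline for combining inflection models is majority voting over the output strings of independently trained models; the following is a minimal sketch of that baseline only (not necessarily one of the techniques studied in the paper), assuming each model's prediction is already available as a string. The function name `majority_vote` and the example outputs are hypothetical.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the most frequent output string among ensemble members.

    Ties are broken in favour of the prediction that appears first,
    i.e. by the order of the models in `predictions`.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    # Iterate in model order so that ties resolve deterministically.
    for form in predictions:
        if counts[form] == best:
            return form
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # Hypothetical outputs of five models for one lemma + feature bundle.
    outputs = ["talossa", "talossa", "talosa", "talossa", "talosa"]
    print(majority_vote(outputs))
```

Voting operates on whole output strings, so it needs no access to model internals; weighted or probability-averaging ensembles would instead combine the decoders' scores.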
no code implementations • COLING 2022 • Ife Adebara, Muhammad Abdul-Mageed, Miikka Silfverberg
In this work, we perform fine-grained analysis on how an SMT system compares with two NMT systems (BiLSTM and Transformer) when translating bare nouns in Yorùbá into English.
no code implementations • NAACL (SIGMORPHON) 2022 • Changbing Yang, Ruixin (Ray) Yang, Garrett Nicolai, Miikka Silfverberg
This paper presents experiments on morphological inflection using data from the SIGMORPHON-UniMorph 2022 Shared Task 0: Generalization and Typologically Diverse Morphological Inflection.
no code implementations • ACL (SIGMORPHON) 2021 • Clarissa Forbes, Garrett Nicolai, Miikka Silfverberg
This paper presents a finite-state morphological analyzer for the Gitksan language.
no code implementations • ACL (SIGMORPHON) 2021 • Changbing Yang, Garrett Nicolai, Miikka Silfverberg
Secondly, we experiment with more general rules which can apply transformations inside the input strings in addition to prefix and suffix transformations.
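For contrast with the more general rules mentioned above, a plain prefix/suffix transformation can be sketched as follows. This is an illustrative assumption, not the paper's rule formalism: the shared stem is taken to be the longest common substring of lemma and form, and everything before and after it becomes a prefix and suffix rewrite. The `AffixRule` type and both helper functions are hypothetical; rules that edit the inside of the string (such as vowel changes) cannot be expressed this way.

```python
from difflib import SequenceMatcher
from typing import NamedTuple, Optional


class AffixRule(NamedTuple):
    """Hypothetical rule: rewrite a lemma's prefix and suffix."""
    lemma_prefix: str
    form_prefix: str
    lemma_suffix: str
    form_suffix: str


def extract_rule(lemma: str, form: str) -> AffixRule:
    # Treat the longest common substring as the unchanged stem.
    m = SequenceMatcher(None, lemma, form, autojunk=False).find_longest_match(
        0, len(lemma), 0, len(form))
    return AffixRule(lemma[:m.a], form[:m.b],
                     lemma[m.a + m.size:], form[m.b + m.size:])


def apply_rule(rule: AffixRule, lemma: str) -> Optional[str]:
    lp, ls = rule.lemma_prefix, rule.lemma_suffix
    # The rule only applies if the lemma carries the expected affixes
    # and they do not overlap.
    if not (lemma.startswith(lp) and lemma.endswith(ls)) or len(lp) + len(ls) > len(lemma):
        return None
    stem = lemma[len(lp):len(lemma) - len(ls)]
    return rule.form_prefix + stem + rule.form_suffix


if __name__ == "__main__":
    rule = extract_rule("machen", "gemacht")  # ("", "ge", "en", "t")
    print(apply_rule(rule, "sagen"))          # gesagt
```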
no code implementations • ACL (SIGMORPHON) 2021 • Adam Wiemerslage, Arya D. McCarthy, Alexander Erdmann, Garrett Nicolai, Manex Agirrezabal, Miikka Silfverberg, Mans Hulden, Katharina Kann
We describe the second SIGMORPHON shared task on unsupervised morphology: the goal of the SIGMORPHON 2021 Shared Task on Unsupervised Morphological Paradigm Clustering is to cluster word types from a raw text corpus into paradigms.
1 code implementation • LREC 2022 • Bruce Oliver, Clarissa Forbes, Changbing Yang, Farhan Samir, Edith Coates, Garrett Nicolai, Miikka Silfverberg
We use Gitksan data in interlinear glossed format, stemming from language documentation efforts, to build a database of partial inflection tables.
1 code implementation • 16 Jun 2024 • Changbing Yang, Garrett Nicolai, Miikka Silfverberg
In this paper, we address the data scarcity problem in automatic data-driven glossing for low-resource languages by coordinating multiple sources of linguistic expertise.
no code implementations • 13 Mar 2024 • Changbing Yang, Garrett Nicolai, Miikka Silfverberg
Aided by these enhancements, our model demonstrates an average improvement of 3.97 percentage points over the previous state of the art on datasets from the SIGMORPHON 2023 Shared Task on Interlinear Glossing.
1 code implementation • 26 May 2023 • Adam Wiemerslage, Changbing Yang, Garrett Nicolai, Miikka Silfverberg, Katharina Kann
We aim to close this gap by investigating the types of noise encountered within a pipeline for truly unsupervised morphological paradigm completion and its impact on morphological inflection systems. First, we propose an error taxonomy and annotation pipeline for inflection training data.
1 code implementation • 23 May 2023 • Farhan Samir, Miikka Silfverberg
In this study, we aim to shed light on the theoretical aspects of the prominent data augmentation strategy StemCorrupt (Silfverberg et al., 2017; Anastasopoulos and Neubig, 2019), a method that generates synthetic examples by randomly substituting stem characters in gold standard training examples.
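The StemCorrupt idea described above can be sketched as below. Assumptions (not taken from the cited papers): the stem is identified as the longest common substring of lemma and inflected form, each stem character is replaced independently with probability `p` by a character drawn uniformly from `alphabet`, and the morphosyntactic tags are copied unchanged. The original implementations may identify stems and sample replacements differently.

```python
import random
from difflib import SequenceMatcher
from typing import Optional, Sequence, Tuple


def stem_corrupt(lemma: str, form: str, alphabet: Sequence[str],
                 p: float = 0.5,
                 rng: Optional[random.Random] = None) -> Tuple[str, str]:
    """Create one synthetic (lemma, form) pair from a gold pair."""
    if rng is None:
        rng = random.Random()
    # Assumed stem: the longest common substring of lemma and form.
    m = SequenceMatcher(None, lemma, form, autojunk=False).find_longest_match(
        0, len(lemma), 0, len(form))
    stem = lemma[m.a:m.a + m.size]
    # Substitute each stem character independently with probability p.
    new_stem = "".join(rng.choice(alphabet) if rng.random() < p else c
                       for c in stem)
    # Splice the same corrupted stem into both strings so that the affixal
    # transformation (what the model must learn) is preserved.
    new_lemma = lemma[:m.a] + new_stem + lemma[m.a + m.size:]
    new_form = form[:m.b] + new_stem + form[m.b + m.size:]
    return new_lemma, new_form


if __name__ == "__main__":
    rng = random.Random(0)
    print(stem_corrupt("machen", "gemacht", "abcdefghijklmnopqrstuvwxyz", rng=rng))
```

Because the same corrupted stem appears in both lemma and form, the synthetic pair keeps the `ge-...-t` / `-en` pattern of the gold example while breaking any dependence on the specific stem characters.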
1 code implementation • COLING 2022 • Yige Chen, Eunkyul Leah Jo, Yundong Yao, Kyungtae Lim, Miikka Silfverberg, Francis M. Tyers, Jungyeul Park
In this study, we propose a morpheme-based scheme for Korean dependency parsing and adopt the proposed scheme to Universal Dependencies.
no code implementations • LREC 2022 • Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova
The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.
no code implementations • 17 Mar 2022 • Clarissa Forbes, Farhan Samir, Bruce Harold Oliver, Changbing Yang, Edith Coates, Garrett Nicolai, Miikka Silfverberg
With this paper, we make the case that IGT data can be leveraged successfully provided that target language expertise is available.
no code implementations • 16 Mar 2022 • Adam Wiemerslage, Miikka Silfverberg, Changbing Yang, Arya D. McCarthy, Garrett Nicolai, Eliana Colunga, Katharina Kann
Automatic morphological processing can aid downstream natural language processing applications, especially for low-resource languages, and assist language documentation efforts for endangered languages.
no code implementations • NAACL 2021 • Miikka Silfverberg, Francis Tyers, Garrett Nicolai, Mans Hulden
Sequence-to-sequence models have delivered impressive results in word formation tasks such as morphological inflection, often learning to model subtle morphophonological details with limited training data.
no code implementations • 1 Apr 2021 • Miikka Silfverberg, Francis Tyers, Garrett Nicolai, Mans Hulden
Sequence-to-sequence models have delivered impressive results in word formation tasks such as morphological inflection, often learning to model subtle morphophonological details with limited training data.
2 code implementations • 7 Mar 2021 • Ife Adebara, Muhammad Abdul-Mageed, Miikka Silfverberg
In this work, we perform fine-grained analysis on how an SMT system compares with two NMT systems (BiLSTM and Transformer) when translating bare nouns in Yorùbá into English.
no code implementations • COLING 2020 • Garrett Nicolai, Miikka Silfverberg
Morphological inflection, like many sequence-to-sequence tasks, sees great performance from recurrent neural architectures when data is plentiful, but performance falls off sharply in lower-data settings.
no code implementations • WS 2020 • Kaili Vesik, Muhammad Abdul-Mageed, Miikka Silfverberg
The task of grapheme-to-phoneme (G2P) conversion is important for both speech recognition and synthesis.
1 code implementation • WS 2020 • Ekaterina Vylomova, Jennifer White, Elizabeth Salesky, Sabrina J. Mielke, Shijie Wu, Edoardo Ponti, Rowan Hall Maudslay, Ran Zmigrod, Josef Valvoda, Svetlana Toldova, Francis Tyers, Elena Klyachko, Ilya Yegorov, Natalia Krizhanovsky, Paula Czarnowska, Irene Nikkarinen, Andrew Krizhanovsky, Tiago Pimentel, Lucas Torroba Hennigen, Christo Kirov, Garrett Nicolai, Adina Williams, Antonios Anastasopoulos, Hilaria Cruz, Eleanor Chodroff, Ryan Cotterell, Miikka Silfverberg, Mans Hulden
Systems were developed using data from 45 languages and just 5 language families, fine-tuned with data from an additional 45 languages and 10 language families (13 in total), and evaluated on all 90 languages.
no code implementations • LREC 2020 • Arya D. McCarthy, Christo Kirov, Matteo Grella, Amrit Nidhi, Patrick Xia, Kyle Gorman, Ekaterina Vylomova, Sabrina J. Mielke, Garrett Nicolai, Miikka Silfverberg, Timofey Arkhangelskiy, Nataly Krizhanovsky, Andrew Krizhanovsky, Elena Klyachko, Alexey Sorokin, John Mansfield, Valts Ernštreits, Yuval Pinter, Cassandra L. Jacobs, Ryan Cotterell, Mans Hulden, David Yarowsky
The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.
no code implementations • LREC 2020 • Aleksi Sahala, Miikka Silfverberg, Antti Arppe, Krister Lindén
Several Akkadian text corpora contain only the transliterated text.
no code implementations • LREC 2020 • Aleksi Sahala, Miikka Silfverberg, Antti Arppe, Krister Lindén
Akkadian is a fairly well-resourced extinct language that does not yet have a comprehensive morphological analyzer available.
no code implementations • CONLL 2019 • Kyle Gorman, Arya D. McCarthy, Ryan Cotterell, Ekaterina Vylomova, Miikka Silfverberg, Magdalena Markowska
We conduct a manual error analysis of the CoNLL-SIGMORPHON Shared Task on Morphological Reinflection.
no code implementations • WS 2019 • Arya D. McCarthy, Ekaterina Vylomova, Shijie Wu, Chaitanya Malaviya, Lawrence Wolf-Sonkin, Garrett Nicolai, Christo Kirov, Miikka Silfverberg, Sabrina J. Mielke, Jeffrey Heinz, Ryan Cotterell, Mans Hulden
The SIGMORPHON 2019 shared task on cross-lingual transfer and contextual analysis in morphology examined transfer learning of inflection between 100 language pairs, as well as contextual lemmatization and morphosyntactic description in 66 languages.
2 code implementations • 12 Aug 2019 • Teemu Ruokolainen, Pekka Kauppinen, Miikka Silfverberg, Krister Lindén
We present a corpus of Finnish news articles with a manually prepared named entity annotation.
no code implementations • WS 2019 • Marcos Zampieri, Shervin Malmasi, Yves Scherrer, Tanja Samardžić, Francis Tyers, Miikka Silfverberg, Natalia Klyueva, Tung-Le Pan, Chu-Ren Huang, Radu Tudor Ionescu, Andrei M. Butnaru, Tommi Jauhiainen
In this paper, we present the findings of the Third VarDial Evaluation Campaign organized as part of the sixth edition of the workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with NAACL 2019.
no code implementations • CONLL 2018 • Ryan Cotterell, Christo Kirov, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Arya D. McCarthy, Katharina Kann, Sabrina J. Mielke, Garrett Nicolai, Miikka Silfverberg, David Yarowsky, Jason Eisner, Mans Hulden
Apart from extending the number of languages involved in earlier supervised tasks of generating inflected forms, this year the shared task also featured a new second task which asked participants to inflect words in sentential context, similar to a cloze task.
no code implementations • WS 2018 • Arya D. McCarthy, Miikka Silfverberg, Ryan Cotterell, Mans Hulden, David Yarowsky
The Universal Dependencies (UD) and Universal Morphology (UniMorph) projects each present schemata for annotating the morphosyntactic details of language.
no code implementations • WS 2018 • Adam Wiemerslage, Miikka Silfverberg, Mans Hulden
Modeling morphological inflection is an important task in Natural Language Processing.
1 code implementation • EMNLP 2018 • Miikka Silfverberg, Mans Hulden
The Paradigm Cell Filling Problem in morphology asks to complete word inflection tables from partial ones.
no code implementations • COLING 2018 • Miikka Silfverberg, Senka Drobac
This paper presents the submission of the UH&CU team (Joint University of Colorado and University of Helsinki team) for the VarDial 2018 shared task on morphosyntactic tagging of Croatian, Slovenian and Serbian tweets.
no code implementations • COLING 2018 • Miikka Silfverberg, Ling Liu, Mans Hulden
In supervised learning of morphological patterns, the strategy of generalizing inflectional tables into more abstract paradigms through alignment of the longest common subsequence found in an inflection table has been proposed as an efficient method to deduce the inflectional behavior of unseen word forms.
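The longest-common-subsequence strategy described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the system evaluated in the paper: the table-wide LCS is approximated by folding pairwise LCS (a common subsequence of all forms, but not guaranteed longest for three or more strings, since exact multi-string LCS is NP-hard in general), each form is aligned to it greedily from the left, and the LCS is split into variables `x1`, `x2`, ... wherever it is non-contiguous in some form. The `+`-joined template notation and all function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def lcs(a: str, b: str) -> str:
    """Longest common subsequence of two strings (standard DP)."""
    dp = [[""] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = a[i] + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1], key=len)
    return dp[0][0]


def greedy_positions(form: str, common: str) -> List[int]:
    """Leftmost positions in `form` that spell out the subsequence `common`."""
    positions, k = [], 0
    for i, ch in enumerate(form):
        if k < len(common) and ch == common[k]:
            positions.append(i)
            k += 1
    assert k == len(common), "common must be a subsequence of form"
    return positions


def abstract_table(forms: List[str]) -> List[str]:
    """Turn an inflection table into variable templates (heuristic)."""
    if not forms:
        raise ValueError("empty inflection table")
    common = reduce(lcs, forms)
    pos = [greedy_positions(f, common) for f in forms]
    # Start a new variable wherever the LCS is non-contiguous in any form.
    var_of: List[int] = []
    var = 1
    for k in range(len(common)):
        if k > 0 and any(p[k] != p[k - 1] + 1 for p in pos):
            var += 1
        var_of.append(var)
    templates = []
    for form, p in zip(forms, pos):
        var_at = dict(zip(p, var_of))
        parts: List[str] = []
        const, last_var = "", None  # type: str, Optional[int]
        for i, ch in enumerate(form):
            if i in var_at:
                if const:  # flush pending constant material
                    parts.append(const)
                    const = ""
                if var_at[i] != last_var:
                    parts.append(f"x{var_at[i]}")
                    last_var = var_at[i]
            else:
                const += ch
                last_var = None
        if const:
            parts.append(const)
        templates.append("+".join(parts))
    return templates


if __name__ == "__main__":
    print(abstract_table(["ring", "rang", "rung"]))
```

For the toy table `ring, rang, rung`, the shared material `r ... ng` becomes two variables and only the vowel remains as constant material, which is what lets the abstracted paradigm be applied to unseen lemmas by instantiating `x1` and `x2`.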
no code implementations • WS 2017 • Miikka Silfverberg, Mans Hulden
Most NLP resources that offer annotations at the word segment level provide morphological annotation that includes features indicating tense, aspect, modality, gender, case, and other inflectional information.
no code implementations • LREC 2014 • Senka Drobac, Krister Lindén, Tommi Pirinen, Miikka Silfverberg
The most noticeable size reduction was obtained for a morphological transducer for Greenlandic, whose original size is on average about 15 times larger than that of other morphologies.