Search Results for author: Garrett Nicolai

Found 46 papers, 5 papers with code

Morphological Processing of Low-Resource Languages: Where We Are and What’s Next

no code implementations Findings (ACL) 2022 Adam Wiemerslage, Miikka Silfverberg, Changbing Yang, Arya McCarthy, Garrett Nicolai, Eliana Colunga, Katharina Kann

Automatic morphological processing can aid downstream natural language processing applications, especially for low-resource languages, and assist language documentation efforts for endangered languages.

An Inflectional Database for Gitksan

1 code implementation LREC 2022 Bruce Oliver, Clarissa Forbes, Changbing Yang, Farhan Samir, Edith Coates, Garrett Nicolai, Miikka Silfverberg

We use Gitksan data in interlinear glossed format, stemming from language documentation efforts, to build a database of partial inflection tables.

Data Augmentation Hallucination +1

Penalizing Divergence: Multi-Parallel Translation for Low-Resource Languages of North America

no code implementations COLING 2022 Garrett Nicolai, Changbing Yang, Miikka Silfverberg

Experiments on very low-resourced Indigenous North American languages show that an initially deficient multilingual translator can improve by 4.9 BLEU through mBART pre-training, and 5.5 BLEU points with the strategic addition of monolingual data, and that a divergence penalty leads to further increases of 0.4 BLEU.

Machine Translation Translation

Generalizing Morphological Inflection Systems to Unseen Lemmas

no code implementations NAACL (SIGMORPHON) 2022 Changbing Yang, Ruixin (Ray) Yang, Garrett Nicolai, Miikka Silfverberg

This paper presents experiments on morphological inflection using data from the SIGMORPHON-UniMorph 2022 Shared Task 0: Generalization and Typologically Diverse Morphological Inflection.

Hallucination LEMMA +1

Findings of the SIGMORPHON 2021 Shared Task on Unsupervised Morphological Paradigm Clustering

no code implementations ACL (SIGMORPHON) 2021 Adam Wiemerslage, Arya D. McCarthy, Alexander Erdmann, Garrett Nicolai, Manex Agirrezabal, Miikka Silfverberg, Mans Hulden, Katharina Kann

We describe the second SIGMORPHON shared task on unsupervised morphology: the goal of the SIGMORPHON 2021 Shared Task on Unsupervised Morphological Paradigm Clustering is to cluster word types from a raw text corpus into paradigms.

Clustering

Unsupervised Paradigm Clustering Using Transformation Rules

no code implementations ACL (SIGMORPHON) 2021 Changbing Yang, Garrett Nicolai, Miikka Silfverberg

Secondly, we experiment with more general rules which can apply transformations inside the input strings in addition to prefix and suffix transformations.

Clustering Task 2

Linguistic Knowledge in Multilingual Grapheme-to-Phoneme Conversion

no code implementations ACL (SIGMORPHON) 2021 Roger Yu-Hsiang Lo, Garrett Nicolai

This paper documents the UBC Linguistics team’s approach to the SIGMORPHON 2021 Grapheme-to-Phoneme Shared Task, concentrating on the low-resource setting.

Embedded Translations for Low-resource Automated Glossing

no code implementations 13 Mar 2024 Changbing Yang, Garrett Nicolai, Miikka Silfverberg

Aided by these enhancements, our model demonstrates an average improvement of 3.97%-points over the previous state of the art on datasets from the SIGMORPHON 2023 Shared Task on Interlinear Glossing.

Translation

Neural Machine Translation Data Generation and Augmentation using ChatGPT

no code implementations 11 Jul 2023 Wayne Yang, Garrett Nicolai

Neural models have revolutionized the field of machine translation, but creating parallel corpora is expensive and time-consuming.

Machine Translation Translation

An Investigation of Noise in Morphological Inflection

1 code implementation 26 May 2023 Adam Wiemerslage, Changbing Yang, Garrett Nicolai, Miikka Silfverberg, Katharina Kann

We aim at closing this gap by investigating the types of noise encountered within a pipeline for truly unsupervised morphological paradigm completion and its impact on morphological inflection systems: First, we propose an error taxonomy and annotation pipeline for inflection training data.

Language Modelling Masked Language Modeling +1

UniMorph 4.0: Universal Morphology

no code implementations LREC 2022 Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova

The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.

Morphological Inflection

Dim Wihl Gat Tun: The Case for Linguistic Expertise in NLP for Underdocumented Languages

no code implementations 17 Mar 2022 Clarissa Forbes, Farhan Samir, Bruce Harold Oliver, Changbing Yang, Edith Coates, Garrett Nicolai, Miikka Silfverberg

With this paper, we make the case that IGT data can be leveraged successfully provided that target language expertise is available.

Morphological Processing of Low-Resource Languages: Where We Are and What's Next

no code implementations 16 Mar 2022 Adam Wiemerslage, Miikka Silfverberg, Changbing Yang, Arya D. McCarthy, Garrett Nicolai, Eliana Colunga, Katharina Kann

Automatic morphological processing can aid downstream natural language processing applications, especially for low-resource languages, and assist language documentation efforts for endangered languages.

Do RNN States Encode Abstract Phonological Alternations?

no code implementations NAACL 2021 Miikka Silfverberg, Francis Tyers, Garrett Nicolai, Mans Hulden

Sequence-to-sequence models have delivered impressive results in word formation tasks such as morphological inflection, often learning to model subtle morphophonological details with limited training data.

Memorization Morphological Inflection

Do RNN States Encode Abstract Phonological Processes?

no code implementations 1 Apr 2021 Miikka Silfverberg, Francis Tyers, Garrett Nicolai, Mans Hulden

Sequence-to-sequence models have delivered impressive results in word formation tasks such as morphological inflection, often learning to model subtle morphophonological details with limited training data.

Memorization Morphological Inflection

Noise Isn't Always Negative: Countering Exposure Bias in Sequence-to-Sequence Inflection Models

no code implementations COLING 2020 Garrett Nicolai, Miikka Silfverberg

Morphological inflection, like many sequence-to-sequence tasks, sees great performance from recurrent neural architectures when data is plentiful, but performance falls off sharply in lower-data settings.

Morphological Inflection

The SIGMORPHON 2020 Shared Task on Unsupervised Morphological Paradigm Completion

no code implementations WS 2020 Katharina Kann, Arya McCarthy, Garrett Nicolai, Mans Hulden

In this paper, we describe the findings of the SIGMORPHON 2020 shared task on unsupervised morphological paradigm completion (SIGMORPHON 2020 Task 2), a novel task in the field of inflectional morphology.

LEMMA Task 2

JHUBC's Submission to LT4HALA EvaLatin 2020

no code implementations LREC 2020 Winston Wu, Garrett Nicolai

We describe the JHUBC submission to the EvaLatin Shared task on lemmatization and part-of-speech tagging for Latin.

Lemmatization Part-Of-Speech Tagging +1

Multilingual Dictionary Based Construction of Core Vocabulary

no code implementations LREC 2020 Winston Wu, Garrett Nicolai, David Yarowsky

We propose a new functional definition and construction method for core vocabulary sets for multiple applications based on the relative coverage of a target concept in thousands of bilingual dictionaries.

Cognate Prediction Machine Translation +1

Fine-grained Morphosyntactic Analysis and Generation Tools for More Than One Thousand Languages

no code implementations LREC 2020 Garrett Nicolai, Dylan Lewis, Arya D. McCarthy, Aaron Mueller, Winston Wu, David Yarowsky

Exploiting the broad translation of the Bible into the world's languages, we train and distribute morphosyntactic tools for approximately one thousand languages, vastly outstripping previous distributions of tools devoted to the processing of inflectional morphology.

Translation

An Analysis of Massively Multilingual Neural Machine Translation for Low-Resource Languages

no code implementations LREC 2020 Aaron Mueller, Garrett Nicolai, Arya D. McCarthy, Dylan Lewis, Winston Wu, David Yarowsky

We find that best practices in this domain are highly language-specific: adding more languages to a training set is often better, but too many harms performance; the best number depends on the source language.

Low-Resource Neural Machine Translation Translation

Induced Inflection-Set Keyword Search in Speech

1 code implementation WS 2020 Oliver Adams, Matthew Wiesner, Jan Trmal, Garrett Nicolai, David Yarowsky

We investigate the problem of searching for a lexeme-set in speech by searching for its inflectional variants.

The SIGMORPHON 2019 Shared Task: Morphological Analysis in Context and Cross-Lingual Transfer for Inflection

no code implementations WS 2019 Arya D. McCarthy, Ekaterina Vylomova, Shijie Wu, Chaitanya Malaviya, Lawrence Wolf-Sonkin, Garrett Nicolai, Christo Kirov, Miikka Silfverberg, Sabrina J. Mielke, Jeffrey Heinz, Ryan Cotterell, Mans Hulden

The SIGMORPHON 2019 shared task on cross-lingual transfer and contextual analysis in morphology examined transfer learning of inflection between 100 language pairs, as well as contextual lemmatization and morphosyntactic description in 66 languages.

Cross-Lingual Transfer Lemmatization +3

The CoNLL–SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection

no code implementations CONLL 2018 Ryan Cotterell, Christo Kirov, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Arya D. McCarthy, Katharina Kann, Sabrina J. Mielke, Garrett Nicolai, Miikka Silfverberg, David Yarowsky, Jason Eisner, Mans Hulden

Apart from extending the number of languages involved in earlier supervised tasks of generating inflected forms, this year the shared task also featured a new second task which asked participants to inflect words in sentential context, similar to a cloze task.

LEMMA Task 2

String Transduction with Target Language Models and Insertion Handling

no code implementations WS 2018 Garrett Nicolai, Saeed Najafi, Grzegorz Kondrak

Many character-level tasks can be framed as sequence-to-sequence transduction, where the target is a word from a natural language.

Morphological Analysis without Expert Annotation

no code implementations EACL 2017 Garrett Nicolai, Grzegorz Kondrak

The task of morphological analysis is to produce a complete list of lemma+tag analyses for a given word-form.

LEMMA Morphological Analysis +2
