no code implementations • EMNLP 2020 • Sarah Moeller, Ling Liu, Changbing Yang, Katharina Kann, Mans Hulden
An intermediate step in the linguistic analysis of an under-documented language is to find and organize inflected forms that are attested in natural speech.
no code implementations • ACL (SIGMORPHON) 2021 • Andrew Gerlach, Adam Wiemerslage, Katharina Kann
This paper describes our system for the SIGMORPHON 2021 Shared Task on Unsupervised Morphological Paradigm Clustering, which asks participants to group inflected forms together according to their underlying lemma without the aid of annotated training data.
no code implementations • ACL (SIGMORPHON) 2021 • Adam Wiemerslage, Arya D. McCarthy, Alexander Erdmann, Garrett Nicolai, Manex Agirrezabal, Miikka Silfverberg, Mans Hulden, Katharina Kann
We describe the second SIGMORPHON shared task on unsupervised morphology: the goal of the SIGMORPHON 2021 Shared Task on Unsupervised Morphological Paradigm Clustering is to cluster word types from a raw text corpus into paradigms.
1 code implementation • EMNLP 2021 • Cory Paik, Stéphane Aroca-Ouellette, Alessandro Roncone, Katharina Kann
Recent work has raised concerns about the inherent limitations of text-only pretraining.
no code implementations • FieldMatters (COLING) 2022 • Katharina Kann, Abteen Ebrahimi, Kristine Stenzel, Alexis Palmer
This translation task is challenging for multiple reasons: (1) the data is out-of-domain with respect to the MT system’s training data, (2) much of the data is conversational, (3) existing translations include non-standard and uncommon expressions, often reflecting properties of the documented language, and (4) the data includes borrowings from other regional languages.
no code implementations • NAACL (AmericasNLP) 2021 • Manuel Mager, Arturo Oncevay, Abteen Ebrahimi, John Ortega, Annette Rios, Angela Fan, Ximena Gutierrez-Vasques, Luis Chiruzzo, Gustavo Giménez-Lugo, Ricardo Ramos, Ivan Vladimir Meza Ruiz, Rolando Coto-Solano, Alexis Palmer, Elisabeth Mager-Hois, Vishrav Chaudhary, Graham Neubig, Ngoc Thang Vu, Katharina Kann
This paper presents the results of the 2021 Shared Task on Open Machine Translation for Indigenous Languages of the Americas.
no code implementations • NLP4ConvAI (ACL) 2022 • Katharina Kann, Abteen Ebrahimi, Joewie Koh, Shiran Dudy, Alessandro Roncone
Human–computer conversation has long been an interest of artificial intelligence and natural language processing research.
no code implementations • Findings (ACL) 2022 • Adam Wiemerslage, Miikka Silfverberg, Changbing Yang, Arya McCarthy, Garrett Nicolai, Eliana Colunga, Katharina Kann
Automatic morphological processing can aid downstream natural language processing applications, especially for low-resource languages, and assist language documentation efforts for endangered languages.
no code implementations • NAACL (BEA) 2022 • Ananya Ganesh, Hugh Scribner, Jasdeep Singh, Katherine Goodman, Jean Hertzberg, Katharina Kann
We further investigate multi-task training on the related task of sentiment classification, which improves our model’s performance to 55 F1.
no code implementations • 27 Oct 2023 • Maria Valentini, Jennifer Weber, Jesus Salcido, Téa Wright, Eliana Colunga, Katharina Kann
With recent advances in large language models (LLMs), the concept of automatically generating children's educational materials has become increasingly realistic.
no code implementations • 11 Jun 2023 • Manuel Mager, Rajat Bhatnagar, Graham Neubig, Ngoc Thang Vu, Katharina Kann
Neural models have drastically advanced the state of the art for machine translation (MT) between high-resource languages.
no code implementations • 31 May 2023 • Manuel Mager, Elisabeth Mager, Katharina Kann, Ngoc Thang Vu
In recent years, machine translation has become very successful for high-resource language pairs.
1 code implementation • 26 May 2023 • Adam Wiemerslage, Changbing Yang, Garrett Nicolai, Miikka Silfverberg, Katharina Kann
We aim to close this gap by investigating the types of noise encountered within a pipeline for truly unsupervised morphological paradigm completion and their impact on morphological inflection systems: First, we propose an error taxonomy and annotation pipeline for inflection training data.
1 code implementation • 15 Feb 2023 • Abteen Ebrahimi, Arya D. McCarthy, Arturo Oncevay, Luis Chiruzzo, John E. Ortega, Gustavo A. Giménez-Lugo, Rolando Coto-Solano, Katharina Kann
However, the languages most in need of automatic alignment are low-resource and, thus, not typically included in the pretraining data.
no code implementations • 19 Dec 2022 • Sagi Shaier, Lawrence Hunter, Katharina Kann
Many dialogue systems (DSs) lack characteristics humans have, such as emotion perception, factuality, and informativeness.
no code implementations • 30 Nov 2022 • Katharina Kann, Shiran Dudy, Arya D. McCarthy
The field of natural language processing (NLP) has grown over the last few years: conferences have become larger, we have published an incredible number of papers, and state-of-the-art research has been implemented in a large variety of customer-facing products.
no code implementations • 22 Oct 2022 • Adam Wiemerslage, Shiran Dudy, Katharina Kann
Neural networks have long been at the center of a debate around the cognitive mechanism by which humans process inflectional morphology.
no code implementations • ACL 2022 • Yoshinari Fujinuma, Jordan Boyd-Graber, Katharina Kann
(2) Does the answer to that question change with model adaptation?
no code implementations • 16 Mar 2022 • Adam Wiemerslage, Miikka Silfverberg, Changbing Yang, Arya D. McCarthy, Garrett Nicolai, Eliana Colunga, Katharina Kann
Automatic morphological processing can aid downstream natural language processing applications, especially for low-resource languages, and assist language documentation efforts for endangered languages.
no code implementations • Findings (ACL) 2022 • Manuel Mager, Arturo Oncevay, Elisabeth Mager, Katharina Kann, Ngoc Thang Vu
Morphologically-rich polysynthetic languages present a challenge for NLP systems due to data sparsity, and a common strategy to handle this issue is to apply subword segmentation.
1 code implementation • 15 Oct 2021 • Cory Paik, Stéphane Aroca-Ouellette, Alessandro Roncone, Katharina Kann
Recent work has raised concerns about the inherent limitations of text-only pretraining.
no code implementations • MTSummit 2021 • Atul Kr. Ojha, Chao-Hong Liu, Katharina Kann, John Ortega, Sheetal Shatam, Theodorus Fransen
Maximum system performance, measured in BLEU, was as follows: 36.0 for English-Irish, 34.6 for Irish-English, 24.2 for English-Marathi, and 31.3 for Marathi-English.
no code implementations • 8 Jul 2021 • Michael A. Hedderich, Benjamin Roth, Katharina Kann, Barbara Plank, Alex Ratner, Dietrich Klakow
Welcome to WeaSuL 2021, the First Workshop on Weakly Supervised Learning, co-located with ICLR 2021.
no code implementations • ACL 2021 • Rajat Bhatnagar, Ananya Ganesh, Katharina Kann
Based on the insight that humans pay specific attention to movements, we use graphics interchange formats (GIFs) as a pivot to collect parallel sentences from monolingual annotators.
no code implementations • Findings (ACL) 2021 • Ananya Ganesh, Martha Palmer, Katharina Kann
Recent advances in natural language processing (NLP) have the ability to transform how classroom learning takes place.
1 code implementation • 7 Jun 2021 • Stéphane Aroca-Ouellette, Cory Paik, Alessandro Roncone, Katharina Kann
We present a new probing dataset named PROST: Physical Reasoning about Objects Through Space and Time.
no code implementations • ACL 2021 • Abteen Ebrahimi, Katharina Kann
Pretrained multilingual models (PMMs) enable zero-shot learning via cross-lingual transfer, performing best for languages seen during pretraining.
1 code implementation • ACL 2022 • Abteen Ebrahimi, Manuel Mager, Arturo Oncevay, Vishrav Chaudhary, Luis Chiruzzo, Angela Fan, John Ortega, Ricardo Ramos, Annette Rios, Ivan Meza-Ruiz, Gustavo A. Giménez-Lugo, Elisabeth Mager, Graham Neubig, Alexis Palmer, Rolando Coto-Solano, Ngoc Thang Vu, Katharina Kann
Continued pretraining offers improvements, with an average accuracy of 44.05%.
no code implementations • EACL 2021 • Katharina Kann, Mauro M. Monsalve-Mercado
And how similar are character embeddings extracted from different models?
no code implementations • EACL 2021 • Beilei Xiang, Changbing Yang, Yu Li, Alex Warstadt, Katharina Kann
CLiMP consists of sets of 1,000 minimal pairs (MPs) for 16 syntactic contrasts in Mandarin, covering 9 major Mandarin linguistic phenomena.
1 code implementation • Asian Chapter of the Association for Computational Linguistics 2020 • Nikhil Prabhu, Katharina Kann
Here, we propose a model that does not: a pointer-generator transformer for disjoint vocabularies.
no code implementations • EMNLP 2020 • Manuel Mager, Özlem Çetinoğlu, Katharina Kann
Canonical morphological segmentation consists of dividing words into their standardized morphemes.
no code implementations • EMNLP 2020 • Rajat Agarwal, Katharina Kann
We propose a new task in the area of computational creativity: acrostic poem generation in English.
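Concretely, the acrostic constraint requires the first letters of the poem's lines to spell out a hidden word. A tiny checker with a made-up example (not data from the paper):

```python
def is_acrostic(poem: str, word: str) -> bool:
    """True if the first letters of the poem's lines spell `word`."""
    initials = "".join(line.strip()[0] for line in poem.splitlines() if line.strip())
    return initials.lower() == word.lower()

poem = "Cold winds whisper\nAcross the silent field\nTonight the stars keep watch"
print(is_acrostic(poem, "cat"))  # True: initials C-A-T spell the hidden word
```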
no code implementations • WS 2020 • Diksha Meghwal, Katharina Kann, Iacer Calixto, Stanislaw Jastrzebski
Pretrained language models have obtained impressive results for a large set of natural language understanding tasks.
no code implementations • WS 2020 • Nikhil Prabhu, Katharina Kann
In this paper, we describe two CU-Boulder submissions to the SIGMORPHON 2020 Task 1 on multilingual grapheme-to-phoneme conversion (G2P).
no code implementations • WS 2020 • Manuel Mager, Katharina Kann
In this paper, we present the systems of the University of Stuttgart IMS and the University of Colorado Boulder (IMS-CUBoulder) for SIGMORPHON 2020 Task 2 on unsupervised morphological paradigm completion (Kann et al., 2020).
no code implementations • ACL 2020 • Yada Pruksachatkun, Jason Phang, Haokun Liu, Phu Mon Htut, Xiaoyi Zhang, Richard Yuanzhe Pang, Clara Vania, Katharina Kann, Samuel R. Bowman
However, we fail to observe more granular correlations between probing and target task performance, highlighting the need for further work on broad-coverage probing benchmarks.
no code implementations • WS 2020 • Assaf Singer, Katharina Kann
Second, as inflected forms share most characters with the lemma, we further propose a pointer-generator transformer model to allow easy copying of input characters.
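The copy mechanism behind such a pointer-generator can be sketched as follows: the final output distribution mixes a softmax over the vocabulary with the encoder attention distribution scattered onto the input characters. A minimal PyTorch sketch (tensor names are ours, not the authors' code):

```python
import torch

def pointer_generator_step(vocab_logits, attn_weights, src_ids, p_gen):
    """Mix a generation distribution with a copy distribution.

    vocab_logits: (batch, vocab) decoder scores over the output vocabulary
    attn_weights: (batch, src_len) attention over input characters (sums to 1)
    src_ids:      (batch, src_len) vocabulary ids of the input characters
    p_gen:        (batch, 1) probability of generating rather than copying
    """
    gen_dist = torch.softmax(vocab_logits, dim=-1)       # P_vocab
    copy_dist = torch.zeros_like(gen_dist)
    # Scatter attention mass onto the vocabulary ids of the input characters.
    copy_dist.scatter_add_(1, src_ids, attn_weights)     # P_copy
    return p_gen * gen_dist + (1.0 - p_gen) * copy_dist  # final P(w)

# Sanity check: the mixture of two distributions is still a distribution.
B, L, V = 2, 5, 100
out = pointer_generator_step(
    torch.randn(B, V), torch.softmax(torch.randn(B, L), -1),
    torch.randint(0, V, (B, L)), torch.full((B, 1), 0.7))
assert torch.allclose(out.sum(-1), torch.ones(B))
```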
no code implementations • WS 2020 • Katharina Kann, Arya McCarthy, Garrett Nicolai, Mans Hulden
In this paper, we describe the findings of the SIGMORPHON 2020 shared task on unsupervised morphological paradigm completion (SIGMORPHON 2020 Task 2), a novel task in the field of inflectional morphology.
no code implementations • WS 2020 • Anhad Mohananey, Katharina Kann, Samuel R. Bowman
To be able to use our model's predictions during training, we extend a recent neural UP architecture, the PRPN (Shen et al., 2018a), such that it can be trained in a semi-supervised fashion.
no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Jason Phang, Iacer Calixto, Phu Mon Htut, Yada Pruksachatkun, Haokun Liu, Clara Vania, Katharina Kann, Samuel R. Bowman
Intermediate-task training, i.e., fine-tuning a pretrained model on an intermediate task before fine-tuning again on the target task, often improves model performance substantially on language understanding tasks in monolingual English settings.
Ranked #20 on Zero-Shot Cross-Lingual Transfer on XTREME.
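Schematically, intermediate-task training is just two sequential fine-tuning stages on the same weights. A minimal sketch in plain PyTorch (the model, loaders, and loss are generic placeholders, not the paper's setup):

```python
import torch

def fine_tune(model, loader, loss_fn, epochs=3, lr=2e-5):
    """One generic fine-tuning stage; intermediate-task training simply
    runs this twice with the same model: intermediate task, then target."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model

# Stage 1 is followed by stage 2 with the same weights throughout:
# model = fine_tune(model, intermediate_loader, loss_fn)
# model = fine_tune(model, target_loader, loss_fn)
```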
no code implementations • 25 May 2020 • Manuel Mager, Katharina Kann
In this paper, we present the systems of the University of Stuttgart IMS and the University of Colorado Boulder (IMS-CUBoulder) for SIGMORPHON 2020 Task 2 on unsupervised morphological paradigm completion (Kann et al., 2020).
1 code implementation • ACL 2020 • Huiming Jin, Liwei Cai, Yihui Peng, Chen Xia, Arya D. McCarthy, Katharina Kann
We propose the task of unsupervised morphological paradigm completion.
no code implementations • 1 May 2020 • Yada Pruksachatkun, Jason Phang, Haokun Liu, Phu Mon Htut, Xiaoyi Zhang, Richard Yuanzhe Pang, Clara Vania, Katharina Kann, Samuel R. Bowman
However, we fail to observe more granular correlations between probing and target task performance, highlighting the need for further work on broad-coverage probing benchmarks.
no code implementations • 28 Apr 2020 • Katharina Kann, Samuel R. Bowman, Kyunghyun Cho
We propose to cast the task of morphological inflection (mapping a lemma to an indicated inflected form) for resource-poor languages as a meta-learning problem.
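In the standard MAML-style formulation of such a meta-learning problem, one seeks an initialization that adapts to each language after a few gradient steps. A sketch of the objective in our own notation (not necessarily the paper's):

```latex
\min_{\theta} \sum_{\ell \in \mathcal{L}_{\text{train}}}
\mathcal{L}_{\ell}\!\left(\theta - \alpha \,\nabla_{\theta}\, \mathcal{L}_{\ell}(\theta)\right)
```

where each ℓ ranges over the training languages and α is the inner-loop learning rate.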
no code implementations • 28 Apr 2020 • Katharina Kann, Ophélie Lacroix, Anders Søgaard
Part-of-speech (POS) taggers for low-resource languages that are exclusively based on various forms of weak supervision (e.g., cross-lingual transfer, type-level supervision, or a combination thereof) have been reported to perform almost as well as supervised ones.
no code implementations • WS 2019 • Katharina Kann, Anhad Mohananey, Samuel R. Bowman, Kyunghyun Cho
Recently, neural network models which automatically infer syntactic structure from raw text have started to achieve promising results.
no code implementations • 22 Oct 2019 • Katharina Kann
The relation between language and thought has occupied linguists for at least a century.
no code implementations • SCiL 2020 • Katharina Kann
How does knowledge of one language's morphology influence learning of inflection rules in a second one?
no code implementations • IJCNLP 2019 • Katharina Kann, Kyunghyun Cho, Samuel R. Bowman
Here, we aim to answer the following questions: Does using a development set for early stopping in the low-resource setting influence results as compared to a more realistic alternative, where the number of training epochs is tuned on development languages?
no code implementations • WS 2019 • Johannes Bjerva, Katharina Kann, Isabelle Augenstein
Multi-task learning and self-training are two common ways to improve a machine learning model's performance in settings with limited training data.
1 code implementation • ACL 2019 • Yadollah Yaghoobzadeh, Katharina Kann, Timothy J. Hazen, Eneko Agirre, Hinrich Schütze
Word embeddings typically represent different meanings of a word in a single conflated vector.
no code implementations • NAACL 2019 • Manuel Mager, Özlem Çetinoğlu, Katharina Kann
Language identification for code-switching (CS), the phenomenon of alternating between two or more languages in conversations, has traditionally been approached under the assumption of a single language per token.
no code implementations • WS 2019 • Katharina Kann, Alex Warstadt, Adina Williams, Samuel R. Bowman
For converging evidence, we further construct LaVA, a corresponding word-level dataset, and investigate whether the same syntactic features can be extracted from word embeddings.
no code implementations • CONLL 2018 • Ryan Cotterell, Christo Kirov, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Arya D. McCarthy, Katharina Kann, Sabrina J. Mielke, Garrett Nicolai, Miikka Silfverberg, David Yarowsky, Jason Eisner, Mans Hulden
Apart from extending the set of languages involved in earlier supervised tasks of generating inflected forms, this year's shared task also featured a new second task, which asked participants to inflect words in sentential context, similar to a cloze task.
no code implementations • CONLL 2018 • Katharina Kann, Sascha Rothe, Katja Filippova
Motivated by recent findings on the probabilistic modeling of acceptability judgments, we propose syntactic log-odds ratio (SLOR), a normalized language model score, as a metric for referenceless fluency evaluation of natural language generation output at the sentence level.
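For reference, the standard SLOR formulation subtracts a sentence's unigram log-probability from its language model log-probability and normalizes by length (notation ours):

```latex
\mathrm{SLOR}(S) = \frac{\log p_M(S) - \log p_u(S)}{|S|},
\qquad p_u(S) = \prod_{t=1}^{|S|} p(w_t)
```

where p_M is the trained language model, p_u the unigram model, and |S| the sentence length in tokens.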
no code implementations • EMNLP 2018 • Katharina Kann, Hinrich Schütze
Neural state-of-the-art sequence-to-sequence (seq2seq) models often do not perform well for small training sets.
1 code implementation • WS 2018 • Yadollah Yaghoobzadeh, Katharina Kann, Hinrich Schütze
We propose a new evaluation method for word embeddings based on multi-label classification given a word embedding.
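A minimal sketch of this style of evaluation, using random vectors in place of real embeddings and a one-vs-rest logistic regression probe (a plausible stand-in, not the paper's exact classifier):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))         # 200 "word embeddings" of dimension 50
Y = rng.integers(0, 2, size=(200, 5))  # 5 binary properties per word (toy labels)

# Train a probe that predicts all properties from the embedding alone;
# higher held-out F1 suggests the embedding encodes those properties.
probe = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X[:150], Y[:150])
print(f1_score(Y[150:], probe.predict(X[150:]), average="micro"))
```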
no code implementations • WS 2018 • Katharina Kann, Johannes Bjerva, Isabelle Augenstein, Barbara Plank, Anders Søgaard
Neural part-of-speech (POS) taggers are known to not perform well with little training data.
no code implementations • COLING 2018 • Manuel Mager, Elisabeth Mager, Alfonso Medina-Urrea, Ivan Meza, Katharina Kann
Machine translation from polysynthetic to fusional languages is a challenging task, which gets further complicated by the limited amount of parallel text available.
no code implementations • NAACL 2018 • Katharina Kann, Manuel Mager, Ivan Meza-Ruiz, Hinrich Schütze
Morphological segmentation for polysynthetic languages is challenging, because a word may consist of many individual morphemes and training data can be extremely scarce.
no code implementations • WS 2017 • Huiming Jin, Katharina Kann
Multi-task training is an effective method to mitigate the data sparsity problem.
no code implementations • WS 2017 • Katharina Kann, Hinrich Schütze
We present a semi-supervised way of training a character-based encoder-decoder recurrent neural network for morphological reinflection, the task of generating one inflected word form from another.
no code implementations • ACL 2017 • Katharina Kann, Ryan Cotterell, Hinrich Schütze
We present a novel cross-lingual transfer method for paradigm completion, the task of mapping a lemma to its inflected forms, using a neural encoder-decoder model, the state of the art for the monolingual task.
4 code implementations • 7 Feb 2017 • Wenpeng Yin, Katharina Kann, Mo Yu, Hinrich Schütze
Deep neural networks (DNNs) have revolutionized the field of natural language processing (NLP).
no code implementations • EACL 2017 • Katharina Kann, Ryan Cotterell, Hinrich Schütze
We explore the task of multi-source morphological reinflection, which generalizes the standard, single-source version.
1 code implementation • ACL 2016 • Katharina Kann, Hinrich Schütze
Morphological reinflection is the task of generating a target form given a source form, a source tag and a target tag.