1 code implementation • NAACL (SIGMORPHON) 2022 • Jordan Kodner, Salam Khalifa, Khuyagbaatar Batsuren, Hossep Dolatian, Ryan Cotterell, Faruk Akkus, Antonios Anastasopoulos, Taras Andrushko, Aryaman Arora, Nona Atanalov, Gábor Bella, Elena Budianskaya, Yustinus Ghanggo Ate, Omer Goldman, David Guriel, Simon Guriel, Silvia Guriel-Agiashvili, Witold Kieraś, Andrew Krizhanovsky, Natalia Krizhanovsky, Igor Marchenko, Magdalena Markowska, Polina Mashkovtseva, Maria Nepomniashchaya, Daria Rodionova, Karina Scheifer, Alexandra Sorova, Anastasia Yemelina, Jeremiah Young, Ekaterina Vylomova
The 2022 SIGMORPHON–UniMorph shared task on large scale morphological inflection generation included a wide range of typologically diverse languages: 33 languages from 11 top-level language families: Arabic (Modern Standard), Assamese, Braj, Chukchi, Eastern Armenian, Evenki, Georgian, Gothic, Gujarati, Hebrew, Hungarian, Itelmen, Karelian, Kazakh, Ket, Khalkha Mongolian, Kholosi, Korean, Lamahalot, Low German, Ludic, Magahi, Middle Low German, Old English, Old High German, Old Norse, Polish, Pomak, Slovak, Turkish, Upper Sorbian, Veps, and Xibe.
no code implementations • NAACL (SIGMORPHON) 2022 • Andreas Sherbakov, Ekaterina Vylomova
The paper describes the Flexica team’s submission to the SIGMORPHON 2022 Shared Task 1 Part 1: Typologically Diverse Morphological Inflection.
no code implementations • NAACL (SIGTYP) 2021 • Andreas Scherbakov, Liam Whittle, Ritesh Kumar, Siddharth Singh, Matthew Coleman, Ekaterina Vylomova
The paper presents Anlirika’s submission to SIGTYP 2021 Shared Task on Robust Spoken Language Identification.
no code implementations • ACL (SIGMORPHON) 2021 • Tiago Pimentel, Maria Ryskina, Sabrina J. Mielke, Shijie Wu, Eleanor Chodroff, Brian Leonard, Garrett Nicolai, Yustinus Ghanggo Ate, Salam Khalifa, Nizar Habash, Charbel El-Khaissi, Omer Goldman, Michael Gasser, William Lane, Matt Coler, Arturo Oncevay, Jaime Rafael Montoya Samame, Gema Celeste Silva Villegas, Adam Ek, Jean-Philippe Bernardy, Andrey Shcherbakov, Aziyana Bayyr-ool, Karina Sheifer, Sofya Ganieva, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Andrew Krizhanovsky, Natalia Krizhanovsky, Clara Vania, Sardana Ivanova, Aelita Salchak, Christopher Straughn, Zoey Liu, Jonathan North Washington, Duygu Ataman, Witold Kieraś, Marcin Woliński, Totok Suhardijanto, Niklas Stoehr, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Richard J. Hatcher, Emily Prud'hommeaux, Ritesh Kumar, Mans Hulden, Botond Barta, Dorina Lakatos, Gábor Szolnok, Judit Ács, Mohit Raj, David Yarowsky, Ryan Cotterell, Ben Ambridge, Ekaterina Vylomova
This year's iteration of the SIGMORPHON Shared Task on morphological reinflection focuses on typological diversity and cross-lingual variation of morphosyntactic features.
1 code implementation • NAACL (SIGTYP) 2022 • Johann-Mattis List, Ekaterina Vylomova, Robert Forkel, Nathan Hill, Ryan Cotterell
This study describes the structure and the results of the SIGTYP 2022 shared task on the prediction of cognate reflexes from multilingual wordlists.
no code implementations • ALTA 2020 • Andrei Shcherbakov, Saliha Muradoglu, Ekaterina Vylomova
The paper investigates repetitive loops, a common problem in contemporary text generation (such as machine translation, language modelling, morphological inflection) systems.
no code implementations • 19 Nov 2024 • Raphael Merx, Hanna Suominen, Adérito José Guterres Correia, Trevor Cohn, Ekaterina Vylomova
The impact of machine translation (MT) on low-resource languages remains poorly understood.
1 code implementation • 4 Oct 2024 • Raphael Merx, Ekaterina Vylomova, Kemal Kurniawan
Additionally, we explore the use of pre-trained language models for automated rating of examples, finding that sentence perplexity serves as a good proxy for typicality and intelligibility in higher-resourced languages.
1 code implementation • 22 Sep 2024 • Aso Mahmudi, Borja Herce, Demian Inostroza Amestica, Andreas Scherbakov, Eduard Hovy, Ekaterina Vylomova
Linguistic fieldwork is an important component in language documentation and preservation.
1 code implementation • 10 Jun 2024 • Naomi Baes, Nick Haslam, Ekaterina Vylomova
Historical linguists have identified multiple forms of lexical semantic change.
1 code implementation • 20 Apr 2024 • Khuyagbaatar Batsuren, Ekaterina Vylomova, Verna Dankers, Tsetsuukhei Delgerbaatar, Omri Uzan, Yuval Pinter, Gábor Bella
Our empirical findings show that the accuracy of UniMorph Labeller is 98%, and that, in all language models studied (including ALBERT, BERT, RoBERTa, and DeBERTa), alien tokenization leads to poorer generalizations compared to morphological tokenization for semantic compositionality of word meanings.
1 code implementation • 7 Apr 2024 • Raphaël Merx, Aso Mahmudi, Katrina Langford, Leo Alberto de Araujo, Ekaterina Vylomova
Leveraging a novel corpus derived from a Mambai language manual and additional sentences translated by a native speaker, we examine the efficacy of few-shot LLM prompting for machine translation (MT) in this low-resource context.
no code implementations • 20 Feb 2024 • Zheng Wei Lim, Ekaterina Vylomova, Trevor Cohn, Charles Kemp
On one hand, intuition and some prior work suggest that accuracy and fluency should trade off against each other, and that capturing every detail of the source can only be achieved at the cost of fluency.
no code implementations • 19 Dec 2023 • Zheng Wei Lim, Ekaterina Vylomova, Charles Kemp, Trevor Cohn
Human translators linger on some words and phrases more than others, and predicting this variation is a step towards explaining the underlying cognitive processes.
1 code implementation • NAACL (SIGMORPHON) 2022 • Khuyagbaatar Batsuren, Gábor Bella, Aryaman Arora, Viktor Martinović, Kyle Gorman, Zdeněk Žabokrtský, Amarsanaa Ganbold, Šárka Dohnalová, Magda Ševčíková, Kateřina Pelegrinová, Fausto Giunchiglia, Ryan Cotterell, Ekaterina Vylomova
The SIGMORPHON 2022 shared task on morpheme segmentation challenged systems to decompose a word into a sequence of morphemes and covered most types of morphology: compounds, derivations, and inflections.
Ranked #8 on Morpheme Segmentaiton on UniMorph 4.0
no code implementations • LREC 2022 • Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova
The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.
no code implementations • NAACL (SIGTYP) 2021 • Elizabeth Salesky, Badr M. Abdullah, Sabrina J. Mielke, Elena Klyachko, Oleg Serikov, Edoardo Ponti, Ritesh Kumar, Ryan Cotterell, Ekaterina Vylomova
While language identification is a fundamental speech and language processing task, for many languages and language families it remains a challenging task.
no code implementations • ALTA 2020 • Saliha Muradoğlu, Nicholas Evans, Ekaterina Vylomova
Nen verbal morphology is remarkably complex; a transitive verb can take up to 1, 740 unique forms.
no code implementations • EMNLP (SIGTYP) 2020 • Johannes Bjerva, Elizabeth Salesky, Sabrina J. Mielke, Aditi Chaudhary, Giuseppe G. A. Celano, Edoardo M. Ponti, Ekaterina Vylomova, Ryan Cotterell, Isabelle Augenstein
Typological knowledge bases (KBs) such as WALS (Dryer and Haspelmath, 2013) contain information about linguistic properties of the world's languages.
1 code implementation • WS 2020 • Ekaterina Vylomova, Jennifer White, Elizabeth Salesky, Sabrina J. Mielke, Shijie Wu, Edoardo Ponti, Rowan Hall Maudslay, Ran Zmigrod, Josef Valvoda, Svetlana Toldova, Francis Tyers, Elena Klyachko, Ilya Yegorov, Natalia Krizhanovsky, Paula Czarnowska, Irene Nikkarinen, Andrew Krizhanovsky, Tiago Pimentel, Lucas Torroba Hennigen, Christo Kirov, Garrett Nicolai, Adina Williams, Antonios Anastasopoulos, Hilaria Cruz, Eleanor Chodroff, Ryan Cotterell, Miikka Silfverberg, Mans Hulden
Systems were developed using data from 45 languages and just 5 language families, fine-tuned with data from an additional 45 languages and 10 language families (13 in total), and evaluated on all 90 languages.
no code implementations • LREC 2020 • Arya D. McCarthy, Christo Kirov, Matteo Grella, Amrit Nidhi, Patrick Xia, Kyle Gorman, Ekaterina Vylomova, Sabrina J. Mielke, Garrett Nicolai, Miikka Silfverberg, Timofey Arkhangelskiy, Nataly Krizhanovsky, Andrew Krizhanovsky, Elena Klyachko, Alexey Sorokin, John Mansfield, Valts Ern{\v{s}}treits, Yuval Pinter, Cass Jacobs, ra L., Ryan Cotterell, Mans Hulden, David Yarowsky
The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.
no code implementations • CONLL 2019 • Kyle Gorman, Arya D. McCarthy, Ryan Cotterell, Ekaterina Vylomova, Miikka Silfverberg, Magdalena Markowska
We conduct a manual error analysis of the CoNLL-SIGMORPHON Shared Task on Morphological Reinflection.
no code implementations • WS 2019 • Arya D. McCarthy, Ekaterina Vylomova, Shijie Wu, Chaitanya Malaviya, Lawrence Wolf-Sonkin, Garrett Nicolai, Christo Kirov, Miikka Silfverberg, Sabrina J. Mielke, Jeffrey Heinz, Ryan Cotterell, Mans Hulden
The SIGMORPHON 2019 shared task on cross-lingual transfer and contextual analysis in morphology examined transfer learning of inflection between 100 language pairs, as well as contextual lemmatization and morphosyntactic description in 66 languages.
no code implementations • WS 2019 • Ekaterina Vylomova, Sean Murphy, Nicholas Haslam
The paper focuses on diachronic evaluation of semantic changes of harm-related concepts in psychology.
no code implementations • NAACL 2019 • Ekaterina Vylomova, Ryan Cotterell, Timothy Baldwin, Trevor Cohn, Jason Eisner
Critical to natural language generation is the production of correctly inflected text.
3 code implementations • LREC 2018 • Christo Kirov, Ryan Cotterell, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Patrick Xia, Manaal Faruqui, Sabrina J. Mielke, Arya D. McCarthy, Sandra Kübler, David Yarowsky, Jason Eisner, Mans Hulden
The Universal Morphology UniMorph project is a collaborative effort to improve how NLP handles complex morphology across the world's languages.
no code implementations • CONLL 2018 • Ryan Cotterell, Christo Kirov, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Arya D. McCarthy, Katharina Kann, Sabrina J. Mielke, Garrett Nicolai, Miikka Silfverberg, David Yarowsky, Jason Eisner, Mans Hulden
Apart from extending the number of languages involved in earlier supervised tasks of generating inflected forms, this year the shared task also featured a new second task which asked participants to inflect words in sentential context, similar to a cloze task.
1 code implementation • EMNLP 2014 • Jing Peng, Anna Feldman, Ekaterina Vylomova
Our starting point is that words in a given text segment, such as a paragraph, that are highranking representatives of a common topic of discussion are less likely to be a part of an idiomatic expression.
no code implementations • EMNLP 2017 • Ryan Cotterell, Ekaterina Vylomova, Huda Khayrallah, Christo Kirov, David Yarowsky
The generation of complex derived word forms has been an overlooked problem in NLP; we fill this gap by applying neural sequence-to-sequence models to the task.
no code implementations • 26 Jul 2017 • Ekaterina Vylomova, Andrei Shcherbakov, Yuriy Philippovich, Galina Cherkasova
We present a quantitative analysis of human word association pairs and study the types of relations presented in the associations.
no code implementations • CONLL 2017 • Ryan Cotterell, Christo Kirov, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Patrick Xia, Manaal Faruqui, Sandra Kübler, David Yarowsky, Jason Eisner, Mans Hulden
In sub-task 2, systems were given a lemma and some of its specific inflected forms, and asked to complete the inflectional paradigm by predicting all of the remaining inflected forms.
1 code implementation • EACL 2017 • Ekaterina Vylomova, Ryan Cotterell, Timothy Baldwin, Trevor Cohn
Derivational morphology is a fundamental and complex characteristic of language.
no code implementations • WS 2017 • Ekaterina Vylomova, Trevor Cohn, Xuanli He, Gholamreza Haffari
Dealing with the complex word forms in morphologically rich languages is an open problem in language processing, and is particularly important in translation.
1 code implementation • 17 Apr 2016 • Andreas Scherbakov, Ekaterina Vylomova, Fei Liu, Timothy Baldwin
This paper describes an experimental approach to Detection of Minimal Semantic Units and their Meaning (DiMSUM), explored within the framework of SemEval 2016 Task 10.
1 code implementation • ACL 2016 • Ekaterina Vylomova, Laura Rimell, Trevor Cohn, Timothy Baldwin
Recent work on word embeddings has shown that simple vector subtraction over pre-trained embeddings is surprisingly effective at capturing different lexical relations, despite lacking explicit supervision.