1 code implementation • EMNLP 2021 • Jamshidbek Mirzakhalov, Anoop Babu, Duygu Ataman, Sherzod Kariev, Francis Tyers, Otabek Abduraufov, Mammad Hajili, Sardana Ivanova, Abror Khaytbaev, Antonio Laverghetta Jr., Bekhzodbek Moydinboyev, Esra Onal, Shaxnoza Pulatova, Ahsan Wahab, Orhan Firat, Sriram Chellappan
Recent advances in neural machine translation (NMT) have pushed the quality of machine translation systems to the point where they are becoming widely adopted to build competitive systems.
no code implementations • FieldMatters (COLING) 2022 • Sergey Kosyak, Francis Tyers
This paper presents a set of experiments in the area of morphological modelling and prediction.
no code implementations • LREC 2022 • Daniel Swanson, Francis Tyers
In this paper we present the initial construction of a Universal Dependencies treebank with morphological annotations of Ancient Hebrew containing portions of the Hebrew Scriptures (1579 sentences, 27K tokens) for use in comparative study with ancient translations and for analysis of the development of Hebrew syntax.
no code implementations • LREC 2022 • Robert Pugh, Marivel Huerta Mendez, Mitsuya Sasaki, Francis Tyers
We present a morpho-syntactically-annotated corpus of Western Sierra Puebla Nahuatl that conforms to the annotation guidelines of the Universal Dependencies project.
no code implementations • FieldMatters (COLING) 2022 • Lane Schwartz, Coleman Haley, Francis Tyers
In this paper, we present a straightforward technique for constructing interpretable word embeddings from morphologically analyzed examples (such as interlinear glosses) for all of the world’s languages.
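As an illustration of the general idea only, the sketch below builds count vectors whose dimensions are named grammatical labels taken from interlinear glosses. Because each dimension has a name, each coordinate is directly interpretable. It is not the paper's actual procedure. The gloss conventions (`-`, `.` and `=` as separators, all-uppercase labels for grammatical morphemes) and the function names are assumptions made for this example.

```python
from __future__ import annotations

import re


def gloss_features(gloss: str) -> list[str]:
    """Split a gloss such as 'walk-PRS.3SG' into ['walk', 'PRS', '3SG']."""
    return [part for part in re.split(r"[-.=]", gloss) if part]


def build_embeddings(
    examples: dict[str, list[str]],
) -> tuple[list[str], dict[str, list[int]]]:
    """Map each word form to a vector of grammatical-label counts.

    `examples` (hypothetical input) maps a word form to the glosses observed
    for it. Only all-uppercase labels (PL, PST, 3SG, ...) become dimensions,
    so every coordinate of a vector has a human-readable name.
    """
    features = sorted(
        {
            label
            for glosses in examples.values()
            for gloss in glosses
            for label in gloss_features(gloss)
            if label.isupper()  # lexical stems such as 'walk' are skipped
        }
    )
    index = {label: i for i, label in enumerate(features)}
    vectors: dict[str, list[int]] = {}
    for word, glosses in examples.items():
        vec = [0] * len(features)
        for gloss in glosses:
            for label in gloss_features(gloss):
                if label in index:
                    vec[index[label]] += 1
        vectors[word] = vec
    return features, vectors


if __name__ == "__main__":
    toy = {"dogs": ["dog-PL"], "walked": ["walk-PST"], "walks": ["walk-PRS.3SG"]}
    names, vecs = build_embeddings(toy)
    for word, vec in vecs.items():
        print(word, dict(zip(names, vec)))
```

Counting labels rather than using a learned dense space trades expressiveness for transparency. Two words are close exactly when they share grammatical labels.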
no code implementations • NAACL (AmericasNLP) 2021 • Hyunji Park, Lane Schwartz, Francis Tyers
This paper describes the development of the first Universal Dependencies (UD) treebank for St. Lawrence Island Yupik, an endangered language spoken in the Bering Strait region.
no code implementations • UDW (COLING) 2020 • Francis Tyers, Karina Mishchenkova
This paper describes an approach to annotating noun incorporation in Universal Dependencies.
no code implementations • UDW (COLING) 2020 • He Zhou, Juyeon Chung, Sandra Kübler, Francis Tyers
We present our work of constructing the first treebank for the Xibe language following the Universal Dependencies (UD) annotation scheme.
no code implementations • NAACL (AmericasNLP) 2021 • Francis Tyers, Robert Henderson
This article describes a collection of sentences in K’iche’ annotated for morphology and syntax.
1 code implementation • NAACL (AmericasNLP) 2021 • Robert Pugh, Francis Tyers
We describe experiments with character-based language modeling for written variants of Nahuatl.
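Character-level language modelling can be illustrated with the simplest count-based case. The sketch below is a character bigram model with add-one (Laplace) smoothing that scores strings by per-character perplexity. It is not the paper's model, which may well be neural. The class name and the toy training strings are assumptions for illustration.

```python
from __future__ import annotations

import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # string boundary symbols


class CharBigramLM:
    """Character bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, corpus: list[str]) -> None:
        self.bigrams: Counter[tuple[str, str]] = Counter()
        self.contexts: Counter[str] = Counter()
        # Symbols that can be predicted: every training character plus EOS.
        self.vocab: set[str] = {EOS}
        for line in corpus:
            self.vocab.update(line)
            chars = [BOS, *line, EOS]
            for prev, cur in zip(chars, chars[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def prob(self, prev: str, cur: str) -> float:
        """Smoothed P(cur | prev)."""
        return (self.bigrams[(prev, cur)] + 1) / (self.contexts[prev] + len(self.vocab))

    def perplexity(self, text: str) -> float:
        """Per-symbol perplexity of `text`, including the final EOS."""
        chars = [BOS, *text, EOS]
        log_prob = sum(math.log(self.prob(p, c)) for p, c in zip(chars, chars[1:]))
        return math.exp(-log_prob / (len(chars) - 1))


if __name__ == "__main__":
    lm = CharBigramLM(["tlakatl", "siuatl", "cihuatl"])  # toy training strings
    for word in ["tlakatl", "tlacatl", "qqqq"]:
        print(word, round(lm.perplexity(word), 2))
```

Characters never seen in training fall outside `vocab`. They still receive a small non-zero probability, which keeps perplexity finite for unfamiliar spellings.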
no code implementations • NAACL (AmericasNLP) 2021 • Francis Tyers, Nick Howell
We study the performance of several popular neural part-of-speech taggers from the Universal Dependencies ecosystem on Mayan languages using a small corpus of 1435 annotated K’iche’ sentences consisting of approximately 10,000 tokens, with encouraging results: F1 scores of 93% or higher on lemmatisation, part-of-speech tagging and morphological feature assignment.
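For per-attribute scores like these, precision is the share of predicted words with the correct value, and recall is the share of gold words with the correct value. F1 is their harmonic mean. The sketch below uses hypothetical function names and assumes gold tokenisation. Under that assumption precision equals recall, so the F1 score reduces to plain accuracy.

```python
from __future__ import annotations


def f1_score(correct: int, n_predicted: int, n_gold: int) -> float:
    """Harmonic mean of precision (correct/predicted) and recall (correct/gold)."""
    if correct == 0:
        return 0.0
    precision = correct / n_predicted
    recall = correct / n_gold
    return 2 * precision * recall / (precision + recall)


def attribute_f1(gold: list[str], predicted: list[str]) -> float:
    """F1 for one attribute (e.g. lemma or UPOS) over one-to-one aligned words.

    With gold tokenisation both lists have the same length, so precision and
    recall coincide and the result equals accuracy.
    """
    if len(gold) != len(predicted):
        raise ValueError("expected one prediction per gold word")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return f1_score(correct, len(predicted), len(gold))


if __name__ == "__main__":
    gold_upos = ["NOUN", "VERB", "ADP", "NOUN"]
    pred_upos = ["NOUN", "VERB", "NOUN", "NOUN"]
    print(attribute_f1(gold_upos, pred_upos))
```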
no code implementations • NAACL (AmericasNLP) 2021 • Anastasia Kuznetsova, Francis Tyers
We assess the efficacy of the approach on publicly available Wikipedia and Bible corpora; the naive coverage of the analyser reaches 86% on Wikipedia and 91% on the Bible corpora.
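"Naive" coverage is commonly understood as the share of corpus tokens for which the analyser returns at least one analysis, whether or not that analysis is correct. The sketch below computes it under that assumption. The dictionary-based `lookup` function is a hypothetical stand-in for a real finite-state analyser.

```python
from __future__ import annotations

from typing import Callable, Iterable


def naive_coverage(tokens: Iterable[str], analyse: Callable[[str], list[str]]) -> float:
    """Share of tokens for which `analyse` returns at least one analysis.

    "Naive" because any analysis counts, even if it is not the correct one.
    """
    total = covered = 0
    for token in tokens:
        total += 1
        if analyse(token):
            covered += 1
    return covered / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical stand-in for an FST lookup: a dictionary of analyses.
    lexicon = {"cats": ["cat<n><pl>"], "run": ["run<v><inf>", "run<n><sg>"]}

    def lookup(token: str) -> list[str]:
        return lexicon.get(token, [])

    print(naive_coverage(["cats", "run", "xyzzy", "run"], lookup))  # 0.75
```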
no code implementations • LT4HALA (LREC) 2022 • Daniel Swanson, Francis Tyers
However, these phenomena can be modeled fairly easily if the lexicon’s internal representation is allowed to contain more information than the pure phonological form.
1 code implementation • LREC 2022 • Sardana Ivanova, Jonathan Washington, Francis Tyers
We present, to our knowledge, the first ever published morphological analyser and generator for Sakha, a marginalised language of Siberia.
no code implementations • 17 Feb 2022 • Anastasia Kuznetsova, Anurag Kumar, Jennifer Drexler Fox, Francis Tyers
Modern end-to-end speech recognition models show astonishing results in transcribing audio signals into written text.
1 code implementation • WMT (EMNLP) 2021 • Jamshidbek Mirzakhalov, Anoop Babu, Aigiz Kunafin, Ahsan Wahab, Behzod Moydinboyev, Sardana Ivanova, Mokhiyakhon Uzokova, Shaxnoza Pulatova, Duygu Ataman, Julia Kreutzer, Francis Tyers, Orhan Firat, John Licato, Sriram Chellappan
Then, we train 26 bilingual baselines as well as a multi-way neural MT (MNMT) model using the corpus and perform an extensive analysis using automatic metrics as well as human evaluations.
no code implementations • 9 Sep 2021 • Jamshidbek Mirzakhalov, Anoop Babu, Duygu Ataman, Sherzod Kariev, Francis Tyers, Otabek Abduraufov, Mammad Hajili, Sardana Ivanova, Abror Khaytbaev, Antonio Laverghetta Jr., Behzodbek Moydinboyev, Esra Onal, Shaxnoza Pulatova, Ahsan Wahab, Orhan Firat, Sriram Chellappan
Recent advances in neural machine translation (NMT) have pushed the quality of machine translation systems to the point where they are becoming widely adopted to build competitive systems.
no code implementations • NAACL 2021 • Miikka Silfverberg, Francis Tyers, Garrett Nicolai, Mans Hulden
Sequence-to-sequence models have delivered impressive results in word formation tasks such as morphological inflection, often learning to model subtle morphophonological details with limited training data.
no code implementations • 1 Apr 2021 • Miikka Silfverberg, Francis Tyers, Garrett Nicolai, Mans Hulden
Sequence-to-sequence models have delivered impressive results in word formation tasks such as morphological inflection, often learning to model subtle morphophonological details with limited training data.
1 code implementation • WS 2020 • Ekaterina Vylomova, Jennifer White, Elizabeth Salesky, Sabrina J. Mielke, Shijie Wu, Edoardo Ponti, Rowan Hall Maudslay, Ran Zmigrod, Josef Valvoda, Svetlana Toldova, Francis Tyers, Elena Klyachko, Ilya Yegorov, Natalia Krizhanovsky, Paula Czarnowska, Irene Nikkarinen, Andrew Krizhanovsky, Tiago Pimentel, Lucas Torroba Hennigen, Christo Kirov, Garrett Nicolai, Adina Williams, Antonios Anastasopoulos, Hilaria Cruz, Eleanor Chodroff, Ryan Cotterell, Miikka Silfverberg, Mans Hulden
Systems were developed using data from 45 languages and just 5 language families, fine-tuned with data from an additional 45 languages and 10 language families (13 in total), and evaluated on all 90 languages.
no code implementations • 11 May 2020 • Lane Schwartz, Francis Tyers, Lori Levin, Christo Kirov, Patrick Littell, Chi-kiu Lo, Emily Prud'hommeaux, Hyunji Hayley Park, Kenneth Steimel, Rebecca Knowles, Jeffrey Micher, Lonny Strunk, Han Liu, Coleman Haley, Katherine J. Zhang, Robbie Jimmerson, Vasilisa Andriyanets, Aldrian Obaja Muis, Naoki Otani, Jong Hyuk Park, Zhisong Zhang
In the literature, languages like Finnish or Turkish are held up as extreme examples of complexity that challenge common modelling assumptions.
no code implementations • LREC 2020 • Nils Hjortnaes, Timofey Arkhangelskiy, Niko Partanen, Michael Rießler, Francis Tyers
Previous experiments showed that transfer learning using DeepSpeech can improve the accuracy of a speech recognizer for Komi, though the error rate remained very high.
no code implementations • LREC 2020 • Anna Zueva, Anastasia Kuznetsova, Francis Tyers
Since part of the corpora consists of texts in Evenki dialects, a version of the analyser with relaxed rules is developed for processing dialectal features.
2 code implementations • LREC 2020 • Amr Keleg, Francis Tyers, Nick Howell, Tommi Pirinen
In this paper, we have developed a method for weighting a morphological analyzer built using finite state transducers in order to disambiguate its results.
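One simple way to weight an analyser's output is to give each tag sequence a weight equal to the negative log of its relative frequency in raw text. An ambiguous token's count is split evenly across its analyses. Analyses are then ranked with the lowest weight first, the usual convention for weighted FSTs in the tropical semiring. This is not necessarily the method of the paper. The Apertium-style `<tag>` notation, the dictionary stand-in for FST lookup and all function names are assumptions.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from typing import Callable, Iterable


def tag_string(analysis: str) -> str:
    """'dog<n><pl>' -> '<n><pl>' (Apertium-style tag notation assumed)."""
    return "".join(re.findall(r"<[^>]+>", analysis))


def estimate_weights(
    corpus: Iterable[str], analyse: Callable[[str], list[str]]
) -> dict[str, float]:
    """Weight each tag sequence by -log of its relative frequency.

    A token with k analyses contributes 1/k to each of them, so no
    hand-disambiguated corpus is required.
    """
    counts: Counter[str] = Counter()
    for token in corpus:
        analyses = analyse(token)
        for analysis in analyses:
            counts[tag_string(analysis)] += 1 / len(analyses)
    total = sum(counts.values())
    return {tags: -math.log(c / total) for tags, c in counts.items()}


def rank(
    token: str,
    analyse: Callable[[str], list[str]],
    weights: dict[str, float],
    unseen_weight: float = math.inf,
) -> list[str]:
    """Analyses of `token`, lowest weight (most probable) first."""
    return sorted(analyse(token), key=lambda a: weights.get(tag_string(a), unseen_weight))


if __name__ == "__main__":
    # Hypothetical lexicon standing in for an FST lookup.
    lexicon = {
        "cats": ["cat<n><pl>"],
        "dogs": ["dog<n><pl>", "dog<v><pres><p3><sg>"],
        "runs": ["run<v><pres><p3><sg>", "run<n><pl>"],
    }

    def lookup(token: str) -> list[str]:
        return lexicon.get(token, [])

    weights = estimate_weights(["cats", "dogs", "cats", "runs"], lookup)
    print(rank("runs", lookup, weights))
```

Weighting tag sequences rather than full analyses lets the estimate generalise to lemmas that are rare or absent in the training text.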
no code implementations • LREC 2020 • Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Jan Hajič, Christopher D. Manning, Sampo Pyysalo, Sebastian Schuster, Francis Tyers, Daniel Zeman
Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages within a dependency-based lexicalist framework.
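UD treebanks are distributed in the CoNLL-U format. It has one word per line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), `#` comment lines, and a blank line after each sentence. A minimal reader might look like the sketch below. It keeps every field as a string and simply skips multiword-token ranges and empty nodes, which a full implementation would represent properly. The function names are hypothetical.

```python
from __future__ import annotations

from typing import Iterator

# The ten CoNLL-U columns, in order.
FIELDS = ("id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc")


def read_conllu(text: str) -> Iterator[list[dict[str, str]]]:
    """Yield sentences as lists of word dictionaries keyed by FIELDS."""
    sentence: list[dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():  # a blank line ends the current sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):  # sentence-level comments such as "# text = ..."
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges (1-2) and empty nodes (5.1)
        sentence.append(dict(zip(FIELDS, cols)))
    if sentence:  # tolerate a missing final blank line
        yield sentence


def parse_feats(feats: str) -> dict[str, str]:
    """'Mood=Ind|Tense=Pres' -> {'Mood': 'Ind', 'Tense': 'Pres'}; '_' -> {}."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


if __name__ == "__main__":
    sample = (
        "# text = Dogs bark\n"
        "1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_\n"
        "2\tbark\tbark\tVERB\tVBP\tMood=Ind|Tense=Pres\t0\troot\t_\t_\n"
        "\n"
    )
    for sent in read_conllu(sample):
        for word in sent:
            print(word["id"], word["form"], word["head"], word["deprel"], parse_feats(word["feats"]))
```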
no code implementations • RANLP 2019 • Esra Onal, Francis Tyers
This study is an attempt to contribute to the documentation and revitalization efforts for the endangered Laz language, a member of the South Caucasian language family spoken mainly on the northeastern coastline of Turkey.
1 code implementation • WS 2019 • Jungyeul Park, Francis Tyers
In this paper we present a new annotation scheme for the Sejong part-of-speech tagged corpus based on Universal Dependencies style annotation.
no code implementations • WS 2019 • Marcos Zampieri, Shervin Malmasi, Yves Scherrer, Tanja Samardžić, Francis Tyers, Miikka Silfverberg, Natalia Klyueva, Tung-Le Pan, Chu-Ren Huang, Radu Tudor Ionescu, Andrei M. Butnaru, Tommi Jauhiainen
In this paper, we present the findings of the Third VarDial Evaluation Campaign organized as part of the sixth edition of the workshop on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects (VarDial), co-located with NAACL 2019.
2 code implementations • WS 2018 • Francis Tyers, Mariya Sheyanova, Aleksandra Martynova, Pavel Stepachev, Konstantin Vinogorodskiy
This paper describes a method of creating synthetic treebanks for cross-lingual dependency parsing using a combination of machine translation (including pivot translation), annotation projection and the spanning tree algorithm.
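The annotation projection step can be illustrated in its most basic form. Each aligned target word receives the head and label of its source counterpart, mapped through a word alignment. The sketch below assumes one-to-one alignments and 1-based CoNLL-U-style indices. It covers only direct projection: the paper also combines machine translation and a spanning tree algorithm, which are not shown. The function name is hypothetical.

```python
from __future__ import annotations

from typing import Optional


def project_tree(
    src_heads: list[int],
    src_deprels: list[str],
    alignment: dict[int, int],
    n_target: int,
) -> tuple[list[Optional[int]], list[Optional[str]]]:
    """Project a source dependency tree onto a target sentence.

    Indices are 1-based as in CoNLL-U, and head 0 marks the root. `alignment`
    maps source word indices to target word indices and is assumed to be
    one-to-one. Target words whose dependency cannot be projected keep None.
    """
    tgt_heads: list[Optional[int]] = [None] * n_target
    tgt_deprels: list[Optional[str]] = [None] * n_target
    for s_dep, t_dep in alignment.items():
        s_head = src_heads[s_dep - 1]
        if s_head == 0:
            t_head = 0  # the source root stays the root
        elif s_head in alignment:
            t_head = alignment[s_head]
        else:
            continue  # the source head is unaligned, so no arc is projected
        tgt_heads[t_dep - 1] = t_head
        tgt_deprels[t_dep - 1] = src_deprels[s_dep - 1]
    return tgt_heads, tgt_deprels


if __name__ == "__main__":
    # Source "I saw her" (heads 2, 0, 2) aligned to a reordered target "her I saw".
    heads, labels = project_tree([2, 0, 2], ["nsubj", "root", "obj"], {1: 2, 2: 3, 3: 1}, 3)
    print(heads, labels)
```

Partial or ill-formed projections, marked here by `None`, are where a spanning tree algorithm becomes useful. It can choose a well-formed tree from scored candidate arcs.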
no code implementations • COLING 2018 • Vasilisa Andriyanets, Francis Tyers
An error evaluation of 100 tokens randomly selected from those in the corpus not covered by the analyser shows that most of the morphological processes are covered and that the majority of errors are caused by a limited stem lexicon.
no code implementations • CoNLL 2017 • Daniel Zeman, Martin Popel, Milan Straka, Jan Hajič, Joakim Nivre, Filip Ginter, Juhani Luotolahti, Sampo Pyysalo, Slav Petrov, Martin Potthast, Francis Tyers, Elena Badmaeva, Memduh Gokirmak, Anna Nedoluzhko, Silvie Cinková, Jan Hajič jr., Jaroslava Hlaváčová, Václava Kettnerová, Zdeňka Urešová, Jenna Kanerva, Stina Ojala, Anna Missilä, Christopher D. Manning, Sebastian Schuster, Siva Reddy, Dima Taji, Nizar Habash, Herman Leung, Marie-Catherine de Marneffe, Manuela Sanguinetti, Maria Simi, Hiroshi Kanayama, Valeria de Paiva, Kira Droganova, Héctor Martínez Alonso, Çağrı Çöltekin, Umut Sulubacak, Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Georg Rehm, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Michael Mandl, Jesse Kirchner, Hector Fernandez Alcalde, Jana Strnadová, Esha Banerjee, Ruli Manurung, Antonio Stella, Atsuko Shimada, Sookyoung Kwak, Gustavo Mendonça, Tatiana Lando, Rattima Nitisaroj, Josie Li
The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets.
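Dependency parsing shared tasks of this kind typically rank systems by attachment scores; the CoNLL 2017 task used labelled attachment score (LAS) as its main metric. The sketch below computes unlabelled (UAS) and labelled (LAS) attachment scores. It simplifies by assuming identical system and gold tokenisation; the official evaluation also has to align mismatched tokens. The function name is hypothetical.

```python
from __future__ import annotations


def attachment_scores(
    gold: list[tuple[int, str]], predicted: list[tuple[int, str]]
) -> tuple[float, float]:
    """Return (UAS, LAS) for aligned sequences of (head, deprel) pairs.

    UAS counts words with the correct head; LAS additionally requires the
    correct dependency label. Gold tokenisation is assumed.
    """
    if len(gold) != len(predicted):
        raise ValueError("expected one prediction per gold word")
    if not gold:
        return 0.0, 0.0
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    label_hits = sum(g == p for g, p in zip(gold, predicted))
    return head_hits / len(gold), label_hits / len(gold)


if __name__ == "__main__":
    gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
    pred = [(2, "nsubj"), (0, "root"), (2, "iobj")]
    print(attachment_scores(gold, pred))
```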
no code implementations • CL (ACL) 2021 • Joakim Nivre, Daniel Zeman, Filip Ginter, Francis Tyers
Universal Dependencies (UD) is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages.
no code implementations • COLING 2016 • Umut Sulubacak, Memduh Gokirmak, Francis Tyers, Çağrı Çöltekin, Joakim Nivre, Gülşen Eryiğit
The Universal Dependencies (UD) project was conceived after the substantial recent interest in unifying annotation schemes across languages.
no code implementations • LREC 2016 • Raveesh Motlani, Francis Tyers, Dipti Sharma
Morphological analysis is a fundamental task in natural-language processing, which is used in other NLP applications such as part-of-speech tagging, syntactic parsing, information retrieval, machine translation, etc.
no code implementations • LREC 2016 • Francis Tyers, Aziyana Bayyr-ool, Aelita Salchak, Jonathan Washington
This paper describes the development of free/open-source finite-state morphological transducers for Tuvan, a Turkic language spoken in and around the Tuvan Republic in Russia.
no code implementations • LREC 2014 • Jonathan Washington, Ilnar Salimzyanov, Francis Tyers
This paper describes the development of free/open-source finite-state morphological transducers for three Turkic languages (Kazakh, Tatar, and Kumyk), representing one language from each of the three sub-branches of the Kypchak branch of Turkic.
1 code implementation • LREC 2012 • Jonathan Washington, Mirlan Ipasov, Francis Tyers
This paper describes the development of a free/open-source finite-state morphological transducer for Kyrgyz.
no code implementations • LREC 2012 • Juan Pablo Martínez Cortés, Jim O'Regan, Francis Tyers
The system, and the morphological analyser built for it, are both the first resources of their kind for Aragonese.