no code implementations • Findings (EMNLP) 2021 • Martin Popel, Zdeněk Žabokrtský, Anna Nedoluzhko, Michal Novák, Daniel Zeman
One can find dozens of data resources for various languages in which coreference (a relation between two or more expressions that refer to the same real-world entity) is manually annotated.
no code implementations • ACL (IWPT) 2021 • Gosse Bouma, Djamé Seddah, Daniel Zeman
We describe the second IWPT task on end-to-end parsing from raw text to Enhanced Universal Dependencies.
no code implementations • LREC 2022 • Anna Nedoluzhko, Michal Novák, Martin Popel, Zdeněk Žabokrtský, Amir Zeldes, Daniel Zeman
Recent advances in standardization for annotated language resources have led to successful large scale efforts, such as the Universal Dependencies (UD) project for multilingual syntactically annotated data.
no code implementations • CL (ACL) 2020 • Zdeněk Žabokrtský, Daniel Zeman, Magda Ševčíková
This article gives an overview of how sentence meaning is represented in eleven deep-syntactic frameworks, ranging from those based on linguistic theories elaborated for decades to rather lightweight NLP-motivated approaches.
no code implementations • UDW (COLING) 2020 • Marsida Toska, Joakim Nivre, Daniel Zeman
In this paper, we introduce the first Universal Dependencies (UD) treebank for standard Albanian, consisting of 60 sentences collected from the Albanian Wikipedia, annotated with lemmas, universal part-of-speech tags, morphological features and syntactic dependencies.
no code implementations • 21 Mar 2023 • Kira Droganova, Daniel Zeman
This paper analyzes multiple deep-syntactic frameworks with the goal of creating a proposal for a set of universal semantic role labels.
1 code implementation • CRAC (ACL) 2022 • Zdeněk Žabokrtský, Miloslav Konopík, Anna Nedoluzhko, Michal Novák, Maciej Ogrodniczuk, Martin Popel, Ondřej Pražák, Jakub Sido, Daniel Zeman, Yilun Zhu
The public edition of CorefUD 1.0, which contains 13 datasets for 10 languages, was used as the source of training and evaluation data.
no code implementations • CONLL 2020 • Stephan Oepen, Omri Abend, Lasha Abzianidze, Johan Bos, Jan Hajič, Daniel Hershcovich, Bin Li, Tim O'Gorman, Nianwen Xue, Daniel Zeman
Extending a similar setup from the previous year, five distinct approaches to the representation of sentence meaning in the form of directed graphs were represented in the English training and evaluation data for the task, packaged in a uniform graph abstraction and serialization; for four of these representation frameworks, additional training and evaluation data was provided for one additional language per framework.
no code implementations • CONLL 2020 • Daniel Zeman, Jan Hajič
Prague Tectogrammatical Graphs (PTG) is a meaning representation framework that originates in the tectogrammatical layer of the Prague Dependency Treebank (PDT) and is theoretically founded in Functional Generative Description of language (FGD).
1 code implementation • EMNLP (SIGTYP) 2020 • Martin Vastl, Daniel Zeman, Rudolf Rosa
We present our submission to the SIGTYP 2020 Shared Task on the prediction of typological features.
no code implementations • WS 2020 • Gosse Bouma, Djamé Seddah, Daniel Zeman
This overview introduces the task of parsing into enhanced universal dependencies, describes the datasets used for training and evaluation, and evaluation metrics.
no code implementations • LREC 2020 • Olájídé Ishola, Daniel Zeman
Low-resource languages present enormous NLP opportunities as well as varying degrees of difficulties.
no code implementations • LREC 2020 • Atul Kr. Ojha, Daniel Zeman
This paper presents the first dependency treebank for Bhojpuri, a resource-poor language that belongs to the Indo-Aryan language family.
no code implementations • LREC 2020 • Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Jan Hajič, Christopher D. Manning, Sampo Pyysalo, Sebastian Schuster, Francis Tyers, Daniel Zeman
Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages within a dependency-based lexicalist framework.
no code implementations • CONLL 2019 • Kira Droganova, Andrey Kutuzov, Nikita Mediankin, Daniel Zeman
This paper describes the ÚFAL–Oslo system submission to the shared task on Cross-Framework Meaning Representation Parsing (MRP, Oepen et al. 2019).
no code implementations • WS 2019 • Ronald Cardenas, Claudia Borg, Daniel Zeman
This paper presents the submission by the Charles University-University of Malta team to the SIGMORPHON 2019 Shared Task on Morphological Analysis and Lemmatization in context.
no code implementations • WS 2018 • Kira Droganova, Filip Ginter, Jenna Kanerva, Daniel Zeman
In this paper, we focus on parsing rare and non-trivial constructions, in particular ellipsis.
no code implementations • WS 2018 • Flavio Massimiliano Cecchini, Marco Passarotti, Paola Marongiu, Daniel Zeman
The changes are made both to harmonise the Universal Dependencies version of the Index Thomisticus Treebank with the two other available Latin treebanks and to fix errors and inconsistencies resulting from the original process.
no code implementations • WS 2018 • Ronald Cardenas, Daniel Zeman
We present a fairly complete morphological analyzer for Shipibo-Konibo, a low-resourced native language spoken in the Amazonian region of Peru.
no code implementations • CONLL 2018 • Daniel Zeman, Jan Hajič, Martin Popel, Martin Potthast, Milan Straka, Filip Ginter, Joakim Nivre, Slav Petrov
Every year, the Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets.
no code implementations • CONLL 2017 • Daniel Zeman, Martin Popel, Milan Straka, Jan Hajič, Joakim Nivre, Filip Ginter, Juhani Luotolahti, Sampo Pyysalo, Slav Petrov, Martin Potthast, Francis Tyers, Elena Badmaeva, Memduh Gokirmak, Anna Nedoluzhko, Silvie Cinková, Jan Hajič jr., Jaroslava Hlaváčová, Václava Kettnerová, Zdeňka Urešová, Jenna Kanerva, Stina Ojala, Anna Missilä, Christopher D. Manning, Sebastian Schuster, Siva Reddy, Dima Taji, Nizar Habash, Herman Leung, Marie-Catherine de Marneffe, Manuela Sanguinetti, Maria Simi, Hiroshi Kanayama, Valeria de Paiva, Kira Droganova, Héctor Martínez Alonso, Çağrı Çöltekin, Umut Sulubacak, Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Georg Rehm, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Michael Mandl, Jesse Kirchner, Hector Fernandez Alcalde, Jana Strnadová, Esha Banerjee, Ruli Manurung, Antonio Stella, Atsuko Shimada, Sookyoung Kwak, Gustavo Mendonça, Tatiana Lando, Rattima Nitisaroj, Josie Li
The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets.
no code implementations • WS 2017 • Rudolf Rosa, Daniel Zeman, David Mareček, Zdeněk Žabokrtský
We once had a corp, or should we say, it once had us They showed us its tags, isn't it great, unified tags They asked us to parse and they told us to use everything So we looked around and we noticed there was near nothing We took other langs, bitext aligned: words one-to-one We played for two weeks, and then they said, here is the test The parser kept training till morning, just until deadline So we had to wait and hope what we get would be just fine And, when we awoke, the results were done, we saw we'd won So, we wrote this paper, isn't it good, Norwegian wood.
no code implementations • WS 2017 • Dima Taji, Nizar Habash, Daniel Zeman
We describe the process of creating NUDAR, a Universal Dependency treebank for Arabic.
no code implementations • CL (ACL) 2021 • Joakim Nivre, Daniel Zeman, Filip Ginter, Francis Tyers
Universal Dependencies (UD) is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages.
no code implementations • LREC 2016 • Stephan Oepen, Marco Kuhlmann, Yusuke Miyao, Daniel Zeman, Silvie Cinková, Dan Flickinger, Jan Hajič, Angelina Ivanova, Zdeňka Urešová
We announce a new language resource for research on semantic parsing, a large, carefully curated collection of semantic dependency graphs representing multiple linguistic traditions.
no code implementations • LREC 2016 • Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajič, Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, Daniel Zeman
Cross-linguistically consistent annotation is necessary for sound comparative evaluation and cross-lingual learning experiments.
no code implementations • LREC 2016 • Zhiwei Yu, David Mareček, Zdeněk Žabokrtský, Daniel Zeman
Part-of-speech (POS) induction is one of the most popular tasks in research on unsupervised NLP.
no code implementations • LREC 2014 • Ondřej Bojar, Vojtěch Diatka, Pavel Rychlý, Pavel Straňák, Vít Suchomel, Aleš Tamchyna, Daniel Zeman
HindEnCorp consists of 274k parallel sentences (3.9 million Hindi and 3.8 million English tokens).
no code implementations • LREC 2014 • Rudolf Rosa, Jan Mašek, David Mareček, Martin Popel, Daniel Zeman, Zdeněk Žabokrtský
We present HamleDT 2.0 (HArmonized Multi-LanguagE Dependency Treebank).
no code implementations • LREC 2012 • Daniel Zeman, David Mareček, Martin Popel, Loganathan Ramasamy, Jan Štěpánek, Zdeněk Žabokrtský, Jan Hajič
We propose HamleDT (HArmonized Multi-LanguagE Dependency Treebank).
no code implementations • LREC 2012 • Jan Berka, Ondřej Bojar, Mark Fishel, Maja Popović, Daniel Zeman
We present a complex, open-source tool for detailed machine translation error analysis that provides the user with automatic error detection and classification, several monolingual alignment algorithms, and browsing of the training and test corpora.