1 code implementation • 8 Nov 2019 • Jindřich Libovický, Rudolf Rosa, Alexander Fraser
Multilingual BERT (mBERT) provides sentence representations for 104 languages, which are useful for many multi-lingual tasks.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Jindřich Libovický, Rudolf Rosa, Alexander Fraser
Multilingual contextual embeddings, such as multilingual BERT and XLM-RoBERTa, have proved useful for many multi-lingual tasks.
1 code implementation • 22 Aug 2019 • Rudolf Rosa, Zdeněk Žabokrtský
We focus on the task of unsupervised lemmatization, i.e., grouping together inflected forms of one word under one label (a lemma) without the use of annotated training data.
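The grouping idea can be illustrated with a toy heuristic: cluster word forms that share a sufficiently long common prefix under one label. This greedy stemming sketch is not the paper's method (which works without such hand-set thresholds in the general case); it only shows what "grouping inflected forms under one lemma" means operationally.

```python
# Toy illustration of lemmatization as unsupervised grouping of word forms.
# NOT the paper's algorithm; just a prefix-sharing heuristic for intuition.

def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest shared prefix of two strings."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n

def group_forms(forms, min_prefix=4):
    """Greedily assign each form to a cluster keyed by a representative form."""
    clusters = {}  # representative form -> list of grouped forms
    for form in sorted(forms):
        for rep in clusters:
            if common_prefix_len(form, rep) >= min_prefix:
                clusters[rep].append(form)
                break
        else:
            clusters[form] = [form]
    return clusters

groups = group_forms(["walked", "walking", "walks", "talked", "talking"])
# e.g. "walked", "walking", "walks" end up under one cluster label
```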
1 code implementation • EMNLP (SIGTYP) 2020 • Martin Vastl, Daniel Zeman, Rudolf Rosa
We present our submission to the SIGTYP 2020 Shared Task on the prediction of typological features.
no code implementations • 16 Jun 2015 • Rudolf Rosa
We present our work on semi-supervised parsing of natural language sentences, focusing on multi-source crosslingual transfer of delexicalized dependency parsers.
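The core preprocessing step behind delexicalized transfer is easy to sketch: word forms are discarded and only universal POS tags are kept, so a parser trained on tag sequences in source languages can be applied to a target language. The tagged-sentence format below is illustrative, not the paper's actual data format.

```python
# Sketch of delexicalization for cross-lingual parser transfer: drop the
# words, keep only universal POS tags. After this step, sentences from
# different languages can look identical to the parser.

def delexicalize(tagged_sentence):
    """Replace each (form, upos) pair with its POS tag alone."""
    return [upos for form, upos in tagged_sentence]

en = [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")]
cs = [("ten", "DET"), ("pes", "NOUN"), ("štěká", "VERB")]

# Both sentences reduce to the same tag sequence:
assert delexicalize(en) == delexicalize(cs) == ["DET", "NOUN", "VERB"]
```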
no code implementations • CoNLL 2018 • Rudolf Rosa, David Mareček
This is a system description paper for the CUNI x-ling submission to the CoNLL 2018 UD Shared Task.
no code implementations • WS 2018 • David Mare{\v{c}}ek, Rudolf Rosa
This is a work in progress on extracting sentence tree structures from the encoder's self-attention weights when translating into another language with the Transformer neural network architecture.
no code implementations • WS 2017 • Rudolf Rosa, Daniel Zeman, David Mareček, Zdeněk Žabokrtský
We once had a corp, or should we say, it once had us
They showed us its tags, isn't it great, unified tags
They asked us to parse and they told us to use everything
So we looked around and we noticed there was near nothing
We took other langs, bitext aligned: words one-to-one
We played for two weeks, and then they said, here is the test
The parser kept training till morning, just until deadline
So we had to wait and hope what we get would be just fine
And, when we awoke, the results were done, we saw we'd won
So, we wrote this paper, isn't it good, Norwegian wood.
no code implementations • WS 2017 • Antonio Jimeno Yepes, Aurélie Névéol, Mariana Neves, Karin Verspoor, Ondřej Bojar, Arthur Boyer, Cristian Grozea, Barry Haddow, Madeleine Kittner, Yvonne Lichtblau, Pavel Pecina, Roland Roller, Rudolf Rosa, Amy Siu, Philippe Thomas, Saskia Trescher
no code implementations • LREC 2014 • Rudolf Rosa, Jan Mašek, David Mareček, Martin Popel, Daniel Zeman, Zdeněk Žabokrtský
We present HamleDT 2.0 (HArmonized Multi-LanguagE Dependency Treebank).
no code implementations • LREC 2014 • Petra Barančíková, Rudolf Rosa, Aleš Tamchyna
Grammatical correctness of the new reference sentences is ensured by applying Depfix to the newly created paraphrases.
no code implementations • WS 2019 • David Mareček, Rudolf Rosa
We inspect the multi-head self-attention in Transformer NMT encoders for three source languages, looking for patterns that could have a syntactic interpretation.
no code implementations • 27 Jun 2019 • Rudolf Rosa, David Mareček
We use the English model of BERT and explore how a deletion of one word in a sentence changes representations of other words.
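The deletion-probing idea can be sketched without the actual model: embed a sentence, delete one word, re-embed, and measure how far each remaining word's representation moved. Below, a trivial deterministic "embedding" that mixes each word with its neighbours stands in for BERT (the paper uses the English BERT model); only the comparison logic is the point.

```python
# Sketch of deletion probing: compare per-word vectors before and after
# removing one word. The toy_embed function is a stand-in for BERT, built
# so that a word's vector depends on its immediate neighbours.
import hashlib
import math

def word_hash(w: str) -> int:
    """Deterministic 32-bit hash of a word (stable across runs)."""
    return int(hashlib.md5(w.encode()).hexdigest()[:8], 16)

def toy_embed(tokens):
    """Stand-in contextual embedding: each word's 8-d vector sums bit
    patterns of the word itself and its immediate neighbours."""
    vecs = {}
    for i, tok in enumerate(tokens):
        ctx = tokens[max(0, i - 1):i + 2]  # word plus left/right neighbour
        v = [0.0] * 8
        for c in ctx:
            h = word_hash(c)
            for d in range(8):
                v[d] += ((h >> d) & 1) - 0.5
        vecs[tok] = v
    return vecs

def shift_after_deletion(tokens, removed):
    """How far each remaining word's vector moves when `removed` is deleted."""
    before = toy_embed(tokens)
    after = toy_embed([t for t in tokens if t != removed])
    return {t: math.dist(before[t], after[t]) for t in after}

shifts = shift_after_deletion(["the", "cat", "sat", "down"], "cat")
# Words whose context window lost "cat" can move; "down" keeps its context
# and stays put in this toy model.
```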
2 code implementations • Findings of the Association for Computational Linguistics 2020 • Tomasz Limisiewicz, Rudolf Rosa, David Mareček
This work focuses on analyzing the form and extent of syntactic abstraction captured by BERT by extracting labeled dependency trees from self-attentions.
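A minimal version of tree extraction from attention can be sketched as follows: for each word, pick the position it attends to most as its head. The paper extracts labeled trees with more careful head selection and aggregation across heads; this shows only the basic idea, on a made-up attention matrix.

```python
# Sketch: derive dependency heads from a self-attention matrix by taking,
# for each word, the argmax of its attention row. The matrix below is
# invented for illustration; rows sum to 1 as attention weights do.

words = ["ROOT", "the", "cat", "sleeps"]
attn = [
    [1.00, 0.00, 0.00, 0.00],  # ROOT attends to itself
    [0.05, 0.05, 0.80, 0.10],  # "the"    -> mostly attends to "cat"
    [0.10, 0.10, 0.10, 0.70],  # "cat"    -> mostly attends to "sleeps"
    [0.75, 0.05, 0.10, 0.10],  # "sleeps" -> mostly attends to ROOT
]

def extract_heads(attn):
    """Head of word i = index of the maximum in attention row i."""
    return [max(range(len(row)), key=row.__getitem__) for row in attn]

heads = extract_heads(attn)  # heads[i] = index of word i's syntactic head
```

A real extraction would additionally enforce that the result is a well-formed tree (e.g. via a maximum-spanning-tree algorithm) rather than trusting independent argmaxes.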
no code implementations • 25 Jun 2020 • Rudolf Rosa, Ondřej Dušek, Tom Kocmi, David Mareček, Tomáš Musil, Patrícia Schmidtová, Dominik Jurko, Ondřej Bojar, Daniel Hrbek, David Košťák, Martina Kinská, Josef Doležal, Klára Vosecká
We present THEaiTRE, a newly started project aimed at the automatic generation of theatre play scripts.
no code implementations • 29 Jun 2020 • Rudolf Rosa, Tomáš Musil, David Mareček
In classical probing, a classifier is trained on the representations to extract the target linguistic information.
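The classical-probing setup described here can be sketched end to end: freeze some representations, train a small classifier on top to predict a linguistic label, and read above-chance accuracy as evidence the information is present. Toy 2-D vectors and a perceptron stand in for real model representations and the probe.

```python
# Sketch of classical probing: a tiny perceptron "probe" is trained on
# frozen toy "representations" to predict a binary linguistic label.

def train_perceptron(data, epochs=20):
    """data: list of ((x0, x1), label) with label in {0, 1}."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x0, x1), y in data:
            pred = 1 if w[0] * x0 + w[1] * x1 + b > 0 else 0
            err = y - pred          # perceptron update on mistakes only
            w[0] += err * x0
            w[1] += err * x1
            b += err
    return w, b

# Toy "representations": label 0 (say, nouns) clusters apart from label 1.
data = [([1.0, 0.2], 0), ([0.9, 0.1], 0), ([0.1, 1.0], 1), ([0.2, 0.9], 1)]
w, b = train_perceptron(data)

# Probe accuracy on the (linearly separable) toy data:
accuracy = sum(
    (1 if w[0] * x0 + w[1] * x1 + b > 0 else 0) == y
    for (x0, x1), y in data
) / len(data)
```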
no code implementations • 17 Feb 2021 • Rudolf Rosa, Tomáš Musil, Ondřej Dušek, Dominik Jurko, Patrícia Schmidtová, David Mareček, Ondřej Bojar, Tom Kocmi, Daniel Hrbek, David Košťák, Martina Kinská, Marie Nováková, Josef Doležal, Klára Vosecká, Tomáš Studeník, Petr Žabka
We present the first version of a system for interactive generation of theatre play scripts.
no code implementations • LANTERN (COLING) 2020 • Abhishek Agrawal, Rudolf Rosa
We also augment a graph-based parser with eye-tracking features and parse the Dundee Corpus to corroborate our findings from the sequence labelling parser.
no code implementations • 16 Jun 2022 • Patrícia Schmidtová, Dávid Javorský, Christián Mikláš, Tomáš Musil, Rudolf Rosa, Ondřej Dušek
We present a novel approach to generating scripts by using agents with different personality types.
no code implementations • NAACL (WNU) 2022 • Rudolf Rosa, Patrícia Schmidtová, Ondřej Dušek, Tomáš Musil, David Mareček, Saad Obaid, Marie Nováková, Klára Vosecká, Josef Doležal
We experiment with adapting generative language models for the generation of long coherent narratives in the form of theatre plays.
no code implementations • COLING (CreativeSumm) 2022 • Rishu Kumar, Rudolf Rosa
This system description paper details TEAM UFAL’s approach for the SummScreen, TVMegasite subtask of the CreativeSumm shared task.
no code implementations • 13 Apr 2024 • Tomáš Sourada, Jana Straková, Rudolf Rosa
For testing in OOV conditions, we automatically extracted a large dataset of nouns in the morphologically rich Czech language, with lemma-disjoint data splits, and we further manually annotated a real-world OOV dataset of neologisms.
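The lemma-disjoint split mentioned above has a simple invariant worth making explicit: all inflected forms of a lemma must land in the same partition, so test lemmas are genuinely unseen at training time. The data and the choice of held-out lemmas below are made up for illustration.

```python
# Sketch of a lemma-disjoint data split: route whole lemmas (with all
# their inflected forms) to either train or test, never both.

def lemma_disjoint_split(pairs, test_lemmas):
    """pairs: (form, lemma) tuples; test_lemmas: set of held-out lemmas."""
    train = [(f, l) for f, l in pairs if l not in test_lemmas]
    test = [(f, l) for f, l in pairs if l in test_lemmas]
    return train, test

pairs = [
    ("hradu", "hrad"), ("hradem", "hrad"),  # Czech "castle", two forms
    ("ženy", "žena"), ("ženou", "žena"),    # Czech "woman", two forms
]
train, test = lemma_disjoint_split(pairs, {"žena"})

# No lemma appears on both sides of the split:
assert {l for _, l in train}.isdisjoint({l for _, l in test})
```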