no code implementations • ACL 2021 • Raúl Vázquez, Hande Celikkanat, Mathias Creutz, Jörg Tiedemann
Various studies show that pretrained language models such as BERT cannot straightforwardly replace encoders in neural machine translation despite their enormous success in other tasks.
no code implementations • WS 2020 • Raúl Vázquez, Mikko Aulamo, Umut Sulubacak, Jörg Tiedemann
This paper describes the University of Helsinki Language Technology group's participation in the IWSLT 2020 offline speech translation task, addressing the translation of English audio into German text.
no code implementations • ACL 2020 • Mikko Aulamo, Sami Virpioja, Jörg Tiedemann
We demonstrate the effectiveness of OpusFilter on the example of a Finnish-English news translation task based on noisy web-crawled training data.
no code implementations • CL 2020 • Raúl Vázquez, Alessandro Raganato, Mathias Creutz, Jörg Tiedemann
In particular, we show that larger intermediate layers not only improve translation quality, especially for long sentences, but also push the accuracy of trainable classification tasks.
1 code implementation • LREC 2020 • Alessandro Raganato, Yves Scherrer, Jörg Tiedemann
Lexical ambiguity is one of the many challenging linguistic phenomena involved in translation, i.e., translating an ambiguous word with its correct sense.
no code implementations • LREC 2020 • Jörg Tiedemann, Tommi Nieminen, Mikko Aulamo, Jenna Kanerva, Akseli Leino, Filip Ginter, Niko Papula
This paper presents FISKMÖ, a project that focuses on the development of resources and tools for cross-linguistic research and machine translation between Finnish and Swedish.
no code implementations • LREC 2020 • Mikko Aulamo, Umut Sulubacak, Sami Virpioja, Jörg Tiedemann
We show the use of these tools in parallel corpus creation and data diagnostics.
no code implementations • WS 2019 • Yves Scherrer, Jörg Tiedemann, Sharid Loáiciga
In this paper, we investigate how different aspects of discourse context affect the performance of recent neural MT systems.
no code implementations • WS 2019 • Alessandro Raganato, Raúl Vázquez, Mathias Creutz, Jörg Tiedemann
In this paper, we explore a multilingual translation model with a cross-lingually shared layer that can be used as fixed-size sentence representation in different downstream tasks.
no code implementations • WS 2019 • Raúl Vázquez, Umut Sulubacak, Jörg Tiedemann
This paper describes the University of Helsinki Language Technology group's participation in the WMT 2019 parallel corpus filtering task.
1 code implementation • WS 2019 • Alessandro Raganato, Yves Scherrer, Jörg Tiedemann
Supervised Neural Machine Translation (NMT) systems currently achieve impressive translation quality for many language pairs.
1 code implementation • WS 2019 • Mika Hämäläinen, Tanja Säily, Jack Rueter, Jörg Tiedemann, Eetu Mäkelä
This paper studies the use of NMT (neural machine translation) as a normalization method for an early English letter corpus.
no code implementations • WS 2018 • Alessandro Raganato, Jörg Tiedemann
We assess the representations of the encoder by extracting dependency relations based on self-attention weights, perform four probing tasks to study the amount of syntactic and semantic information captured, and test attention in a transfer learning scenario.
no code implementations • WS 2018 • Alessandro Raganato, Yves Scherrer, Tommi Nieminen, Arvi Hurskainen, Jörg Tiedemann
This paper describes the University of Helsinki's submissions to the WMT18 shared news translation task for English-Finnish and English-Estonian, in both directions.
no code implementations • WS 2018 • Emily Öhman, Kaisla Kajava, Jörg Tiedemann, Timo Honkela
This paper introduces a gamified framework for fine-grained sentiment analysis and emotion detection.
no code implementations • COLING 2018 • Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Ahmed Ali, Suwon Shon, James Glass, Yves Scherrer, Tanja Samardžić, Nikola Ljubešić, Jörg Tiedemann, Chris van der Lee, Stefan Grondelaers, Nelleke Oostdijk, Dirk Speelman, Antal Van den Bosch, Ritesh Kumar, Bornini Lahiri, Mayank Jain
We present the results and the findings of the Second VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects.
no code implementations • COLING 2018 • Mika Hämäläinen, Tanja Säily, Jack Rueter, Jörg Tiedemann, Eetu Mäkelä
This paper presents multiple methods for normalizing the most deviant and infrequent historical spellings in a corpus consisting of personal correspondence from the 15th to the 19th century.
no code implementations • WS 2017 • Sharid Loáiciga, Sara Stymne, Preslav Nakov, Christian Hardmeier, Jörg Tiedemann, Mauro Cettolo, Yannick Versley
We describe the design, the setup, and the evaluation results of the DiscoMT 2017 shared task on cross-lingual pronoun prediction.
no code implementations • WS 2017 • Jörg Tiedemann
This paper describes the submission from the University of Helsinki to the shared task on cross-lingual dependency parsing at VarDial 2017.
no code implementations • WS 2017 • Marcos Zampieri, Shervin Malmasi, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann, Yves Scherrer, Noëmi Aepli
We present the results of the VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects, which we organized as part of the fourth edition of the VarDial workshop at EACL'2017.
no code implementations • EACL 2017 • Robert Östling, Jörg Tiedemann
Most existing models for multilingual natural language processing (NLP) treat language as a discrete category, and make predictions for either one language or the other.
no code implementations • WS 2016 • Jörg Tiedemann, Johanna Nichols, Ronald Sprouse
This paper presents on-going work on creating NLP tools for under-resourced languages from very sparse training data coming from linguistic field work.
no code implementations • WS 2016 • Emily Öhman, Timo Honkela, Jörg Tiedemann
This paper outlines a pilot study on multi-dimensional and multilingual sentiment analysis of social media content.
no code implementations • WS 2016 • Shervin Malmasi, Marcos Zampieri, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann
We present the results of the third edition of the Discriminating between Similar Languages (DSL) shared task, which was organized as part of the VarDial'2016 workshop at COLING'2016.
no code implementations • WS 2016 • Ondřej Bojar, Christian Buck, Rajen Chatterjee, Christian Federmann, Liane Guillou, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Aurélie Névéol, Mariana Neves, Pavel Pecina, Martin Popel, Philipp Koehn, Christof Monz, Matteo Negri, Matt Post, Lucia Specia, Karin Verspoor, Jörg Tiedemann, Marco Turchi
no code implementations • LREC 2016 • Pierre Lison, Jörg Tiedemann
We present a new major release of the OpenSubtitles collection of parallel corpora.
no code implementations • LREC 2016 • Jörg Tiedemann
Our approach produces large numbers of sentence-aligned translation alternatives for over 50 languages provided via the OPUS corpus collection.
no code implementations • LREC 2014 • Liane Guillou, Christian Hardmeier, Aaron Smith, Jörg Tiedemann, Bonnie Webber
We present ParCor, a parallel corpus of texts in which pronoun coreference (reduced coreference in which pronouns are used as referring expressions) has been annotated.
no code implementations • LREC 2014 • Raivis Skadiņš, Jörg Tiedemann, Roberts Rozis, Daiga Deksne
The European Union is a great source of high quality documents with translations into several languages.
no code implementations • LREC 2012 • Jörg Tiedemann
In this paper, we report on new data sets and their features, additional annotation tools and models provided from the website, and essential interfaces and on-line services included in the project.
no code implementations • LREC 2012 • Jörg Tiedemann, Dorte Haltrup Hansen, Lene Offersgaard, Sussi Olsen, Matthias Zumpe
In this paper, we present the architecture of a distributed resource repository developed for collecting training data for building customized statistical machine translation systems.
no code implementations • LREC 2012 • Gideon Kotzé, Vincent Vandeghinste, Scott Martens, Jörg Tiedemann
We present a collection of parallel treebanks that have been automatically aligned on both the terminal and the nonterminal constituent level for use in syntax-based machine translation.