Search Results for author: Jörg Tiedemann

Found 63 papers, 3 papers with code

On the differences between BERT and MT encoder spaces and how to address them in translation tasks

no code implementations ACL 2021 Raúl Vázquez, Hande Celikkanat, Mathias Creutz, Jörg Tiedemann

Various studies show that pretrained language models such as BERT cannot straightforwardly replace encoders in neural machine translation despite their enormous success in other tasks.

Machine Translation Pretrained Language Models +1

OpusFilter: A Configurable Parallel Corpus Filtering Toolbox

no code implementations ACL 2020 Mikko Aulamo, Sami Virpioja, Jörg Tiedemann

We demonstrate the effectiveness of OpusFilter on the example of a Finnish-English news translation task based on noisy web-crawled training data.

Domain Adaptation Language Identification +2

The University of Helsinki Submission to the IWSLT2020 Offline Speech Translation Task

no code implementations WS 2020 Raúl Vázquez, Mikko Aulamo, Umut Sulubacak, Jörg Tiedemann

This paper describes the University of Helsinki Language Technology group's participation in the IWSLT 2020 offline speech translation task, addressing the translation of English audio into German text.

Transfer Learning Translation

A Systematic Study of Inner-Attention-Based Sentence Representations in Multilingual Neural Machine Translation

no code implementations CL 2020 Raúl Vázquez, Alessandro Raganato, Mathias Creutz, Jörg Tiedemann

In particular, we show that larger intermediate layers not only improve translation quality, especially for long sentences, but also push the accuracy of trainable classification tasks.

Machine Translation Transfer Learning +1

An Evaluation Benchmark for Testing the Word Sense Disambiguation Capabilities of Machine Translation Systems

1 code implementation LREC 2020 Alessandro Raganato, Yves Scherrer, Jörg Tiedemann

Lexical ambiguity is one of the many challenging linguistic phenomena involved in translation, i.e., translating an ambiguous word with its correct sense.

Machine Translation Translation +1

The FISKMÖ Project: Resources and Tools for Finnish-Swedish Machine Translation and Cross-Linguistic Research

no code implementations LREC 2020 Jörg Tiedemann, Tommi Nieminen, Mikko Aulamo, Jenna Kanerva, Akseli Leino, Filip Ginter, Niko Papula

This paper presents FISKMÖ, a project that focuses on the development of resources and tools for cross-linguistic research and machine translation between Finnish and Swedish.

Machine Translation Translation

Analysing concatenation approaches to document-level NMT in two different domains

no code implementations WS 2019 Yves Scherrer, Jörg Tiedemann, Sharid Loáiciga

In this paper, we investigate how different aspects of discourse context affect the performance of recent neural MT systems.

Translation

The University of Helsinki Submission to the WMT19 Parallel Corpus Filtering Task

no code implementations WS 2019 Raúl Vázquez, Umut Sulubacak, Jörg Tiedemann

This paper describes the University of Helsinki Language Technology group's participation in the WMT 2019 parallel corpus filtering task.

General Classification

An Evaluation of Language-Agnostic Inner-Attention-Based Representations in Machine Translation

no code implementations WS 2019 Alessandro Raganato, Raúl Vázquez, Mathias Creutz, Jörg Tiedemann

In this paper, we explore a multilingual translation model with a cross-lingually shared layer that can be used as fixed-size sentence representation in different downstream tasks.

Machine Translation Translation

An Analysis of Encoder Representations in Transformer-Based Machine Translation

no code implementations WS 2018 Alessandro Raganato, Jörg Tiedemann

We assess the representations of the encoder by extracting dependency relations based on self-attention weights, perform four probing tasks to study how much syntactic and semantic information is captured, and test attention in a transfer learning scenario.

Feature Engineering Machine Translation +2

The University of Helsinki submissions to the WMT18 news task

no code implementations WS 2018 Alessandro Raganato, Yves Scherrer, Tommi Nieminen, Arvi Hurskainen, Jörg Tiedemann

This paper describes the University of Helsinki's submissions to the WMT18 shared news translation task for English-Finnish and English-Estonian, in both directions.

Machine Translation Translation

Normalizing Early English Letters to Present-day English Spelling

no code implementations COLING 2018 Mika Hämäläinen, Tanja Säily, Jack Rueter, Jörg Tiedemann, Eetu Mäkelä

This paper presents multiple methods for normalizing the most deviant and infrequent historical spellings in a corpus consisting of personal correspondence from the 15th to the 19th century.

Machine Translation Translation

Findings of the VarDial Evaluation Campaign 2017

no code implementations WS 2017 Marcos Zampieri, Shervin Malmasi, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann, Yves Scherrer, Noëmi Aepli

We present the results of the VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects, which we organized as part of the fourth edition of the VarDial workshop at EACL'2017.

Dependency Parsing Dialect Identification +1

Cross-lingual dependency parsing for closely related languages - Helsinki's submission to VarDial 2017

no code implementations WS 2017 Jörg Tiedemann

This paper describes the submission from the University of Helsinki to the shared task on cross-lingual dependency parsing at VarDial 2017.

Dependency Parsing Machine Translation +2

Continuous multilinguality with language vectors

no code implementations EACL 2017 Robert Östling, Jörg Tiedemann

Most existing models for multilingual natural language processing (NLP) treat language as a discrete category, and make predictions for either one language or the other.

Image Captioning Machine Translation +2

Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task

no code implementations WS 2016 Shervin Malmasi, Marcos Zampieri, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann

We present the results of the third edition of the Discriminating between Similar Languages (DSL) shared task, which was organized as part of the VarDial'2016 workshop at COLING'2016.

Dialect Identification General Classification

Tagging Ingush - Language Technology For Low-Resource Languages Using Resources From Linguistic Field Work

no code implementations WS 2016 Jörg Tiedemann, Johanna Nichols, Ronald Sprouse

This paper presents on-going work on creating NLP tools for under-resourced languages from very sparse training data coming from linguistic field work.

Cross-Lingual Transfer

Finding Alternative Translations in a Large Corpus of Movie Subtitle

no code implementations LREC 2016 Jörg Tiedemann

Our approach produces large numbers of sentence-aligned translation alternatives for over 50 languages provided via the OPUS corpus collection.

Machine Translation Translation

ParCor 1.0: A Parallel Pronoun-Coreference Corpus to Support Statistical MT

no code implementations LREC 2014 Liane Guillou, Christian Hardmeier, Aaron Smith, Jörg Tiedemann, Bonnie Webber

We present ParCor, a parallel corpus of texts in which pronoun coreference ― reduced coreference in which pronouns are used as referring expressions ― has been annotated.

Machine Translation Translation

Parallel Data, Tools and Interfaces in OPUS

no code implementations LREC 2012 Jörg Tiedemann

In this paper, we report about new data sets and their features, additional annotation tools and models provided from the website and essential interfaces and on-line services included in the project.

Machine Translation Translation +1

Large aligned treebanks for syntax-based machine translation

no code implementations LREC 2012 Gideon Kotzé, Vincent Vandeghinste, Scott Martens, Jörg Tiedemann

We present a collection of parallel treebanks that have been automatically aligned on both the terminal and the nonterminal constituent level for use in syntax-based machine translation.

Language Modelling Machine Translation +1

A Distributed Resource Repository for Cloud-Based Machine Translation

no code implementations LREC 2012 Jörg Tiedemann, Dorte Haltrup Hansen, Lene Offersgaard, Sussi Olsen, Matthias Zumpe

In this paper, we present the architecture of a distributed resource repository developed for collecting training data for building customized statistical machine translation systems.

Machine Translation Translation
