Search Results for author: Jörg Tiedemann

Found 63 papers, 3 papers with code

On the differences between BERT and MT encoder spaces and how to address them in translation tasks

no code implementations • ACL 2021 • Raúl Vázquez, Hande Celikkanat, Mathias Creutz, Jörg Tiedemann

Various studies show that pretrained language models such as BERT cannot straightforwardly replace encoders in neural machine translation despite their enormous success in other tasks.

Machine Translation NMT +1

OpusFilter: A Configurable Parallel Corpus Filtering Toolbox

no code implementations • ACL 2020 • Mikko Aulamo, Sami Virpioja, Jörg Tiedemann

We demonstrate the effectiveness of OpusFilter on the example of a Finnish-English news translation task based on noisy web-crawled training data.

Domain Adaptation Language Identification +2

The University of Helsinki Submission to the IWSLT2020 Offline SpeechTranslation Task

no code implementations • WS 2020 • Raúl Vázquez, Mikko Aulamo, Umut Sulubacak, Jörg Tiedemann

This paper describes the University of Helsinki Language Technology group's participation in the IWSLT 2020 offline speech translation task, addressing the translation of English audio into German text.

Transfer Learning Translation

A Systematic Study of Inner-Attention-Based Sentence Representations in Multilingual Neural Machine Translation

no code implementations • CL 2020 • Raúl Vázquez, Alessandro Raganato, Mathias Creutz, Jörg Tiedemann

In particular, we show that larger intermediate layers not only improve translation quality, especially for long sentences, but also push the accuracy of trainable classification tasks.

Machine Translation Sentence +2

An Evaluation Benchmark for Testing the Word Sense Disambiguation Capabilities of Machine Translation Systems

1 code implementation • LREC 2020 • Alessandro Raganato, Yves Scherrer, Jörg Tiedemann

Lexical ambiguity is one of the many challenging linguistic phenomena involved in translation, i.e., translating an ambiguous word with its correct sense.

Machine Translation Translation +1

The FISKMÖ Project: Resources and Tools for Finnish-Swedish Machine Translation and Cross-Linguistic Research

no code implementations • LREC 2020 • Jörg Tiedemann, Tommi Nieminen, Mikko Aulamo, Jenna Kanerva, Akseli Leino, Filip Ginter, Niko Papula

This paper presents FISKMÖ, a project that focuses on the development of resources and tools for cross-linguistic research and machine translation between Finnish and Swedish.

Machine Translation Translation

OpusTools and Parallel Corpus Diagnostics

no code implementations • LREC 2020 • Mikko Aulamo, Umut Sulubacak, Sami Virpioja, Jörg Tiedemann

We show the use of these tools in parallel corpus creation and data diagnostics.

Language Identification

Analysing concatenation approaches to document-level NMT in two different domains

no code implementations • WS 2019 • Yves Scherrer, Jörg Tiedemann, Sharid Loáiciga

In this paper, we investigate how different aspects of discourse context affect the performance of recent neural MT systems.

NMT Sentence +2

The University of Helsinki Submission to the WMT19 Parallel Corpus Filtering Task

no code implementations • WS 2019 • Raúl Vázquez, Umut Sulubacak, Jörg Tiedemann

This paper describes the University of Helsinki Language Technology group's participation in the WMT 2019 parallel corpus filtering task.

General Classification Sentence

The MuCoW Test Suite at WMT 2019: Automatically Harvested Multilingual Contrastive Word Sense Disambiguation Test Sets for Machine Translation

1 code implementation • WS 2019 • Alessandro Raganato, Yves Scherrer, Jörg Tiedemann

Supervised Neural Machine Translation (NMT) systems currently achieve impressive translation quality for many language pairs.

Machine Translation NMT +3

An Evaluation of Language-Agnostic Inner-Attention-Based Representations in Machine Translation

no code implementations • WS 2019 • Alessandro Raganato, Raúl Vázquez, Mathias Creutz, Jörg Tiedemann

In this paper, we explore a multilingual translation model with a cross-lingually shared layer that can be used as fixed-size sentence representation in different downstream tasks.

Machine Translation Sentence +1

Revisiting NMT for Normalization of Early English Letters

1 code implementation • WS 2019 • Mika Hämäläinen, Tanja Säily, Jack Rueter, Jörg Tiedemann, Eetu Mäkelä

This paper studies the use of NMT (neural machine translation) as a normalization method for an early English letter corpus.

Lemmatization Machine Translation +2

An Analysis of Encoder Representations in Transformer-Based Machine Translation

no code implementations • WS 2018 • Alessandro Raganato, Jörg Tiedemann

We assess the encoder's representations by extracting dependency relations based on self-attention weights, perform four probing tasks to study how much syntactic and semantic information is captured, and test attention in a transfer learning scenario.

Feature Engineering Machine Translation +2

The University of Helsinki submissions to the WMT18 news task

no code implementations • WS 2018 • Alessandro Raganato, Yves Scherrer, Tommi Nieminen, Arvi Hurskainen, Jörg Tiedemann

This paper describes the University of Helsinki's submissions to the WMT18 shared news translation task for English-Finnish and English-Estonian, in both directions.

Machine Translation Translation

Creating a Dataset for Multilingual Fine-grained Emotion-detection Using Gamification-based Annotation

no code implementations • WS 2018 • Emily Öhman, Kaisla Kajava, Jörg Tiedemann, Timo Honkela

This paper introduces a gamified framework for fine-grained sentiment analysis and emotion detection.

Sentiment Analysis

Normalizing Early English Letters to Present-day English Spelling

no code implementations • COLING 2018 • Mika Hämäläinen, Tanja Säily, Jack Rueter, Jörg Tiedemann, Eetu Mäkelä

This paper presents multiple methods for normalizing the most deviant and infrequent historical spellings in a corpus consisting of personal correspondence from the 15th to the 19th century.

Machine Translation Translation

Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign

no code implementations • COLING 2018 • Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Ahmed Ali, Suwon Shon, James Glass, Yves Scherrer, Tanja Samardžić, Nikola Ljubešić, Jörg Tiedemann, Chris van der Lee, Stefan Grondelaers, Nelleke Oostdijk, Dirk Speelman, Antal Van den Bosch, Ritesh Kumar, Bornini Lahiri, Mayank Jain

We present the results and the findings of the Second VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects.

Dependency Parsing Dialect Identification

OpenSubtitles2018: Statistical Rescoring of Sentence Alignments in Large, Noisy Parallel Corpora

no code implementations • LREC 2018 • Pierre Lison, Jörg Tiedemann, Milen Kouylekov

Machine Translation Sentence

Rule-based Machine translation from English to Finnish

no code implementations • WS 2017 • Arvi Hurskainen, Jörg Tiedemann

Machine Translation Translation

Findings of the 2017 DiscoMT Shared Task on Cross-lingual Pronoun Prediction

no code implementations • WS 2017 • Sharid Loáiciga, Sara Stymne, Preslav Nakov, Christian Hardmeier, Jörg Tiedemann, Mauro Cettolo, Yannick Versley

We describe the design, the setup, and the evaluation results of the DiscoMT 2017 shared task on cross-lingual pronoun prediction.

Language Modelling Machine Translation +2

Findings of the VarDial Evaluation Campaign 2017

no code implementations • WS 2017 • Marcos Zampieri, Shervin Malmasi, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann, Yves Scherrer, Noëmi Aepli

We present the results of the VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects, which we organized as part of the fourth edition of the VarDial workshop at EACL'2017.

Dependency Parsing Dialect Identification

Cross-lingual dependency parsing for closely related languages - Helsinki's submission to VarDial 2017

no code implementations • WS 2017 • Jörg Tiedemann

This paper describes the submission from the University of Helsinki to the shared task on cross-lingual dependency parsing at VarDial 2017.

Dependency Parsing Machine Translation +2

Continuous multilinguality with language vectors

no code implementations • EACL 2017 • Robert Östling, Jörg Tiedemann

Most existing models for multilingual natural language processing (NLP) treat language as a discrete category, and make predictions for either one language or the other.

Image Captioning Language Modelling +2

Tagging Ingush - Language Technology For Low-Resource Languages Using Resources From Linguistic Field Work

no code implementations • WS 2016 • Jörg Tiedemann, Johanna Nichols, Ronald Sprouse

This paper presents on-going work on creating NLP tools for under-resourced languages from very sparse training data coming from linguistic field work.

Cross-Lingual Transfer

Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task

no code implementations • WS 2016 • Shervin Malmasi, Marcos Zampieri, Nikola Ljubešić, Preslav Nakov, Ahmed Ali, Jörg Tiedemann

We present the results of the third edition of the Discriminating between Similar Languages (DSL) shared task, which was organized as part of the VarDial'2016 workshop at COLING'2016.

Dialect Identification General Classification +1

The Challenges of Multi-dimensional Sentiment Analysis Across Languages

no code implementations • WS 2016 • Emily Öhman, Timo Honkela, Jörg Tiedemann

This paper outlines a pilot study on multi-dimensional and multilingual sentiment analysis of social media content.

Sentiment Analysis Translation

Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

no code implementations • WS 2016 • Ondřej Bojar, Christian Buck, Rajen Chatterjee, Christian Federmann, Liane Guillou, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Aurélie Névéol, Mariana Neves, Pavel Pecina, Martin Popel, Philipp Koehn, Christof Monz, Matteo Negri, Matt Post, Lucia Specia, Karin Verspoor, Jörg Tiedemann, Marco Turchi

Machine Translation Translation

A Linear Baseline Classifier for Cross-Lingual Pronoun Prediction

no code implementations • WS 2016 • Jörg Tiedemann

Feature Engineering Language Modelling +1

Phrase-Based SMT for Finnish with More Data, Better Models and Alternative Alignment and Translation Tools

no code implementations • WS 2016 • Jörg Tiedemann, Fabienne Cap, Jenna Kanerva, Filip Ginter, Sara Stymne, Robert Östling, Marion Weller-Di Marco

Language Modelling Machine Translation +2

Finding Alternative Translations in a Large Corpus of Movie Subtitle

no code implementations • LREC 2016 • Jörg Tiedemann

Our approach produces large numbers of sentence-aligned translation alternatives for over 50 languages provided via the OPUS corpus collection.

Machine Translation Sentence +1

OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles

no code implementations • LREC 2016 • Pierre Lison, Jörg Tiedemann

We present a new major release of the OpenSubtitles collection of parallel corpora.

Optical Character Recognition (OCR)

Part-of-Speech Driven Cross-Lingual Pronoun Prediction with Feed-Forward Neural Networks

no code implementations • WS 2015 • Jimmy Callin, Christian Hardmeier, Jörg Tiedemann

Machine Translation

Overview of the DSL Shared Task 2015

no code implementations • WS 2015 • Marcos Zampieri, Liling Tan, Nikola Ljubešić, Jörg Tiedemann, Preslav Nakov

Language Identification

Pronoun-Focused MT and Cross-Lingual Pronoun Prediction: Findings of the 2015 DiscoMT Shared Task on Pronoun Translation

no code implementations • WS 2015 • Christian Hardmeier, Preslav Nakov, Sara Stymne, Jörg Tiedemann, Yannick Versley, Mauro Cettolo

Machine Translation

Morphological Segmentation and OPUS for Finnish-English Machine Translation

no code implementations • WS 2015 • Jörg Tiedemann, Filip Ginter, Jenna Kanerva

Language Modelling Machine Translation +2

Baseline Models for Pronoun Prediction and Pronoun-Aware Translation

no code implementations • WS 2015 • Jörg Tiedemann

Language Modelling Machine Translation +2

Cross-Lingual Dependency Parsing with Universal Dependencies and Predicted PoS Labels

no code implementations • WS 2015 • Jörg Tiedemann

Dependency Parsing Machine Translation +3

Boosting English-Chinese Machine Transliteration via High Quality Alignment and Multilingual Resources

no code implementations • WS 2015 • Yan Shao, Jörg Tiedemann, Joakim Nivre

Information Retrieval Transliteration +1

Improving the Cross-Lingual Projection of Syntactic Dependencies

no code implementations • WS 2015 • Jörg Tiedemann

Dependency Parsing Word Alignment

Word's Vector Representations meet Machine Translation

no code implementations • WS 2014 • Eva Martínez Garcia, Jörg Tiedemann, Cristina España-Bonet, Lluís Màrquez

Machine Translation Translation

Cross-lingual Dependency Parsing of Related Languages with Rich Morphosyntactic Tagsets

no code implementations • WS 2014 • Željko Agić, Jörg Tiedemann, Danijela Merkler, Simon Krek, Kaja Dobrovoljc, Sara Može

Dependency Parsing Machine Translation +1

A Report on the DSL Shared Task 2014

no code implementations • WS 2014 • Marcos Zampieri, Liling Tan, Nikola Ljubešić, Jörg Tiedemann

Language Identification

Rediscovering Annotation Projection for Cross-Lingual Parser Induction

no code implementations • COLING 2014 • Jörg Tiedemann

Word Alignment

Treebank Translation for Cross-Lingual Parser Induction

no code implementations • WS 2014 • Jörg Tiedemann, Željko Agić, Joakim Nivre

Dependency Parsing Machine Translation +2

Estimating Word Alignment Quality for SMT Reordering Tasks

no code implementations • WS 2014 • Sara Stymne, Jörg Tiedemann, Joakim Nivre

Machine Translation Part-Of-Speech Tagging +1

Anaphora Models and Reordering for Phrase-Based SMT

no code implementations • WS 2014 • Christian Hardmeier, Sara Stymne, Jörg Tiedemann, Aaron Smith, Joakim Nivre

Language Modelling Machine Translation

Billions of Parallel Words for Free: Building and Using the EU Bookshop Corpus

no code implementations • LREC 2014 • Raivis Skadiņš, Jörg Tiedemann, Roberts Rozis, Daiga Deksne

The European Union is a great source of high quality documents with translations into several languages.

Machine Translation Translation

ParCor 1.0: A Parallel Pronoun-Coreference Corpus to Support Statistical MT

no code implementations • LREC 2014 • Liane Guillou, Christian Hardmeier, Aaron Smith, Jörg Tiedemann, Bonnie Webber

We present ParCor, a parallel corpus of texts in which pronoun coreference ― reduced coreference in which pronouns are used as referring expressions ― has been annotated.

Machine Translation Translation

Latent Anaphora Resolution for Cross-Lingual Pronoun Prediction

no code implementations • EMNLP 2013 • Christian Hardmeier, Jörg Tiedemann, Joakim Nivre

Machine Translation

Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation

no code implementations • ACL 2013 • Christian Hardmeier, Sara Stymne, Jörg Tiedemann, Joakim Nivre

Language Modelling Machine Translation +1

Tunable Distortion Limits and Corpus Cleaning for SMT

no code implementations • WS 2013 • Sara Stymne, Christian Hardmeier, Jörg Tiedemann, Joakim Nivre

Language Modelling Machine Translation

Feature Weight Optimization for Discourse-Level SMT

no code implementations • WS 2013 • Sara Stymne, Christian Hardmeier, Jörg Tiedemann, Joakim Nivre

Machine Translation

Statistical Machine Translation with Readability Constraints

no code implementations • WS 2013 • Sara Stymne, Jörg Tiedemann, Christian Hardmeier, Joakim Nivre

Machine Translation Text Simplification +1

Experiences in Building the Let's MT! Portal on Amazon EC2

no code implementations • WS 2013 • Jörg Tiedemann

Machine Translation

Efficient Discrimination Between Closely Related Languages

no code implementations • COLING 2012 • Jörg Tiedemann, Nikola Ljubešić

Document Classification Language Identification

Document-Wide Decoding for Phrase-Based Statistical Machine Translation

no code implementations • EMNLP 2012 • Christian Hardmeier, Joakim Nivre, Jörg Tiedemann

Language Modelling Machine Translation +1

Combining Word-Level and Character-Level Models for Machine Translation Between Closely-Related Languages

no code implementations • ACL 2012 • Preslav Nakov, Jörg Tiedemann

Language Modelling Translation +2

LetsMT!: Cloud-Based Platform for Do-It-Yourself Machine Translation

no code implementations • ACL 2012 • Andrejs Vasiļjevs, Raivis Skadiņš, Jörg Tiedemann

Machine Translation Translation

Tree Kernels for Machine Translation Quality Estimation

no code implementations • WS 2012 • Christian Hardmeier, Joakim Nivre, Jörg Tiedemann

Machine Translation Translation

A Distributed Resource Repository for Cloud-Based Machine Translation

no code implementations • LREC 2012 • Jörg Tiedemann, Dorte Haltrup Hansen, Lene Offersgaard, Sussi Olsen, Matthias Zumpe

In this paper, we present the architecture of a distributed resource repository developed for collecting training data for building customized statistical machine translation systems.

Machine Translation Management +2

Parallel Data, Tools and Interfaces in OPUS

no code implementations • LREC 2012 • Jörg Tiedemann

In this paper, we report about new data sets and their features, additional annotation tools and models provided from the website and essential interfaces and on-line services included in the project.

Machine Translation Translation +1

Large aligned treebanks for syntax-based machine translation

no code implementations • LREC 2012 • Gideon Kotzé, Vincent Vandeghinste, Scott Martens, Jörg Tiedemann

We present a collection of parallel treebanks that have been automatically aligned on both the terminal and the nonterminal constituent level for use in syntax-based machine translation.

Language Modelling Machine Translation +1

Character-Based Pivot Translation for Under-Resourced Languages and Domains

no code implementations • EACL 2012 • Jörg Tiedemann

Domain Adaptation Machine Translation +1
