Search Results for author: Mathias Creutz

Found 18 papers, 3 papers with code

Morfessor-enriched features and multilingual training for canonical morphological segmentation

no code implementations • NAACL (SIGMORPHON) 2022 • Aku Rouhe, Stig-Arne Grönroos, Sami Virpioja, Mathias Creutz, Mikko Kurimo

Our approach is to pre-segment the input data for a neural sequence-to-sequence model with the unsupervised method.

Ranked #1 on Morpheme Segmentation on UniMorph 4.0 (f1 macro avg (subtask 2) metric)

Morpheme Segmentation Sentence

Helsinki-NLP at SemEval-2022 Task 2: A Feature-Based Approach to Multilingual Idiomaticity Detection

no code implementations • SemEval (NAACL) 2022 • Sami Itkonen, Jörg Tiedemann, Mathias Creutz

This paper describes the University of Helsinki submission to the SemEval 2022 task on multilingual idiomaticity detection.

Feature Engineering Task 2

Modeling Noise in Paraphrase Detection

no code implementations • LREC 2022 • Teemu Vahtola, Eetu Sjöblom, Jörg Tiedemann, Mathias Creutz

Noisy labels in training data present a challenging issue in classification tasks, misleading a model towards incorrect decisions during training.

An Empirical Investigation of Word Alignment Supervision for Zero-Shot Multilingual Neural Machine Translation

no code implementations • EMNLP 2021 • Alessandro Raganato, Raúl Vázquez, Mathias Creutz, Jörg Tiedemann

In this paper, we investigate the benefits of an explicit alignment to language labels in Transformer-based MNMT models in the zero-shot context, by jointly training one cross attention head with word alignment supervision to stress the focus on the target language label.

Machine Translation Translation +1

Coping with Noisy Training Data Labels in Paraphrase Detection

no code implementations • WNUT (ACL) 2021 • Teemu Vahtola, Mathias Creutz, Eetu Sjöblom, Sami Itkonen

We present new state-of-the-art benchmarks for paraphrase detection on all six languages in the Opusparcus sentential paraphrase corpus: English, Finnish, French, German, Russian, and Swedish.

Translation

A Closer Look at Parameter Contributions When Training Neural Language and Translation Models

no code implementations • COLING 2022 • Raúl Vázquez, Hande Celikkanat, Vinit Ravishankar, Mathias Creutz, Jörg Tiedemann

We analyze the learning dynamics of neural language and translation models using Loss Change Allocation (LCA), an indicator that enables a fine-grained analysis of parameter updates when optimizing for the loss function.

Causal Language Modeling Language Modelling +3

On Using Distribution-Based Compositionality Assessment to Evaluate Compositional Generalisation in Machine Translation

1 code implementation • 14 Nov 2023 • Anssi Moisio, Mathias Creutz, Mikko Kurimo

This is a fully-automated procedure to create natural language compositionality benchmarks, making it simple and inexpensive to apply it further to other datasets and languages.

Benchmarking Machine Translation +1

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

no code implementations • 22 Jun 2022 • Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashish Upadhyay, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, Genta Indra Winata, Hendrik Strobelt, Hiroaki Hayashi, Jekaterina Novikova, Jenna Kanerva, Jenny Chim, Jiawei Zhou, Jordan Clive, Joshua Maynez, João Sedoc, Juraj Juraska, Kaustubh Dhole, Khyathi Raghavi Chandu, Laura Perez-Beltrachini, Leonardo F. R. Ribeiro, Lewis Tunstall, Li Zhang, Mahima Pushkarna, Mathias Creutz, Michael White, Mihir Sanjay Kale, Moussa Kamal Eddine, Nico Daheim, Nishant Subramani, Ondrej Dusek, Paul Pu Liang, Pawan Sasanka Ammanamanchi, Qi Zhu, Ratish Puduppully, Reno Kriz, Rifat Shahriyar, Ronald Cardenas, Saad Mahamood, Salomey Osei, Samuel Cahyawijaya, Sanja Štajner, Sebastien Montella, Shailza, Shailza Jolly, Simon Mille, Tahmid Hasan, Tianhao Shen, Tosin Adewumi, Vikas Raunak, Vipul Raheja, Vitaly Nikolaev, Vivian Tsai, Yacine Jernite, Ying Xu, Yisi Sang, Yixin Liu, Yufang Hou

This problem is especially pertinent in natural language generation which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims.

Benchmarking Text Generation

Semantic Search as Extractive Paraphrase Span Detection

1 code implementation • 9 Dec 2021 • Jenna Kanerva, Hanna Kitti, Li-Hsin Chang, Teemu Vahtola, Mathias Creutz, Filip Ginter

In this paper, we approach the problem of semantic search by framing the search task as paraphrase span detection, i.e., given a segment of text as a query phrase, the task is to identify its paraphrase in a given document, the same modelling setup as typically used in extractive question answering.

Extractive Question-Answering Question Answering +5

On the differences between BERT and MT encoder spaces and how to address them in translation tasks

no code implementations • ACL 2021 • Raúl Vázquez, Hande Celikkanat, Mathias Creutz, Jörg Tiedemann

Various studies show that pretrained language models such as BERT cannot straightforwardly replace encoders in neural machine translation despite their enormous success in other tasks.

Machine Translation NMT +1

Grammatical Error Generation Based on Translated Fragments

no code implementations • NoDaLiDa 2021 • Eetu Sjöblom, Mathias Creutz, Teemu Vahtola

We perform neural machine translation of sentence fragments in order to create large amounts of training data for English grammatical error correction.

Grammatical Error Correction Machine Translation +2

A Systematic Study of Inner-Attention-Based Sentence Representations in Multilingual Neural Machine Translation

no code implementations • CL 2020 • Raúl Vázquez, Alessandro Raganato, Mathias Creutz, Jörg Tiedemann

In particular, we show that larger intermediate layers not only improve translation quality, especially for long sentences, but also push the accuracy of trainable classification tasks.

Machine Translation Sentence +2

Paraphrase Generation and Evaluation on Colloquial-Style Sentences

no code implementations • LREC 2020 • Eetu Sjöblom, Mathias Creutz, Yves Scherrer

We also conduct human evaluation on five of the six languages and compare the results to the automatic evaluation metrics BLEU and the recently proposed BERTScore.

Machine Translation Paraphrase Generation +2

Toward automatic improvement of language produced by non-native language learners

no code implementations • WS 2019 • Mathias Creutz, Eetu Sjöblom

An Evaluation of Language-Agnostic Inner-Attention-Based Representations in Machine Translation

no code implementations • WS 2019 • Alessandro Raganato, Raúl Vázquez, Mathias Creutz, Jörg Tiedemann

In this paper, we explore a multilingual translation model with a cross-lingually shared layer that can be used as fixed-size sentence representation in different downstream tasks.

Machine Translation Sentence +1

Multilingual NMT with a language-independent attention bridge

1 code implementation • WS 2019 • Raúl Vázquez, Alessandro Raganato, Jörg Tiedemann, Mathias Creutz

In this paper, we propose a multilingual encoder-decoder architecture capable of obtaining multilingual sentence representations by means of incorporating an intermediate attention bridge that is shared across all languages.

NMT Sentence +2

Paraphrase Detection on Noisy Subtitles in Six Languages

no code implementations • WS 2018 • Eetu Sjöblom, Mathias Creutz, Mikko Aulamo

We perform automatic paraphrase detection on subtitle data from the Opusparcus corpus comprising six European languages: German, English, Finnish, French, Russian, and Swedish.

Sentence Sentence Embedding +1

Open Subtitles Paraphrase Corpus for Six Languages

no code implementations • LREC 2018 • Mathias Creutz

The development and test sets consist of sentence pairs that have been checked manually; each set contains approximately 1000 sentence pairs that have been verified to be acceptable paraphrases by two annotators.

Sentence
