1 code implementation • WS 2019 • Raúl Vázquez, Alessandro Raganato, Jörg Tiedemann, Mathias Creutz
In this paper, we propose a multilingual encoder-decoder architecture capable of obtaining multilingual sentence representations by means of incorporating an intermediate {\em attention bridge} that is shared across all languages.
1 code implementation • 9 Dec 2021 • Jenna Kanerva, Hanna Kitti, Li-Hsin Chang, Teemu Vahtola, Mathias Creutz, Filip Ginter
In this paper, we approach the problem of semantic search by framing the search task as paraphrase span detection, i. e. given a segment of text as a query phrase, the task is to identify its paraphrase in a given document, the same modelling setup as typically used in extractive question answering.
no code implementations • LREC 2018 • Mathias Creutz
The development and test sets consist of sentence pairs that have been checked manually; each set contains approximately 1000 sentence pairs that have been verified to be acceptable paraphrases by two annotators.
no code implementations • WS 2018 • Eetu Sjöblom, Mathias Creutz, Mikko Aulamo
We perform automatic paraphrase detection on subtitle data from the Opusparcus corpus comprising six European languages: German, English, Finnish, French, Russian, and Swedish.
no code implementations • WS 2019 • Aless Raganato, ro, Ra{\'u}l V{\'a}zquez, Mathias Creutz, J{\"o}rg Tiedemann
In this paper, we explore a multilingual translation model with a cross-lingually shared layer that can be used as fixed-size sentence representation in different downstream tasks.
no code implementations • LREC 2020 • Eetu Sj{\"o}blom, Mathias Creutz, Yves Scherrer
We also conduct human evaluation on five of the six languages and compare the results to the automatic evaluation metrics BLEU and the recently proposed BERTScore.
no code implementations • CL 2020 • Ra{\'u}l V{\'a}zquez, Aless Raganato, ro, Mathias Creutz, J{\"o}rg Tiedemann
In particular, we show that larger intermediate layers not only improve translation quality, especially for long sentences, but also push the accuracy of trainable classification tasks.
no code implementations • NoDaLiDa 2021 • Eetu Sjöblom, Mathias Creutz, Teemu Vahtola
We perform neural machine translation of sentence fragments in order to create large amounts of training data for English grammatical error correction.
no code implementations • ACL 2021 • Ra{\'u}l V{\'a}zquez, Hande Celikkanat, Mathias Creutz, J{\"o}rg Tiedemann
Various studies show that pretrained language models such as BERT cannot straightforwardly replace encoders in neural machine translation despite their enormous success in other tasks.
no code implementations • EMNLP 2021 • Alessandro Raganato, Raúl Vázquez, Mathias Creutz, Jörg Tiedemann
In this paper, we investigate the benefits of an explicit alignment to language labels in Transformer-based MNMT models in the zero-shot context, by jointly training one cross attention head with word alignment supervision to stress the focus on the target language label.
no code implementations • WNUT (ACL) 2021 • Teemu Vahtola, Mathias Creutz, Eetu Sjöblom, Sami Itkonen
We present new state-of-the-art benchmarks for paraphrase detection on all six languages in the Opusparcus sentential paraphrase corpus: English, Finnish, French, German, Russian, and Swedish.
no code implementations • 22 Jun 2022 • Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashish Upadhyay, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, Genta Indra Winata, Hendrik Strobelt, Hiroaki Hayashi, Jekaterina Novikova, Jenna Kanerva, Jenny Chim, Jiawei Zhou, Jordan Clive, Joshua Maynez, João Sedoc, Juraj Juraska, Kaustubh Dhole, Khyathi Raghavi Chandu, Laura Perez-Beltrachini, Leonardo F. R. Ribeiro, Lewis Tunstall, Li Zhang, Mahima Pushkarna, Mathias Creutz, Michael White, Mihir Sanjay Kale, Moussa Kamal Eddine, Nico Daheim, Nishant Subramani, Ondrej Dusek, Paul Pu Liang, Pawan Sasanka Ammanamanchi, Qi Zhu, Ratish Puduppully, Reno Kriz, Rifat Shahriyar, Ronald Cardenas, Saad Mahamood, Salomey Osei, Samuel Cahyawijaya, Sanja Štajner, Sebastien Montella, Shailza, Shailza Jolly, Simon Mille, Tahmid Hasan, Tianhao Shen, Tosin Adewumi, Vikas Raunak, Vipul Raheja, Vitaly Nikolaev, Vivian Tsai, Yacine Jernite, Ying Xu, Yisi Sang, Yixin Liu, Yufang Hou
This problem is especially pertinent in natural language generation which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims.
no code implementations • NAACL (SIGMORPHON) 2022 • Aku Rouhe, Stig-Arne Grönroos, Sami Virpioja, Mathias Creutz, Mikko Kurimo
Our approach is to pre-segment the input data for a neural sequence-to-sequence model with the unsupervised method.
Ranked #1 on Morpheme Segmentaiton on UniMorph 4.0 (f1 macro avg (subtask 2) metric)
no code implementations • SemEval (NAACL) 2022 • Sami Itkonen, Jörg Tiedemann, Mathias Creutz
This paper describes the University of Helsinki submission to the SemEval 2022 task on multilingual idiomaticity detection.
no code implementations • LREC 2022 • Teemu Vahtola, Eetu Sjöblom, Jörg Tiedemann, Mathias Creutz
Noisy labels in training data present a challenging issue in classification tasks, misleading a model towards incorrect decisions during training.
no code implementations • COLING 2022 • Raúl Vázquez, Hande Celikkanat, Vinit Ravishankar, Mathias Creutz, Jörg Tiedemann
We analyze the learning dynamics of neural language and translation models using Loss Change Allocation (LCA), an indicator that enables a fine-grained analysis of parameter updates when optimizing for the loss function.
1 code implementation • 14 Nov 2023 • Anssi Moisio, Mathias Creutz, Mikko Kurimo
This is a fully-automated procedure to create natural language compositionality benchmarks, making it simple and inexpensive to apply it further to other datasets and languages.