no code implementations • GWC 2016 • Arefeh Kazemi, Antonio Toral, Andy Way
We propose the use of WordNet synsets in a syntax-based reordering model for hierarchical phrase-based statistical machine translation (HPB-SMT) to enable the model to generalize to phrases not seen in the training data but that have equivalent meaning.
no code implementations • WMT (EMNLP) 2020 • Lukas Edman, Antonio Toral, Gertjan van Noord
This paper describes the methods behind the systems submitted by the University of Groningen for the WMT 2020 Unsupervised Machine Translation task for German–Upper Sorbian.
no code implementations • LREC (BUCC) 2022 • Rik van Noord, Cristian García-Romero, Miquel Esplà-Gomis, Leopoldo Pla Sempere, Antonio Toral
An important goal of the MaCoCu project is to improve EU-specific NLP systems that concern its Digital Service Infrastructures (DSIs).
no code implementations • EAMT 2022 • Ana Guerberof Arenas, Antonio Toral
We present here the EU-funded project CREAMT that seeks to understand what is meant by creativity in different translation modalities, e.g. machine translation, post-editing or professional translation.
no code implementations • EAMT 2022 • Marta Bañón, Miquel Esplà-Gomis, Mikel L. Forcada, Cristian García-Romero, Taja Kuzman, Nikola Ljubešić, Rik van Noord, Leopoldo Pla Sempere, Gema Ramírez-Sánchez, Peter Rupnik, Vít Suchomel, Antonio Toral, Tobias van der Werff, Jaume Zaragoza
We introduce the project “MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages”, funded by the Connecting Europe Facility, which is aimed at building monolingual and parallel corpora for under-resourced European languages.
1 code implementation • EAMT 2022 • Tobias van der Werff, Rik van Noord, Antonio Toral
We address the task of automatically distinguishing between human-translated (HT) and machine translated (MT) texts.
1 code implementation • EAMT 2020 • Lukas Edman, Antonio Toral, Gertjan van Noord
Unsupervised Machine Translation has been advancing our ability to translate without parallel data, but state-of-the-art methods assume an abundance of monolingual data.
no code implementations • LaTeCHCLfL (COLING) 2022 • Aleksandra Konovalova, Antonio Toral
Most of the work on Character Networks to date is limited to monolingual texts.
no code implementations • NAACL (ACL) 2022 • Aleksandra Konovalova, Antonio Toral, Kristiina Taivalkoski-Shilov
Character identification is a key element for many narrative-related tasks.
no code implementations • WMT (EMNLP) 2021 • Lukas Edman, Ahmet Üstün, Antonio Toral, Gertjan van Noord
This paper describes the methods behind the systems submitted by the University of Groningen for the WMT 2021 Unsupervised Machine Translation task for German–Lower Sorbian (DE–DSB): a high-resource language to a low-resource one.
no code implementations • WMT (EMNLP) 2020 • Christian Roest, Lukas Edman, Gosse Minnema, Kevin Kelly, Jennifer Spenader, Antonio Toral
Translating to and from low-resource polysynthetic languages presents numerous challenges for NMT.
no code implementations • 11 Dec 2024 • Huiyuan Lai, Esther Ploeger, Rik van Noord, Antonio Toral
Neural machine translation (NMT) systems amplify lexical biases present in their training data, leading to artificially impoverished language in output translations.
no code implementations • 30 Aug 2024 • Esther Ploeger, Huiyuan Lai, Rik van Noord, Antonio Toral
Thus, rather than aiming for the rigid increase of lexical diversity, we reframe the task as recovering what is lost in the machine translation process.
no code implementations • 13 Mar 2024 • Rik van Noord, Taja Kuzman, Peter Rupnik, Nikola Ljubešić, Miquel Esplà-Gomis, Gema Ramírez-Sánchez, Antonio Toral
Large, curated, web-crawled corpora play a vital role in training language models (LMs).
no code implementations • 5 Jul 2023 • Ana Guerberof Arenas, Antonio Toral
The results show that HT presented higher engagement, enjoyment and translation reception in Catalan compared to PE and MT.
no code implementations • 31 May 2023 • Malina Chichirau, Rik van Noord, Antonio Toral
We tackle the task of automatically discriminating between human and machine translations.
1 code implementation • 31 May 2023 • Huiyuan Lai, Antonio Toral, Malvina Nissim
Figures of speech help people express abstract concepts and evoke stronger emotions than literal expressions, thereby making texts more creative and engaging.
no code implementations • 2 May 2023 • Anya Belz, Craig Thomson, Ehud Reiter, Gavin Abercrombie, Jose M. Alonso-Moral, Mohammad Arvan, Anouck Braggaar, Mark Cieliebak, Elizabeth Clark, Kees Van Deemter, Tanvi Dinkar, Ondřej Dušek, Steffen Eger, Qixiang Fang, Mingqi Gao, Albert Gatt, Dimitra Gkatzia, Javier González-Corbelle, Dirk Hovy, Manuela Hürlimann, Takumi Ito, John D. Kelleher, Filip Klubicka, Emiel Krahmer, Huiyuan Lai, Chris van der Lee, Yiru Li, Saad Mahamood, Margot Mieskes, Emiel van Miltenburg, Pablo Mosteiro, Malvina Nissim, Natalie Parde, Ondřej Plátek, Verena Rieser, Jie Ruan, Joel Tetreault, Antonio Toral, Xiaojun Wan, Leo Wanner, Lewis Watson, Diyi Yang
We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more/less reproducible.
1 code implementation • 26 Apr 2023 • Huiyuan Lai, Antonio Toral, Malvina Nissim
We investigate the potential of ChatGPT as a multidimensional evaluator for the task of Text Style Transfer, alongside, and in comparison to, existing automatic metrics as well as human judgements.
1 code implementation • 28 Feb 2023 • Lukas Edman, Gabriele Sarti, Antonio Toral, Gertjan van Noord, Arianna Bisazza
Pretrained character-level and byte-level language models have been shown to be competitive with popular subword models across a range of Natural Language Processing (NLP) tasks.
1 code implementation • 2 Dec 2022 • Lukas Edman, Antonio Toral, Gertjan van Noord
This new downsampling method not only outperforms existing downsampling methods, showing that downsampling characters can be done without sacrificing quality, but also leads to promising performance compared to subword models for translation.
1 code implementation • 27 May 2022 • Lukas Edman, Antonio Toral, Gertjan van Noord
Character-based representations have important advantages over subword-based ones for morphologically rich languages.
1 code implementation • 24 May 2022 • Gabriele Sarti, Arianna Bisazza, Ana Guerberof Arenas, Antonio Toral
We publicly release the complete dataset including all collected behavioral data, to foster new research on the translation capabilities of NMT systems for typologically diverse languages.
no code implementations • ICON 2021 • Lukas Edman, Antonio Toral, Gertjan van Noord
This paper investigates very low-resource language model pretraining, when fewer than 100 thousand sentences are available.
1 code implementation • HumEval (ACL) 2022 • Huiyuan Lai, Jiali Mao, Antonio Toral, Malvina Nissim
Although text style transfer has witnessed rapid development in recent years, there is as yet no established standard for its evaluation, which relies on several automatic metrics since resorting to human judgement is not always possible.
no code implementations • 12 Apr 2022 • Ana Guerberof Arenas, Antonio Toral
This article presents the results of a study involving the translation of a short story by Kurt Vonnegut from English to Catalan and Dutch using three modalities: machine translation (MT), post-editing (PE) and translation without aid (HT).
1 code implementation • ACL 2022 • Huiyuan Lai, Antonio Toral, Malvina Nissim
We exploit the pre-trained seq2seq model mBART for multilingual text style transfer.
1 code implementation • 24 Sep 2021 • Lukas Edman, Ahmet Üstün, Antonio Toral, Gertjan van Noord
Lastly, we experiment with the order in which offline and online back-translation are used to train an unsupervised system, finding that using online back-translation first works better for DE→DSB by 2.76 BLEU.
1 code implementation • EMNLP 2021 • Huiyuan Lai, Antonio Toral, Malvina Nissim
Style transfer aims to rewrite a source text in a different target style while preserving its content.
1 code implementation • ACL 2021 • Huiyuan Lai, Antonio Toral, Malvina Nissim
Scarcity of parallel data causes formality style transfer models to have scarce success in preserving content.
1 code implementation • 15 Jan 2021 • Ana Guerberof Arenas, Antonio Toral
The results show that HT presented a higher creativity score compared to MTPE and MT.
1 code implementation • 30 Nov 2020 • Antonio Toral, Antoni Oliver, Pau Ribas Ballestín
In this chapter we build a machine translation (MT) system tailored to the literary domain, specifically to novels, based on the state-of-the-art architecture in neural MT (NMT), the Transformer (Vaswani et al., 2017), for the translation direction English-to-Catalan.
2 code implementations • EMNLP 2020 • Rik van Noord, Antonio Toral, Johan Bos
We combine character-level and contextual language model representations to improve performance on Discourse Representation Structure parsing.
Ranked #1 on DRS Parsing on PMB-2.2.0
1 code implementation • EAMT 2020 • Yuying Ye, Antonio Toral
This research presents a fine-grained human evaluation to compare the Transformer and recurrent approaches to neural machine translation (MT), on the translation direction English-to-Chinese.
1 code implementation • EAMT 2020 • Antonio Toral
We reassess the claims of human parity and super-human performance made at the news shared task of WMT 2019 for three translation directions: English-to-German, English-to-Russian and German-to-English.
1 code implementation • 3 Apr 2020 • Samuel Läubli, Sheila Castilho, Graham Neubig, Rico Sennrich, Qinlan Shen, Antonio Toral
The quality of machine translation has increased remarkably in recent years, to the degree that it was found to be indistinguishable from professional human translation in a number of empirical investigations.
no code implementations • WS 2019 • Antonio Toral, Lukas Edman, Galiya Yeshmagambetova, Jennifer Spenader
This paper presents the systems submitted by the University of Groningen to the English–Kazakh language pair (both translation directions) for the WMT 2019 news translation task.
1 code implementation • WS 2019 • Antonio Toral
We conduct a set of computational analyses in which we compare PE against HT on three different datasets that cover five translation directions with measures that address different translation universals and laws of translation: simplification, normalisation and interference.
1 code implementation • WS 2019 • Mike Zhang, Antonio Toral
The effect of translationese has been studied in the field of machine translation (MT), mostly with respect to training data.
no code implementations • WS 2019 • Rik van Noord, Antonio Toral, Johan Bos
Recently, sequence-to-sequence models have achieved impressive performance on a number of semantic parsing tasks.
Ranked #2 on DRS Parsing on PMB-3.0.0
1 code implementation • TACL 2018 • Rik van Noord, Lasha Abzianidze, Antonio Toral, Johan Bos
Neural methods have had several recent successes in semantic parsing, though they have yet to face the challenge of producing meaning representations based on formal semantics.
Ranked #3 on DRS Parsing on PMB-3.0.0
1 code implementation • WS 2018 • Antonio Toral, Sheila Castilho, Ke Hu, Andy Way
We reassess a recent study (Hassan et al., 2018) that claimed that machine translation (MT) has reached human parity for the translation of news from Chinese into English, using pairwise ranking and considering three variables that were not taken into account in that previous study: the language in which the source side of the test set was originally written, the translation proficiency of the evaluators, and the provision of inter-sentential context.
1 code implementation • 2 Feb 2018 • Filip Klubička, Antonio Toral, Víctor M. Sánchez-Cartagena
This paper presents a quantitative fine-grained manual evaluation approach to comparing the performance of different machine translation (MT) systems.
no code implementations • 15 Jan 2018 • Antonio Toral, Andy Way
Given the rise of a new approach to MT, Neural MT (NMT), and its promising performance on different text types, we assess the translation quality it can attain on what is perceived to be the greatest challenge for MT: literary text.
1 code implementation • 14 Jun 2017 • Filip Klubička, Antonio Toral, Víctor M. Sánchez-Cartagena
We compare three approaches to statistical machine translation (pure phrase-based, factored phrase-based and neural) by performing a fine-grained manual evaluation via error annotation of the systems' outputs.
1 code implementation • EACL 2017 • Antonio Toral, Víctor M. Sánchez-Cartagena
We aim to shed light on the strengths and weaknesses of the newly introduced neural machine translation paradigm.
no code implementations • LREC 2016 • Nikola Ljubešić, Miquel Esplà-Gomis, Antonio Toral, Sergio Ortiz Rojas, Filip Klubička
This paper presents an approach for building large monolingual corpora and, at the same time, extracting parallel data by crawling the top-level domain of a given language of interest.
no code implementations • LREC 2016 • Iñaki San Vicente, Iñaki Alegría, Cristina España-Bonet, Pablo Gamallo, Hugo Gonçalo Oliveira, Eva Martínez Garcia, Antonio Toral, Arkaitz Zubiaga, Nora Aranberri
We introduce TweetMT, a parallel corpus of tweets in four language pairs that combine five languages (Spanish from/to Basque, Catalan, Galician and Portuguese), all of which have an official status in the Iberian Peninsula.
no code implementations • LREC 2016 • Meritxell Fernández Barrera, Vladimir Popescu, Antonio Toral, Federico Gaspari, Khalid Choukri
This paper discusses the role that statistical machine translation (SMT) can play in the development of cross-border EU e-commerce, by highlighting extant obstacles and identifying relevant technologies to overcome them.
no code implementations • EAMT 2016 • Antonio Toral, Tommi A. Pirinen, Andy Way, Gema Ramírez-Sánchez, Sergio Ortiz Rojas, Raphael Rubino, Miquel Esplà, Mikel L. Forcada, Vassilis Papavassiliou, Prokopis Prokopidis, Nikola Ljubešić
no code implementations • WS 2014 • Raphael Rubino, Antonio Toral, Víctor M. Sánchez-Cartagena, Jorge Ferrández-Tordera, Sergio Ortiz-Rojas, Gema Ramírez-Sánchez, Felipe Sánchez-Martínez, Andy Way
no code implementations • LREC 2014 • Antonio Toral
We acquire corpora in the domain of independent news from the Tlaxcala website.
no code implementations • LREC 2014 • Nikola Ljubešić, Antonio Toral
In this paper we present the construction process of a web corpus of Catalan built from the content of the .cat top-level domain.
no code implementations • LREC 2014 • Raphael Rubino, Antonio Toral, Nikola Ljubešić, Gema Ramírez-Sánchez
This paper presents a novel approach for parallel data generation using machine translation and quality estimation.
no code implementations • LREC 2012 • Marc Poch, Antonio Toral, Olivier Hamon, Valeria Quochi, Núria Bel
This paper presents the platform developed in the PANACEA project, a distributed factory that automates the stages involved in the acquisition, production, updating and maintenance of Language Resources required by Machine Translation and other Language Technologies.