1 code implementation • EMNLP 2021 • Raksha Shenoy, Nico Herbig, Antonio Krüger, Josef van Genabith
For helpful quality levels, a visualization reflecting the uncertainty of the QE model is preferred.
no code implementations • ICON 2020 • Loitongbam Sanayai Meetei, Thoudam Doren Singh, Sivaji Bandyopadhyay, Mihaela Vela, Josef van Genabith
A Computer Assisted Translation (CAT) tool is used to record the time, keystroke and other indicators to measure PE effort in terms of temporal and technical effort.
no code implementations • EMNLP 2020 • Jingyi Zhang, Josef van Genabith
In order to make use of different types of human evaluation data for supervised learning, we present a multi-task learning QE model that jointly learns two tasks: score a translation and rank two translations.
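The joint objective can be pictured with a short sketch. The following is a hypothetical illustration only, assuming a generic sentence-pair encoder (`encoder`, `hidden_size`, and the batch field names are placeholders, not the paper's code): one shared representation feeds a regression head for scoring and a margin ranking loss for comparing two translations.

```python
# Hypothetical multi-task QE sketch: one shared encoder, a regression head
# for scoring, and a margin ranking loss for comparing two translations.
# Not the paper's implementation; the encoder is left abstract.
import torch
import torch.nn as nn

class MultiTaskQE(nn.Module):
    def __init__(self, encoder, hidden_size):
        super().__init__()
        self.encoder = encoder            # maps a (src, mt) pair to a vector
        self.scorer = nn.Linear(hidden_size, 1)

    def score(self, src, mt):
        return self.scorer(self.encoder(src, mt)).squeeze(-1)

    def loss(self, batch):
        # Task 1: regress towards human quality scores.
        score_loss = nn.functional.mse_loss(
            self.score(batch["src"], batch["mt"]), batch["human_score"])
        # Task 2: rank the better of two candidate translations higher.
        better = self.score(batch["src"], batch["mt_better"])
        worse = self.score(batch["src"], batch["mt_worse"])
        rank_loss = nn.functional.margin_ranking_loss(
            better, worse, torch.ones_like(better), margin=0.1)
        return score_loss + rank_loss
```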
1 code implementation • EMNLP (ACL) 2021 • Jörg Steffen, Josef van Genabith
This is challenging, as markup can be nested, can apply to spans that are contiguous in the source but non-contiguous in the target, and so on.
no code implementations • IWSLT 2017 • Cristina España-Bonet, Josef van Genabith
This paper describes the UdS-DFKI participation to the multilingual task of the IWSLT Evaluation 2017.
no code implementations • WMT (EMNLP) 2020 • Sourav Dutta, Jesujoba Alabi, Saptarashmi Bandyopadhyay, Dana Ruiter, Josef van Genabith
This paper describes the UdS-DFKI submission to the shared task for unsupervised machine translation (MT) and very low-resource supervised MT between German (de) and Upper Sorbian (hsb) at the Fifth Conference of Machine Translation (WMT20).
no code implementations • RANLP 2021 • Koel Dutta Chowdhury, Cristina España-Bonet, Josef van Genabith
Previous research has used linguistic features to show that translations exhibit traces of source language interference and that phylogenetic trees between languages can be reconstructed from the results of translations into the same language.
no code implementations • 25 Aug 2023 • Angana Borah, Daria Pylypenko, Cristina Espana-Bonet, Josef van Genabith
Translationese signals are subtle (especially for professional translation) and compete with many other signals in the data such as genre, style, author, and, in particular, topic.
no code implementations • 6 Jun 2023 • Sangeet Sagar, Mirco Ravanelli, Bernd Kiefer, Ivana Kruijff Korbayova, Josef van Genabith
Despite the recent advancements in speech recognition, there are still difficulties in accurately transcribing conversational and emotional speech in noisy and reverberant acoustic environments.
1 code implementation • 23 May 2023 • Niyati Bafna, Cristina España-Bonet, Josef van Genabith, Benoît Sagot, Rachel Bawden
Existing approaches for unsupervised bilingual lexicon induction (BLI) often depend on good quality static or contextual embeddings trained on large monolingual corpora for both languages.
1 code implementation • 28 Apr 2023 • Sonal Sannigrahi, Josef van Genabith, Cristina Espana-Bonet
We demonstrate that while a simple sentence average results in a strong baseline for classification tasks, more complex combinations are necessary for semantic tasks.
1 code implementation • 20 Apr 2023 • Yusser Al Ghussin, Jingyi Zhang, Josef van Genabith
We show that document-level NMT models trained with only parallel paragraphs from Paracrawl can be used to translate real documents from TED, News and Europarl, outperforming sentence-level NMT models.
no code implementations • 7 Nov 2022 • Tengxun Zhang, Hongfei Xu, Josef van Genabith, Deyi Xiong, Hongying Zan
Hybrid tabular-textual question answering (QA) requires reasoning from heterogeneous information, and the types of reasoning are mainly divided into numerical reasoning and span extraction.
no code implementations • 24 Oct 2022 • Kwabena Amponsah-Kaakyire, Daria Pylypenko, Josef van Genabith, Cristina España-Bonet
Previous research did not show (i) whether the difference is because of the features, the classifiers or both, and (ii) what the neural classifiers actually learn.
1 code implementation • NAACL (SocialNLP) 2022 • Dana Ruiter, Thomas Kleinbauer, Cristina España-Bonet, Josef van Genabith, Dietrich Klakow
Recent research on style transfer takes inspiration from unsupervised neural machine translation (UNMT), learning from large amounts of non-parallel data by exploiting cycle consistency loss, back-translation, and denoising autoencoders.
1 code implementation • NAACL 2022 • Koel Dutta Chowdhury, Rricha Jalota, Cristina España-Bonet, Josef van Genabith
Cross-lingual natural language processing relies on translation, either by humans or machines, at different levels, from translating training data to translating test sets.
no code implementations • EMNLP 2021 • Daria Pylypenko, Kwabena Amponsah-Kaakyire, Koel Dutta Chowdhury, Josef van Genabith, Cristina España-Bonet
Traditional hand-crafted linguistically-informed features have often been used for distinguishing between translated and original non-translated texts.
1 code implementation • ACL 2021 • Rashad Albo Jamara, Nico Herbig, Antonio Krüger, Josef van Genabith
Here, we present the first study that investigates the usefulness of mid-air hand gestures in combination with the keyboard (GK) for text editing in post-editing (PE) of machine translation (MT).
no code implementations • ACL 2021 • Hongfei Xu, Qiuhui Liu, Josef van Genabith, Deyi Xiong, Meng Zhang
This has to be computed n times for a sequence of length n. The linear transformations involved in the LSTM gate and state computations are the major cost factors in this.
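As a generic illustration of where that cost sits (not the paper's proposed method, and with arbitrary dimensions), the input-side linear transformations of an LSTM can be batched into a single matrix multiplication over all n positions, leaving only the gate and state recurrence to run step by step.

```python
# Generic illustration (not the paper's method) of where the LSTM cost sits:
# the input-side linear transforms for all n positions can be batched into
# one matmul, while the gate/state recurrence stays sequential.
import torch
import torch.nn as nn

d = 512
x = torch.randn(32, 100, d)                 # (batch, n, d)
w_ih = nn.Linear(d, 4 * d)                  # input projections for i, f, g, o
w_hh = nn.Linear(d, 4 * d, bias=False)      # recurrent projections

pre = w_ih(x)                               # one big matmul over all n steps
h = torch.zeros(32, d)
c = torch.zeros(32, d)
for t in range(x.size(1)):                  # only the recurrence is stepwise
    i, f, g, o = (pre[:, t] + w_hh(h)).chunk(4, dim=-1)
    c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
    h = torch.sigmoid(o) * torch.tanh(c)
```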
no code implementations • ACL 2021 • Hongfei Xu, Qiuhui Liu, Josef van Genabith, Deyi Xiong
In this paper, we propose to efficiently increase the capacity for multilingual NMT by increasing the cardinality.
no code implementations • ACL 2021 • Jingyi Zhang, Josef van Genabith
We further fine-tune the target-to-source attention in the BTBA model to obtain better alignments using a full context based optimization method and self-supervised training.
no code implementations • MTSummit 2021 • Dana Ruiter, Dietrich Klakow, Josef van Genabith, Cristina España-Bonet
For most language combinations, parallel data is either scarce or simply unavailable.
no code implementations • COLING 2020 • Koel Dutta Chowdhury, Cristina España-Bonet, Josef van Genabith
Recent studies use a combination of lexical and syntactic features to show that footprints of the source language remain visible in translations, to the extent that it is possible to predict the original source language from the translation.
no code implementations • Findings (EMNLP) 2021 • Hongfei Xu, Qiuhui Liu, Josef van Genabith, Deyi Xiong
The Transformer translation model is based on the multi-head attention mechanism, which can be parallelized easily.
no code implementations • 4 Sep 2020 • Eleni Metheniti, Guenter Neumann, Josef van Genabith
Inflection is an essential part of every human language's morphology, yet little effort has been made to unify linguistic theory and computational methods in recent years.
no code implementations • 13 Jul 2020 • Hongfei Xu, Qiuhui Liu, Deyi Xiong, Josef van Genabith
In this paper, we suggest that the residual connection has its drawbacks and propose to train Transformers with a depth-wise LSTM that treats the outputs of layers as steps in a time series instead of connecting them residually. The motivation is that the vanishing gradient problem suffered by deep networks is the same as that of recurrent networks applied to long sequences, while the LSTM (Hochreiter and Schmidhuber, 1997) has proven capable of capturing long-distance relationships; its design may alleviate some drawbacks of residual connections while still ensuring convergence.
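A minimal sketch of that idea, under the assumption that each sub-layer maps a (batch, d_model) tensor to the same shape (the real model is more involved): an LSTMCell carries its hidden and cell state across the layer stack in place of the residual connection.

```python
# Minimal sketch (assumptions, not the paper's code) of treating the stack of
# layer outputs as an LSTM time series: instead of x + layer(x), an LSTMCell
# carries (h, c) across layers.
import torch
import torch.nn as nn

class DepthWiseLSTMStack(nn.Module):
    def __init__(self, layers, d_model):
        super().__init__()
        # each layer is assumed to map (batch, d_model) -> (batch, d_model)
        self.layers = nn.ModuleList(layers)
        self.cell = nn.LSTMCell(d_model, d_model)

    def forward(self, x):
        h, c = x, torch.zeros_like(x)
        for layer in self.layers:
            out = layer(h)                 # layer output = next "time step"
            h, c = self.cell(out, (h, c))  # replaces the residual connection
        return h
```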
no code implementations • ACL 2020 • Nico Herbig, Santanu Pal, Tim Düwel, Kalliopi Meladaki, Mahsa Monshizadeh, Vladislav Hnatovskiy, Antonio Krüger, Josef van Genabith
The shift from traditional translation to post-editing (PE) of machine-translated (MT) text can save time and reduce errors, but it also affects the design of translation interfaces, as the task changes from mainly generating text to correcting errors within otherwise helpful translation proposals.
no code implementations • WS 2020 • Yuri Bizzoni, Tom S Juzek, Cristina España-Bonet, Koel Dutta Chowdhury, Josef van Genabith, Elke Teich
Some translationese features tend to appear in simultaneous interpreting with higher frequency than in human text translation, but the reasons for this are unclear.
no code implementations • ACL 2020 • Nico Herbig, Tim Düwel, Santanu Pal, Kalliopi Meladaki, Mahsa Monshizadeh, Antonio Krüger, Josef van Genabith
On the other hand, speech and multi-modal combinations of select & speech are considered suitable for replacements and insertions but offer less potential for deletion and reordering.
no code implementations • ACL 2020 • Hongfei Xu, Josef van Genabith, Deyi Xiong, Qiuhui Liu, Jingyi Zhang
Considering that modeling phrases instead of words has significantly improved the Statistical Machine Translation (SMT) approach through the use of larger translation blocks ("phrases") and its reordering ability, modeling NMT at phrase level is an intuitive proposal to help the model capture long-distance relationships.
no code implementations • ACL 2020 • Hongfei Xu, Josef van Genabith, Deyi Xiong, Qiuhui Liu
We propose to automatically and dynamically determine batch sizes by accumulating gradients of mini-batches and performing an optimization step at just the time when the direction of gradients starts to fluctuate.
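An illustrative sketch of the accumulation rule, not the authors' implementation (the cosine-similarity threshold and the exact point at which the update fires are assumptions): gradients of mini-batches are accumulated until a new mini-batch gradient disagrees with the running sum, at which point the optimizer steps.

```python
# Hedged sketch: keep accumulating mini-batch gradients while they point in a
# consistent direction; step the optimizer once the new gradient disagrees.
import torch

def grad_vector(model):
    return torch.cat([p.grad.reshape(-1) for p in model.parameters()
                      if p.grad is not None])

def train(model, optimizer, loss_fn, minibatches, threshold=0.0):
    accumulated = None
    optimizer.zero_grad()
    for batch in minibatches:
        before = grad_vector(model) if accumulated is not None else None
        loss_fn(model, batch).backward()          # adds this mini-batch's gradient
        after = grad_vector(model)
        new_grad = after - before if before is not None else after
        if accumulated is not None and torch.cosine_similarity(
                accumulated, new_grad, dim=0) < threshold:
            optimizer.step()                      # direction fluctuates: update now
            optimizer.zero_grad()
            accumulated = None
        else:
            accumulated = after
    if accumulated is not None:
        optimizer.step()                          # flush any remaining gradient
```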
no code implementations • LREC 2020 • Lilli Smal, Andrea Lösch, Josef van Genabith, Maria Giagkou, Thierry Declerck, Stephan Busemann
Data is key in training modern language technologies.
no code implementations • EMNLP 2020 • Dana Ruiter, Josef van Genabith, Cristina España-Bonet
Self-supervised neural machine translation (SSNMT) jointly learns to identify and select suitable training data from comparable (rather than parallel) corpora and to translate, in a way that the two tasks support each other in a virtuous circle.
no code implementations • LREC 2020 • Georg Rehm, Katrin Marheinecke, Stefanie Hegele, Stelios Piperidis, Kalina Bontcheva, Jan Hajič, Khalid Choukri, Andrejs Vasiļjevs, Gerhard Backfried, Christoph Prinz, José Manuel Gómez Pérez, Luc Meertens, Paul Lukowicz, Josef van Genabith, Andrea Lösch, Philipp Slusallek, Morten Irgens, Patrick Gatellier, Joachim Köhler, Laure Le Bars, Dimitra Anastasiou, Albina Auksoriūtė, Núria Bel, António Branco, Gerhard Budin, Walter Daelemans, Koenraad De Smedt, Radovan Garabík, Maria Gavriilidou, Dagmar Gromann, Svetla Koeva, Simon Krek, Cvetana Krstev, Krister Lindén, Bernardo Magnini, Jan Odijk, Maciej Ogrodniczuk, Eiríkur Rögnvaldsson, Mike Rosner, Bolette Sandford Pedersen, Inguna Skadiņa, Marko Tadić, Dan Tufiş, Tamás Váradi, Kadri Vider, Andy Way, François Yvon
Multilingualism is a cultural cornerstone of Europe and firmly anchored in the European treaties including full language equality.
no code implementations • NAACL 2021 • Hongfei Xu, Josef van Genabith, Qiuhui Liu, Deyi Xiong
Due to its effectiveness and performance, the Transformer translation model has attracted wide attention, most recently in terms of probing-based approaches.
no code implementations • ACL 2020 • Hongfei Xu, Qiuhui Liu, Josef van Genabith, Deyi Xiong, Jingyi Zhang
In this paper, we first empirically demonstrate that a simple modification made in the official implementation, which changes the computation order of residual connection and layer normalization, can significantly ease the optimization of deep Transformers.
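The two computation orders can be written side by side; this is a generic sketch of post-norm versus pre-norm residual blocks, with `sublayer` standing in for the attention or feed-forward modules, rather than the paper's code.

```python
# Sketch of the two computation orders: the original "post-norm"
# (LayerNorm applied after x + sublayer(x)) versus the reordered "pre-norm"
# variant (x + sublayer(LayerNorm(x))) that eases optimization of deep stacks.
import torch.nn as nn

class PostNormBlock(nn.Module):
    def __init__(self, sublayer, d_model):
        super().__init__()
        self.sublayer, self.norm = sublayer, nn.LayerNorm(d_model)

    def forward(self, x):
        return self.norm(x + self.sublayer(x))

class PreNormBlock(nn.Module):
    def __init__(self, sublayer, d_model):
        super().__init__()
        self.sublayer, self.norm = sublayer, nn.LayerNorm(d_model)

    def forward(self, x):
        return x + self.sublayer(self.norm(x))
```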
no code implementations • WS 2019 • Ekaterina Lapshinova-Koltunski, Cristina España-Bonet, Josef van Genabith
We analyse coreference phenomena in three neural machine translation systems trained with different data settings with or without access to explicit intra- and cross-sentential anaphoric information.
no code implementations • 25 Sep 2019 • Dana Ruiter, Cristina España-Bonet, Josef van Genabith
Self-supervised neural machine translation (SS-NMT) learns how to extract/select suitable training data from comparable (rather than parallel) corpora and how to translate, in a way that the two tasks support each other in a virtuous circle.
no code implementations • 16 Aug 2019 • Santanu Pal, Marcos Zampieri, Josef van Genabith
The first edition of this shared task featured data from three pairs of similar languages: Czech and Polish, Hindi and Nepali, and Portuguese and Spanish.
no code implementations • COLING 2020 • Santanu Pal, Hongfei Xu, Nico Herbig, Sudip Kumar Naskar, Antonio Krüger, Josef van Genabith
In automatic post-editing (APE) it makes sense to condition post-editing (pe) decisions on both the source (src) and the machine-translated text (mt) as input.
no code implementations • WS 2019 • Mihaela Vela, Santanu Pal, Marcos Zampieri, Sudip Kumar Naskar, Josef van Genabith
User feedback revealed that the users preferred using CATaLog Online over existing CAT tools in some respects, especially when selecting the output of the MT system and taking advantage of the color scheme for TM suggestions.
no code implementations • WS 2019 • Hongfei Xu, Qiuhui Liu, Josef van Genabith
In this paper, we describe our submission to the English-German APE shared task at WMT 2019.
no code implementations • WS 2019 • Riktim Mondal, Shankha Raj Nayek, Aditya Chowdhury, Santanu Pal, Sudip Kumar Naskar, Josef van Genabith
In this paper we describe our joint submission (JU-Saarland) from Jadavpur University and Saarland University in the WMT 2019 news translation shared task for the English–Gujarati language pair within the translation task sub-track.
no code implementations • WS 2019 • Jingyi Zhang, Josef van Genabith
This paper describes the DFKI-NMT submission to the WMT19 News translation task.
no code implementations • WS 2019 • Santanu Pal, Marcos Zampieri, Josef van Genabith
The first edition of this shared task featured data from three pairs of similar languages: Czech and Polish, Hindi and Nepali, and Portuguese and Spanish.
no code implementations • WS 2019 • Santanu Pal, Hongfei Xu, Nico Herbig, Antonio Krüger, Josef van Genabith
In this paper we present an English–German Automatic Post-Editing (APE) system called transference, submitted to the APE Task organized at WMT 2019.
1 code implementation • ACL 2019 • Dana Ruiter, Cristina España-Bonet, Josef van Genabith
We present a simple new method where an emergent NMT system is used for simultaneously selecting training data and learning internal NMT representations.
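A rough sketch of such a selection-plus-training loop, with `model.encode`, `model.translation_loss`, and the similarity threshold all hypothetical placeholders rather than the paper's actual selection criteria: the emerging system's own sentence representations decide which comparable sentence pairs to train on.

```python
# Hedged sketch of a self-supervised selection loop: the model's current
# sentence representations decide which comparable pairs look parallel
# enough to train on. `encode` and `translation_loss` are assumed interfaces.
import torch

def select_and_train(model, candidate_pairs, optimizer, threshold=0.8):
    for src, tgt in candidate_pairs:
        with torch.no_grad():
            src_vec = model.encode(src).mean(dim=0)   # (seq_len, hidden) -> (hidden,)
            tgt_vec = model.encode(tgt).mean(dim=0)
            sim = torch.cosine_similarity(src_vec, tgt_vec, dim=0)
        if sim < threshold:
            continue                                  # reject: does not look parallel
        loss = model.translation_loss(src, tgt)       # accept: train on the pair
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```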
no code implementations • 7 Mar 2019 • Nico Herbig, Santanu Pal, Josef van Genabith, Antonio Krüger
Current advances in machine translation increase the need for translators to switch from traditional translation to post-editing of machine-translated text, a process that saves time and improves quality.
1 code implementation • 16 Oct 2018 • Ahmad Taie, Raphael Rubino, Josef van Genabith
The advent of representation learning methods enabled large performance gains on various language tasks, alleviating the need for manual feature engineering.
no code implementations • WS 2018 • Santanu Pal, Nico Herbig, Antonio Krüger, Josef van Genabith
The proposed model is an extension of the Transformer architecture: two separate self-attention-based encoders encode the machine translation output (mt) and the source (src), followed by a joint encoder that attends over a combination of these two encoded sequences (enc_src and enc_mt) for generating the post-edited sentence.
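A compact structural approximation of that layout (an assumption-laden sketch built from stock PyTorch modules, not the submitted system): two self-attention encoders for src and mt, a joint attention step in which the encoded mt attends over the encoded src, and a decoder that generates the post-edited output.

```python
# Rough structural sketch only: masks, embeddings, and output projection are
# omitted; inputs are assumed to be already-embedded (batch, seq, d_model).
import torch.nn as nn

class TransferenceSketch(nn.Module):
    def __init__(self, d_model=512, nhead=8, layers=2):
        super().__init__()
        enc_layer = lambda: nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.enc_src = nn.TransformerEncoder(enc_layer(), layers)
        self.enc_mt = nn.TransformerEncoder(enc_layer(), layers)
        self.joint_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, layers)

    def forward(self, src_emb, mt_emb, pe_emb):
        enc_src = self.enc_src(src_emb)
        enc_mt = self.enc_mt(mt_emb)
        # joint step: the mt representation attends over the encoded source
        joint, _ = self.joint_attn(enc_mt, enc_src, enc_src)
        return self.decoder(pe_emb, joint)            # generate the post-edit
```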
no code implementations • WS 2018 • Khyathi Chandu, Ekaterina Loginova, Vishal Gupta, Josef van Genabith, Günter Neumann, Manoj Chinnakotla, Eric Nyberg, Alan W. Black
As a first step towards fostering research which supports code-mixing (CM) in NLP applications, we systematically crowd-sourced and curated an evaluation dataset for factoid question answering in three CM languages: Hinglish (Hindi+English), Tenglish (Telugu+English), and Tamlish (Tamil+English), which belong to two language families (Indo-Aryan and Dravidian).
no code implementations • LREC 2018 • Andrea Lösch, Valérie Mapelli, Stelios Piperidis, Andrejs Vasiļjevs, Lilli Smal, Thierry Declerck, Eileen Schnur, Khalid Choukri, Josef van Genabith
no code implementations • 25 Oct 2017 • Octavia-Maria Sulea, Marcos Zampieri, Shervin Malmasi, Mihaela Vela, Liviu P. Dinu, Josef van Genabith
In this paper, we investigate the application of text classification methods to support law professionals.
no code implementations • WS 2017 • Fraser Bowen, Jon Dehdari, Josef van Genabith
In this research we investigate the impact of mismatches in the density and type of error between training and test data on a neural system correcting preposition and determiner errors.
no code implementations • RANLP 2017 • Octavia-Maria Sulea, Marcos Zampieri, Mihaela Vela, Josef van Genabith
In this paper, we investigate the application of text classification methods to predict the law area and the decision of cases judged by the French Supreme Court.
1 code implementation • WS 2017 • Ben Peters, Jon Dehdari, Josef van Genabith
Grapheme-to-phoneme conversion (g2p) is necessary for text-to-speech and automatic speech recognition systems.
no code implementations • 18 Apr 2017 • Cristina España-Bonet, Ádám Csaba Varga, Alberto Barrón-Cedeño, Josef van Genabith
First, we systematically study the NMT context vectors, i.e., the output of the encoder, and their power as an interlingua representation of a sentence.
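A minimal sketch of how such context vectors can be used as sentence representations, assuming access to some `encoder` callable that returns one vector per token (a placeholder, not the systems studied in the paper): mean-pool the encoder outputs and compare sentences across languages by cosine similarity.

```python
# Minimal sketch: pool NMT context vectors into a sentence vector and use
# cosine similarity as a cross-lingual comparison. `encoder` is a placeholder.
import torch

def sentence_vector(encoder, tokens):
    context = encoder(tokens)            # (seq_len, hidden) context vectors
    return context.mean(dim=0)           # pooled interlingua-style representation

def cross_lingual_similarity(encoder, sent_a, sent_b):
    va, vb = sentence_vector(encoder, sent_a), sentence_vector(encoder, sent_b)
    return torch.cosine_similarity(va, vb, dim=0).item()
```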
no code implementations • WS 2018 • Georg Heigold, Günter Neumann, Josef van Genabith
In this paper, we study the impact of noisy input.
no code implementations • EACL 2017 • Hans Uszkoreit, Aleksandra Gabryszak, Leonhard Hennig, Jörg Steffen, Renlong Ai, Stephan Busemann, Jon Dehdari, Josef van Genabith, Georg Heigold, Nils Rethmeier, Raphael Rubino, Sven Schmeier, Philippe Thomas, He Wang, Feiyu Xu
Web debates play an important role in enabling broad participation of constituencies in social, political and economic decision-taking.
no code implementations • EACL 2017 • Santanu Pal, Sudip Kumar Naskar, Mihaela Vela, Qun Liu, Josef van Genabith
APE translations produced by our system show statistically significant improvements over the first-stage MT, phrase-based APE and the best reported score on the WMT 2016 APE dataset by a previous neural APE system.
no code implementations • EACL 2017 • Georg Heigold, Guenter Neumann, Josef van Genabith
This paper investigates neural character-based morphological tagging for languages with complex morphology and large tag sets.
no code implementations • COLING 2016 • Santanu Pal, Sudip Kumar Naskar, Marcos Zampieri, Tapas Nayak, Josef van Genabith
We present a free web-based CAT tool called CATaLog Online which provides a novel and user-friendly online CAT environment for post-editors/translators.
no code implementations • COLING 2016 • Santanu Pal, Sudip Kumar Naskar, Josef van Genabith
In the paper we show that parallel system combination in the APE stage of a sequential MT-APE combination yields substantial translation improvements both measured in terms of automatic evaluation metrics as well as in terms of productivity improvements measured in a post-editing experiment.
no code implementations • COLING 2016 • Raphael Rubino, Stefania Degaetano-Ortlieb, Elke Teich, Josef van Genabith
In this paper we investigate the introduction of information theory inspired features to study long term diachronic change on three levels: lexis, part-of-speech and syntax.
1 code implementation • 21 Jun 2016 • Georg Heigold, Guenter Neumann, Josef van Genabith
We systematically explore a variety of neural architectures (DNN, CNN, CNNHighway, LSTM, BLSTM) to obtain character-based word vectors combined with bidirectional LSTMs to model across-word context in an end-to-end setting.
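One member of that architecture family, sketched with assumed dimensions (the paper compares several variants, so this is only indicative): a character-level BLSTM builds word vectors, a word-level BLSTM models across-word context, and a linear layer predicts the tags.

```python
# Indicative sketch of one character-based tagger variant; dimensions assumed.
import torch
import torch.nn as nn

class CharWordTagger(nn.Module):
    def __init__(self, n_chars, n_tags, char_dim=64, word_dim=128, hidden=256):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.char_lstm = nn.LSTM(char_dim, word_dim // 2, bidirectional=True,
                                 batch_first=True)
        self.word_lstm = nn.LSTM(word_dim, hidden // 2, bidirectional=True,
                                 batch_first=True)
        self.out = nn.Linear(hidden, n_tags)

    def forward(self, char_ids):
        # char_ids: (n_words, max_word_len) character indices for one sentence
        _, (h, _) = self.char_lstm(self.char_emb(char_ids))
        words = torch.cat([h[0], h[1]], dim=-1).unsqueeze(0)  # (1, n_words, word_dim)
        context, _ = self.word_lstm(words)                    # across-word context
        return self.out(context)                              # tag scores per word
```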
no code implementations • LREC 2016 • Santanu Pal, Marcos Zampieri, Sudip Kumar Naskar, Tapas Nayak, Mihaela Vela, Josef van Genabith
The tool features a number of editing and log functions similar to the desktop version of CATaLog enhanced with several new features that we describe in detail in this paper.
no code implementations • LREC 2016 • Georg Rehm, Jan Hajič, Josef van Genabith, Andrejs Vasiljevs
META-NET is a European network of excellence, founded in 2010, that consists of 60 research centres in 34 European countries.
no code implementations • LREC 2014 • Georg Rehm, Hans Uszkoreit, Sophia Ananiadou, Núria Bel, Audronė Bielevičienė, Lars Borin, António Branco, Gerhard Budin, Nicoletta Calzolari, Walter Daelemans, Radovan Garabík, Marko Grobelnik, Carmen García-Mateo, Josef van Genabith, Jan Hajič, Inma Hernáez, John Judge, Svetla Koeva, Simon Krek, Cvetana Krstev, Krister Lindén, Bernardo Magnini, Joseph Mariani, John McNaught, Maite Melero, Monica Monachini, Asunción Moreno, Jan Odijk, Maciej Ogrodniczuk, Piotr Pęzik, Stelios Piperidis, Adam Przepiórkowski, Eiríkur Rögnvaldsson, Michael Rosner, Bolette Pedersen, Inguna Skadiņa, Koenraad De Smedt, Marko Tadić, Paul Thompson, Dan Tufiş, Tamás Váradi, Andrejs Vasiļjevs, Kadri Vider, Jolanta Zabarskaite
This article provides an overview of the dissemination work carried out in META-NET from 2010 until early 2014; we describe its impact on the regional, national and international level, mainly with regard to politics and the situation of funding for LT topics.
no code implementations • LREC 2012 • Khaled Shaalan, Mohammed Attia, Pavel Pecina, Younes Samih, Josef van Genabith
Furthermore, from a large list of valid forms and invalid forms we create a character-based tri-gram language model to approximate knowledge about permissible character clusters in Arabic, creating a novel method for detecting spelling errors.
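A toy sketch of the underlying idea, with the padding symbols and the unseen-tri-gram criterion as assumptions rather than the paper's exact model: count character tri-grams over valid word forms and flag words containing tri-grams that never occur.

```python
# Toy sketch of a character tri-gram check for spelling errors; padding and
# the unseen-tri-gram criterion are illustrative assumptions.
from collections import Counter

def char_trigrams(word):
    padded = f"##{word}#"                 # boundary symbols are an assumption
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def build_model(valid_forms):
    # counts of permissible character clusters observed in valid word forms
    return Counter(t for w in valid_forms for t in char_trigrams(w))

def looks_misspelled(word, counts):
    # flag the word if any of its character tri-grams was never observed
    return any(counts[t] == 0 for t in char_trigrams(word))
```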
no code implementations • LREC 2012 • Eleftherios Avramidis, Marta R. Costa-jussà, Christian Federmann, Josef van Genabith, Maite Melero, Pavel Pecina
This corpus aims to serve as a basic resource for further research on whether hybrid machine translation algorithms and system combination techniques can benefit from additional (linguistically motivated, decoding, and runtime) information provided by the different systems involved.
no code implementations • LREC 2012 • Teresa Lynn, Özlem Çetinoğlu, Jennifer Foster, Elaine Uí Dhonnchadha, Mark Dras, Josef van Genabith
This paper describes the early stages in the development of new language resources for Irish ― namely the first Irish dependency treebank and the first Irish statistical dependency parser.
no code implementations • LREC 2012 • Christian Federmann, Eleftherios Avramidis, Marta R. Costa-jussà, Josef van Genabith, Maite Melero, Pavel Pecina
We describe the Shared Task on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid Machine Translation (ML4HMT) which aims to foster research on improved system combination approaches for machine translation (MT).
no code implementations • LREC 2012 • Mohammed Attia, Khaled Shaalan, Lamia Tounsi, Josef van Genabith
We utilize this annotation to automatically acquire grammatical function (dependency) based subcategorization frames and paths linking long-distance dependencies (LDDs).