Search Results for author: Josef van Genabith

Found 122 papers, 18 papers with code

INFODENS: An Open-source Framework for Learning Text Representations

1 code implementation 16 Oct 2018 Ahmad Taie, Raphael Rubino, Josef van Genabith

The advent of representation learning methods enabled large performance gains on various language tasks, alleviating the need for manual feature engineering.

Feature Engineering General Classification +3

LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools

1 code implementation 23 Jan 2024 Qianli Wang, Tatiana Anikina, Nils Feldhus, Josef van Genabith, Leonhard Hennig, Sebastian Möller

Interpretability tools that offer explanations in the form of a dialogue have demonstrated their efficacy in enhancing users' understanding, as one-off explanations may occasionally fall short in providing sufficient information to the user.

counterfactual Fact Checking +4

Mid-Air Hand Gestures for Post-Editing of Machine Translation

1 code implementation ACL 2021 Rashad Albo Jamara, Nico Herbig, Antonio Krüger, Josef van Genabith

Here, we present the first study that investigates the usefulness of mid-air hand gestures in combination with the keyboard (GK) for text editing in PE of MT.

Machine Translation Translation

Self-Supervised Neural Machine Translation

1 code implementation ACL 2019 Dana Ruiter, Cristina España-Bonet, Josef van Genabith

We present a simple new method where an emergent NMT system is used for simultaneously selecting training data and learning internal NMT representations.

Machine Translation NMT +1

Neural Morphological Tagging from Characters for Morphologically Rich Languages

1 code implementation 21 Jun 2016 Georg Heigold, Guenter Neumann, Josef van Genabith

We systematically explore a variety of neural architectures (DNN, CNN, CNNHighway, LSTM, BLSTM) to obtain character-based word vectors combined with bidirectional LSTMs to model across-word context in an end-to-end setting.

Morphological Tagging TAG +1

Exploring Paracrawl for Document-level Neural Machine Translation

1 code implementation 20 Apr 2023 Yusser Al Ghussin, Jingyi Zhang, Josef van Genabith

We show that document-level NMT models trained with only parallel paragraphs from Paracrawl can be used to translate real documents from TED, News and Europarl, outperforming sentence-level NMT models.

Machine Translation NMT +2

TransIns: Document Translation with Markup Reinsertion

1 code implementation EMNLP (ACL) 2021 Jörg Steffen, Josef van Genabith

This is challenging, as markup can be nested, can apply to spans that are contiguous in the source but non-contiguous in the target, etc.

Document Translation NMT +1

Are the Best Multilingual Document Embeddings simply Based on Sentence Embeddings?

1 code implementation 28 Apr 2023 Sonal Sannigrahi, Josef van Genabith, Cristina España-Bonet

We demonstrate that while a simple sentence average results in a strong baseline for classification tasks, more complex combinations are necessary for semantic tasks.

Sentence Sentence Embeddings +1
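
The "simple sentence average" baseline mentioned in the snippet above can be illustrated with a minimal sketch. The function name `average_embedding` and the toy two-dimensional vectors are assumptions made for illustration; the paper's actual sentence encoders and combination schemes are not shown in this listing.

```python
def average_embedding(sentence_vectors):
    """Element-wise mean of equal-length sentence vectors (hypothetical helper)."""
    if not sentence_vectors:
        raise ValueError("need at least one sentence vector")
    n = len(sentence_vectors)
    # zip(*...) groups the values of each dimension across all sentences.
    return [sum(dim) / n for dim in zip(*sentence_vectors)]

# Toy document made of three sentence embeddings (made-up values).
doc = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
doc_vector = average_embedding(doc)
```

The resulting `doc_vector` has the same dimensionality as each sentence vector, so it can be passed to any classifier that accepts sentence embeddings.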

When your Cousin has the Right Connections: Unsupervised Bilingual Lexicon Induction for Related Data-Imbalanced Languages

1 code implementation 23 May 2023 Niyati Bafna, Cristina España-Bonet, Josef van Genabith, Benoît Sagot, Rachel Bawden

Most existing approaches for unsupervised bilingual lexicon induction (BLI) depend on good quality static or contextual embeddings requiring large monolingual corpora for both languages.

Bilingual Lexicon Induction Language Modelling

An Empirical Analysis of NMT-Derived Interlingual Embeddings and their Use in Parallel Sentence Identification

no code implementations 18 Apr 2017 Cristina España-Bonet, Ádám Csaba Varga, Alberto Barrón-Cedeño, Josef van Genabith

First, we systematically study the NMT context vectors, i.e. the output of the encoder, and their power as an interlingua representation of a sentence.

Machine Translation NMT +3

Predicting the Law Area and Decisions of French Supreme Court Cases

no code implementations RANLP 2017 Octavia-Maria Sulea, Marcos Zampieri, Mihaela Vela, Josef van Genabith

In this paper, we investigate the application of text classification methods to predict the law area and the decision of cases judged by the French Supreme Court.

General Classification text-classification +1

Neural Automatic Post-Editing Using Prior Alignment and Reranking

no code implementations EACL 2017 Santanu Pal, Sudip Kumar Naskar, Mihaela Vela, Qun Liu, Josef van Genabith

APE translations produced by our system show statistically significant improvements over the first-stage MT, phrase-based APE and the best reported score on the WMT 2016 APE dataset by a previous neural APE system.

Automatic Post-Editing NMT +2

Code-Mixed Question Answering Challenge: Crowd-sourcing Data and Techniques

no code implementations WS 2018 Khyathi Chandu, Ekaterina Loginova, Vishal Gupta, Josef van Genabith, Günter Neumann, Manoj Chinnakotla, Eric Nyberg, Alan W. Black

As a first step towards fostering research which supports CM in NLP applications, we systematically crowd-sourced and curated an evaluation dataset for factoid question answering in three CM languages - Hinglish (Hindi+English), Tenglish (Telugu+English) and Tamlish (Tamil+English) which belong to two language families (Indo-Aryan and Dravidian).

Question Answering Sentence

A Transformer-Based Multi-Source Automatic Post-Editing System

no code implementations WS 2018 Santanu Pal, Nico Herbig, Antonio Krüger, Josef van Genabith

The proposed model is an extension of the transformer architecture: two separate self-attention-based encoders encode the machine translation output (mt) and the source (src), followed by a joint encoder that attends over a combination of these two encoded sequences (encsrc and encmt) for generating the post-edited sentence.

Automatic Post-Editing NMT +2

The Effect of Error Rate in Artificially Generated Data for Automatic Preposition and Determiner Correction

no code implementations WS 2017 Fraser Bowen, Jon Dehdari, Josef van Genabith

In this research we investigate the impact of mismatches in the density and type of error between training and test data on a neural system correcting preposition and determiner errors.

Grammatical Error Correction Machine Translation

Modeling Diachronic Change in Scientific Writing with Information Density

no code implementations COLING 2016 Raphael Rubino, Stefania Degaetano-Ortlieb, Elke Teich, Josef van Genabith

In this paper we investigate the introduction of information theory inspired features to study long term diachronic change on three levels: lexis, part-of-speech and syntax.

General Classification Informativeness

Multi-Engine and Multi-Alignment Based Automatic Post-Editing and its Impact on Translation Productivity

no code implementations COLING 2016 Santanu Pal, Sudip Kumar Naskar, Josef van Genabith

In the paper we show that parallel system combination in the APE stage of a sequential MT-APE combination yields substantial translation improvements both measured in terms of automatic evaluation metrics as well as in terms of productivity improvements measured in a post-editing experiment.

Automatic Post-Editing Translation

Irish Treebanking and Parsing: A Preliminary Evaluation

no code implementations LREC 2012 Teresa Lynn, Özlem Çetinoğlu, Jennifer Foster, Elaine Uí Dhonnchadha, Mark Dras, Josef van Genabith

This paper describes the early stages in the development of new language resources for Irish, namely the first Irish dependency treebank and the first Irish statistical dependency parser.

Machine Translation POS

A Richly Annotated, Multilingual Parallel Corpus for Hybrid Machine Translation

no code implementations LREC 2012 Eleftherios Avramidis, Marta R. Costa-jussà, Christian Federmann, Josef van Genabith, Maite Melero, Pavel Pecina

This corpus aims to serve as a basic resource for further research on whether hybrid machine translation algorithms and system combination techniques can benefit from additional (linguistically motivated, decoding, and runtime) information provided by the different systems involved.

Machine Translation Translation

Arabic Word Generation and Modelling for Spell Checking

no code implementations LREC 2012 Khaled Shaalan, Mohammed Attia, Pavel Pecina, Younes Samih, Josef van Genabith

Furthermore, from a large list of valid forms and invalid forms we create a character-based tri-gram language model to approximate knowledge about permissible character clusters in Arabic, creating a novel method for detecting spelling errors.

Language Modelling Morphological Analysis +2
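
The character tri-gram idea in the snippet above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' method: it trains on valid forms only, uses a toy English word list instead of Arabic data, and flags a word as soon as it contains a tri-gram never seen in training. The names `char_trigrams`, `train_trigrams` and `has_unseen_trigram` are hypothetical.

```python
from collections import Counter

def char_trigrams(word):
    # Pad with boundary markers so word-initial and word-final clusters count.
    padded = "^^" + word + "$"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def train_trigrams(valid_words):
    # Count every character tri-gram observed in the list of valid forms.
    counts = Counter()
    for word in valid_words:
        counts.update(char_trigrams(word))
    return counts

def has_unseen_trigram(word, counts):
    # A word is suspicious if any of its tri-grams never occurred in training.
    return any(counts[t] == 0 for t in char_trigrams(word))

# Toy stand-in for a large list of valid forms (assumption for illustration).
VALID = ["translate", "translation", "slate", "late", "nation"]
MODEL = train_trigrams(VALID)
```

With this toy model, `has_unseen_trigram("trxnslate", MODEL)` is true because clusters such as `trx` never occur, while an unseen word built only from attested clusters, such as `slation`, passes. That second case shows why such a model only approximates knowledge of permissible character clusters rather than acting as a dictionary.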

Automatic Extraction and Evaluation of Arabic LFG Resources

no code implementations LREC 2012 Mohammed Attia, Khaled Shaalan, Lamia Tounsi, Josef van Genabith

We utilize this annotation to automatically acquire grammatical function (dependency) based subcategorization frames and paths linking long-distance dependencies (LDDs).

POS

The ML4HMT Workshop on Optimising the Division of Labour in Hybrid Machine Translation

no code implementations LREC 2012 Christian Federmann, Eleftherios Avramidis, Marta R. Costa-jussà, Josef van Genabith, Maite Melero, Pavel Pecina

We describe the “Shared Task on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid Machine Translation” (ML4HMT) which aims to foster research on improved system combination approaches for machine translation (MT).

Language Modelling Machine Translation +1

Integrating Artificial and Human Intelligence for Efficient Translation

no code implementations 7 Mar 2019 Nico Herbig, Santanu Pal, Josef van Genabith, Antonio Krüger

Current advances in machine translation increase the need for translators to switch from traditional translation to post-editing of machine-translated text, a process that saves time and improves quality.

Machine Translation Translation

CATaLog Online: Porting a Post-editing Tool to the Web

no code implementations LREC 2016 Santanu Pal, Marcos Zampieri, Sudip Kumar Naskar, Tapas Nayak, Mihaela Vela, Josef van Genabith

The tool features a number of editing and log functions similar to the desktop version of CATaLog enhanced with several new features that we describe in detail in this paper.

Machine Translation Management +1

JU-Saarland Submission to the WMT2019 English–Gujarati Translation Shared Task

no code implementations WS 2019 Riktim Mondal, Shankha Raj Nayek, Aditya Chowdhury, Santanu Pal, Sudip Kumar Naskar, Josef van Genabith

In this paper we describe our joint submission (JU-Saarland) from Jadavpur University and Saarland University in the WMT 2019 news translation shared task for English–Gujarati language pair within the translation task sub-track.

Machine Translation NMT +1

USAAR-DFKI – The Transference Architecture for English–German Automatic Post-Editing

no code implementations WS 2019 Santanu Pal, Hongfei Xu, Nico Herbig, Antonio Krüger, Josef van Genabith

In this paper we present an English–German Automatic Post-Editing (APE) system called transference, submitted to the APE Task organized at WMT 2019.

Automatic Post-Editing Translation

UDS–DFKI Submission to the WMT2019 Czech–Polish Similar Language Translation Shared Task

no code implementations WS 2019 Santanu Pal, Marcos Zampieri, Josef van Genabith

The first edition of this shared task featured data from three pairs of similar languages: Czech and Polish, Hindi and Nepali, and Portuguese and Spanish.

Translation

The Transference Architecture for Automatic Post-Editing

no code implementations COLING 2020 Santanu Pal, Hongfei Xu, Nico Herbig, Sudip Kumar Naskar, Antonio Krueger, Josef van Genabith

In automatic post-editing (APE) it makes sense to condition post-editing (pe) decisions on both the source (src) and the machine translated text (mt) as input.

Automatic Post-Editing NMT

Improving CAT Tools in the Translation Workflow: New Approaches and Evaluation

no code implementations WS 2019 Mihaela Vela, Santanu Pal, Marcos Zampieri, Sudip Kumar Naskar, Josef van Genabith

User feedback revealed that the users preferred using CATaLog Online over existing CAT tools in some respects, especially by selecting the output of the MT system and taking advantage of the color scheme for TM suggestions.

Automatic Post-Editing Management +1

UDS–DFKI Submission to the WMT2019 Similar Language Translation Shared Task

no code implementations 16 Aug 2019 Santanu Pal, Marcos Zampieri, Josef van Genabith

The first edition of this shared task featured data from three pairs of similar languages: Czech and Polish, Hindi and Nepali, and Portuguese and Spanish.

Translation

Analysing Coreference in Transformer Outputs

no code implementations WS 2019 Ekaterina Lapshinova-Koltunski, Cristina España-Bonet, Josef van Genabith

We analyse coreference phenomena in three neural machine translation systems trained with different data settings with or without access to explicit intra- and cross-sentential anaphoric information.

Machine Translation Translation

Lipschitz Constrained Parameter Initialization for Deep Transformers

no code implementations ACL 2020 Hongfei Xu, Qiuhui Liu, Josef van Genabith, Deyi Xiong, Jingyi Zhang

In this paper, we first empirically demonstrate that a simple modification made in the official implementation, which changes the computation order of residual connection and layer normalization, can significantly ease the optimization of deep Transformers.

Translation

Probing Word Translations in the Transformer and Trading Decoder for Encoder Layers

no code implementations NAACL 2021 Hongfei Xu, Josef van Genabith, Qiuhui Liu, Deyi Xiong

Due to its effectiveness and performance, the Transformer translation model has attracted wide attention, most recently in terms of probing-based approaches.

Translation Word Translation

Self-Induced Curriculum Learning in Self-Supervised Neural Machine Translation

no code implementations EMNLP 2020 Dana Ruiter, Josef van Genabith, Cristina España-Bonet

Self-supervised neural machine translation (SSNMT) jointly learns to identify and select suitable training data from comparable (rather than parallel) corpora and to translate, in a way that the two tasks support each other in a virtuous circle.

Denoising Machine Translation +1

Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change

no code implementations ACL 2020 Hongfei Xu, Josef van Genabith, Deyi Xiong, Qiuhui Liu

We propose to automatically and dynamically determine batch sizes by accumulating gradients of mini-batches and performing an optimization step at just the time when the direction of gradients starts to fluctuate.
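
A rough sketch of the accumulate-then-step idea described above, under stated assumptions: gradients are plain Python lists, direction stability is measured as the cosine similarity between the accumulated gradient before and after adding the next mini-batch gradient, and accumulation stops at the first drop in that similarity. The stopping rule and the name `batches_until_fluctuation` are illustrative choices, not the criterion used in the paper.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors (0.0 if either is zero).
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def batches_until_fluctuation(grads):
    """Return how many mini-batch gradients to accumulate before stepping."""
    acc = list(grads[0])
    prev_sim = None
    for k, g in enumerate(grads[1:], start=1):
        new_acc = [a + b for a, b in zip(acc, g)]
        sim = cosine(acc, new_acc)
        if prev_sim is not None and sim < prev_sim:
            # Direction stability dropped: step using only the first k gradients.
            return k
        acc, prev_sim = new_acc, sim
    return len(grads)

# Three gradients that agree in direction, then one that points elsewhere.
toy_grads = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 5.0]]
steps = batches_until_fluctuation(toy_grads)
```

In this toy run the fourth gradient would pull the accumulated direction away, so the sketch chooses to perform the optimization step after accumulating three mini-batches.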

MMPE: A Multi-Modal Interface for Post-Editing Machine Translation

no code implementations ACL 2020 Nico Herbig, Tim Düwel, Santanu Pal, Kalliopi Meladaki, Mahsa Monshizadeh, Antonio Krüger, Josef van Genabith

On the other hand, speech and multi-modal combinations of select & speech are considered suitable for replacements and insertions but offer less potential for deletion and reordering.

Machine Translation Translation

MMPE: A Multi-Modal Interface using Handwriting, Touch Reordering, and Speech Commands for Post-Editing Machine Translation

no code implementations ACL 2020 Nico Herbig, Santanu Pal, Tim Düwel, Kalliopi Meladaki, Mahsa Monshizadeh, Vladislav Hnatovskiy, Antonio Krüger, Josef van Genabith

The shift from traditional translation to post-editing (PE) of machine-translated (MT) text can save time and reduce errors, but it also affects the design of translation interfaces, as the task changes from mainly generating text to correcting errors within otherwise helpful translation proposals.

Machine Translation Translation

How Human is Machine Translationese? Comparing Human and Machine Translations of Text and Speech

no code implementations WS 2020 Yuri Bizzoni, Tom S Juzek, Cristina España-Bonet, Koel Dutta Chowdhury, Josef van Genabith, Elke Teich

Some translationese features tend to appear in simultaneous interpreting with higher frequency than in human text translation, but the reasons for this are unclear.

Machine Translation Translation

Learning Source Phrase Representations for Neural Machine Translation

no code implementations ACL 2020 Hongfei Xu, Josef van Genabith, Deyi Xiong, Qiuhui Liu, Jingyi Zhang

Considering that modeling phrases instead of words has significantly improved the Statistical Machine Translation (SMT) approach through the use of larger translation blocks ("phrases") and its reordering ability, modeling NMT at phrase level is an intuitive proposal to help the model capture long-distance relationships.

Machine Translation NMT +1

Rewiring the Transformer with Depth-Wise LSTMs

no code implementations 13 Jul 2020 Hongfei Xu, Yang Song, Qiuhui Liu, Josef van Genabith, Deyi Xiong

Stacking non-linear layers allows deep neural networks to model complicated functions, and including residual connections in Transformer layers is beneficial for convergence and performance.

NMT Time Series Analysis

Linguistically inspired morphological inflection with a sequence to sequence model

no code implementations 4 Sep 2020 Eleni Metheniti, Guenter Neumann, Josef van Genabith

Inflection is an essential part of every human language's morphology, yet little effort has been made to unify linguistic theory and computational methods in recent years.

Language Acquisition LEMMA +1

Translation Quality Estimation by Jointly Learning to Score and Rank

no code implementations EMNLP 2020 Jingyi Zhang, Josef van Genabith

In order to make use of different types of human evaluation data for supervised learning, we present a multi-task learning QE model that jointly learns two tasks: score a translation and rank two translations.

Multi-Task Learning Sentence +2

Multi-Head Highly Parallelized LSTM Decoder for Neural Machine Translation

no code implementations ACL 2021 Hongfei Xu, Qiuhui Liu, Josef van Genabith, Deyi Xiong, Meng Zhang

This has to be computed n times for a sequence of length n. The linear transformations involved in the LSTM gate and state computations are the major cost factors in this.

Machine Translation Translation

A Bidirectional Transformer Based Alignment Model for Unsupervised Word Alignment

no code implementations ACL 2021 Jingyi Zhang, Josef van Genabith

We further fine-tune the target-to-source attention in the BTBA model to obtain better alignments using a full context based optimization method and self-supervised training.

Machine Translation Translation +1

Understanding Translationese in Multi-view Embedding Spaces

no code implementations COLING 2020 Koel Dutta Chowdhury, Cristina España-Bonet, Josef van Genabith

Recent studies use a combination of lexical and syntactic features to show that footprints of the source language remain visible in translations, to the extent that it is possible to predict the original source language from the translation.

Translation

UdS-DFKI@WMT20: Unsupervised MT and Very Low Resource Supervised MT for German-Upper Sorbian

no code implementations WMT (EMNLP) 2020 Sourav Dutta, Jesujoba Alabi, Saptarashmi Bandyopadhyay, Dana Ruiter, Josef van Genabith

This paper describes the UdS-DFKI submission to the shared task for unsupervised machine translation (MT) and very low-resource supervised MT between German (de) and Upper Sorbian (hsb) at the Fifth Conference of Machine Translation (WMT20).

Translation Unsupervised Machine Translation

Tracing Source Language Interference in Translation with Graph-Isomorphism Measures

no code implementations RANLP 2021 Koel Dutta Chowdhury, Cristina España-Bonet, Josef van Genabith

Previous research has used linguistic features to show that translations exhibit traces of source language interference and that phylogenetic trees between languages can be reconstructed from the results of translations into the same language.

Open-Ended Question Answering Translation

Self-Induced Curriculum Learning in Neural Machine Translation

no code implementations 25 Sep 2019 Dana Ruiter, Cristina España-Bonet, Josef van Genabith

Self-supervised neural machine translation (SS-NMT) learns how to extract/select suitable training data from comparable (rather than parallel) corpora and how to translate, in a way that the two tasks support each other in a virtuous circle.

Denoising Machine Translation +2

Towards Debiasing Translation Artifacts

1 code implementation NAACL 2022 Koel Dutta Chowdhury, Rricha Jalota, Cristina España-Bonet, Josef van Genabith

Cross-lingual natural language processing relies on translation, either by humans or machines, at different levels, from translating training data to translating test sets.

Natural Language Inference Sentence +1

Exploiting Social Media Content for Self-Supervised Style Transfer

1 code implementation NAACL (SocialNLP) 2022 Dana Ruiter, Thomas Kleinbauer, Cristina España-Bonet, Josef van Genabith, Dietrich Klakow

Recent research on style transfer takes inspiration from unsupervised neural machine translation (UNMT), learning from large amounts of non-parallel data by exploiting cycle consistency loss, back-translation, and denoising autoencoders.

Attribute Denoising +4

Explaining Translationese: why are Neural Classifiers Better and what do they Learn?

no code implementations 24 Oct 2022 Kwabena Amponsah-Kaakyire, Daria Pylypenko, Josef van Genabith, Cristina España-Bonet

Previous research did not show (i) whether the difference is because of the features, the classifiers or both, and (ii) what the neural classifiers actually learn.

Feature Engineering Representation Learning

NAPG: Non-Autoregressive Program Generation for Hybrid Tabular-Textual Question Answering

no code implementations 7 Nov 2022 Tengxun Zhang, Hongfei Xu, Josef van Genabith, Deyi Xiong, Hongying Zan

Hybrid tabular-textual question answering (QA) requires reasoning from heterogeneous information, and the types of reasoning are mainly divided into numerical reasoning and span extraction.

Question Answering

RescueSpeech: A German Corpus for Speech Recognition in Search and Rescue Domain

no code implementations 6 Jun 2023 Sangeet Sagar, Mirco Ravanelli, Bernd Kiefer, Ivana Kruijff Korbayova, Josef van Genabith

Despite the recent advancements in speech recognition, there are still difficulties in accurately transcribing conversational and emotional speech in noisy and reverberant acoustic environments.

Decision Making Robust Speech Recognition +1

Measuring Spurious Correlation in Classification: 'Clever Hans' in Translationese

1 code implementation 25 Aug 2023 Angana Borah, Daria Pylypenko, Cristina España-Bonet, Josef van Genabith

Translationese signals are subtle (especially for professional translation) and compete with many other signals in the data such as genre, style, author, and, in particular, topic.

Classification

Translating away Translationese without Parallel Data

no code implementations 28 Oct 2023 Rricha Jalota, Koel Dutta Chowdhury, Cristina España-Bonet, Josef van Genabith

We show how we can eliminate the need for parallel validation data by combining the self-supervised loss with an unsupervised loss.

Binary Classification Language Modelling +3

Where exactly does contextualization in a PLM happen?

no code implementations 11 Dec 2023 Soniya Vijayakumar, Tanja Bäumel, Simon Ostermann, Josef van Genabith

Pre-trained Language Models (PLMs) have been shown to be consistently successful in a plethora of NLP tasks due to their ability to learn contextualized representations of words (Ethayarajh, 2019).

Language Modelling Sentence +1
