This paper presents QUARTZ (QUality-AwaRe machine Translation), a project led by Unbabel which aims to develop machine translation systems that are more robust and produce fewer critical errors.
We present the joint contribution of IST and Unbabel to the WMT 2021 Shared Task on Quality Estimation.
We report the results of the WMT 2021 shared task on Quality Estimation, where the challenge is to predict the quality of the output of neural machine translation systems at the word and sentence levels.
We present the joint contribution of IST and Unbabel to the WMT 2020 Shared Task on Quality Estimation.
We report the results of the WMT20 shared task on Quality Estimation, where the challenge is to predict the quality of the output of neural machine translation systems at the word, sentence and document levels.
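As an illustration of how such predictions are commonly scored, the sketch below assumes (as in recent WMT QE editions) Pearson's r between predicted and gold sentence-level scores, and the Matthews correlation coefficient (MCC) over word-level OK/BAD tags. It is a minimal pure-Python sketch, not the official scoring script; the function names and the choice of "BAD" as the positive class are our own assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between predicted and gold sentence-level scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mcc(pred, gold, positive="BAD"):
    """Matthews correlation coefficient over word-level tags (assumed OK/BAD)."""
    pairs = list(zip(pred, gold))
    tp = sum(p == positive and g == positive for p, g in pairs)
    tn = sum(p != positive and g != positive for p, g in pairs)
    fp = sum(p == positive and g != positive for p, g in pairs)
    fn = sum(p != positive and g == positive for p, g in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

MCC is used instead of plain accuracy because BAD tags are usually the minority class, so a system that tags everything OK would otherwise look deceptively good.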
This paper presents the Multilingual Artificial Intelligence Agent Assistant (MAIA), a project led by Unbabel with the collaboration of CMU, INESC-ID and IT Lisbon.
DeepSPIN is a research project funded by the European Research Council (ERC) whose goal is to develop new neural structured prediction methods, models, and algorithms for improving the quality, interpretability, and data-efficiency of natural language processing (NLP) systems, with special emphasis on machine translation and quality estimation applications.
In this paper we provide a systematic comparison of existing and new document-level neural machine translation solutions.
In this paper, we present the joint contribution of Unbabel and IST to the WMT 2021 Metrics Shared Task.
Experiments on machine translation in two settings, static domain adaptation and "on-the-fly" adaptation, show that the chunk-based $k$NN-MT model leads to a significant speed-up (up to 4 times) with only a small drop in translation quality.
Despite the progress in machine translation quality estimation and evaluation in recent years, decoding in neural machine translation (NMT) is mostly oblivious to this and centers on finding the most probable translation according to the model (MAP decoding), approximated with beam search.
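To make the contrast concrete, here is a small sketch comparing MAP-style selection (highest model score) with minimum Bayes risk (MBR) decoding over an N-best list, one well-known family of quality-aware alternatives. The sentence above does not specify a method; MBR and the toy unigram-F1 utility are illustrative stand-ins (a real system would plug in an MT evaluation or QE metric), and all names are hypothetical.

```python
from collections import Counter


def unigram_f1(hyp, ref):
    """Toy utility (hypothetical stand-in for a real MT metric): unigram F1 on whitespace tokens."""
    h, r = hyp.split(), ref.split()
    # Clipped unigram matches via multiset intersection.
    overlap = sum((Counter(h) & Counter(r)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(h), overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def map_select(candidates, logprobs):
    """MAP-style choice: the candidate with the highest model log-probability."""
    return max(zip(candidates, logprobs), key=lambda pair: pair[1])[0]


def mbr_select(candidates, utility=unigram_f1):
    """MBR: pick the candidate with the highest average utility against the other
    candidates, which act as pseudo-references. Assumes at least two candidates."""
    def expected_utility(i):
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(utility(candidates[i], o) for o in others) / len(others)

    return candidates[max(range(len(candidates)), key=expected_utility)]
```

For example, with candidates `["the cat sat", "the cat sat down", "a dog ran", "the cat sits"]` and log-probabilities `[-2.0, -3.0, -1.0, -2.5]`, MAP picks the outlier "a dog ran", while MBR picks the consensus candidate "the cat sat".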
On the other hand, semi-parametric models have been shown to successfully perform domain adaptation by retrieving examples from an in-domain datastore (Khandelwal et al., 2021).
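A minimal sketch of the retrieval step in the style of kNN-MT (Khandelwal et al., 2021), assuming a datastore of (decoder-state key, target-token) pairs, squared L2 distance, and a fixed interpolation weight `lam`. Real systems use approximate nearest-neighbor search over very large datastores; the function names and toy vectors here are hypothetical.

```python
import math
from collections import defaultdict


def knn_distribution(query, datastore, k=2, temperature=1.0):
    """Turn the k nearest (key, token) entries into a distribution over tokens,
    weighting each neighbor by exp(-distance / temperature)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    scored = sorted((sq_dist(query, key), token) for key, token in datastore)[:k]
    d_min = scored[0][0]  # shift for numerical stability; cancels after normalizing
    weights = defaultdict(float)
    for dist, token in scored:
        weights[token] += math.exp(-(dist - d_min) / temperature)
    total = sum(weights.values())
    return {token: w / total for token, w in weights.items()}


def interpolate(p_mt, p_knn, lam=0.5):
    """Final distribution: lam * p_kNN(y) + (1 - lam) * p_MT(y)."""
    vocab = set(p_mt) | set(p_knn)
    return {y: lam * p_knn.get(y, 0.0) + (1 - lam) * p_mt.get(y, 0.0) for y in vocab}
```

With an in-domain datastore whose nearest neighbors all point to "Haus", the interpolated distribution shifts probability mass toward "Haus" even if the base model slightly prefers another word.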
In this work, leveraging meta-learning techniques, we extend this idea to improve the quality of the explanations themselves, specifically by optimizing explanations such that student models more effectively learn to simulate the original model.
Recent work has shown promising results in causal discovery by leveraging interventional data with gradient-based methods, even when the intervened variables are unknown.
In this paper, we combine the representational strengths of factor graphs and of neural networks, proposing undirected neural networks (UNNs): a flexible framework for specifying computations that can be performed in any order.
Transformers' quadratic complexity with respect to the input sequence length has motivated a body of work on efficient sparse approximations to softmax.
Although proper handling of discourse phenomena significantly contributes to the quality of machine translation (MT), common translation quality metrics do not adequately capture them.
Transformers are unable to model long-term memories effectively, since the amount of computation they need to perform grows with the context length.
When $\Omega$ is a Tsallis negentropy with parameter $\alpha$, we obtain "deformed exponential families," which include $\alpha$-entmax and sparsemax ($\alpha = 2$) as particular cases.
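For intuition, $\alpha$-entmax can be computed by bisection on a threshold $\tau$: $p_i = [(\alpha-1) z_i - \tau]_+^{1/(\alpha-1)}$, with $\tau$ chosen so that the entries sum to one. The sketch below is one simple way to do this in pure Python, not necessarily the algorithm used in the work above; with $\alpha = 2$ it reduces to sparsemax.

```python
def entmax_bisect(z, alpha=1.5, n_iter=60):
    """alpha-entmax by bisection on tau, for alpha > 1.
    p_i = max((alpha - 1) * z_i - tau, 0) ** (1 / (alpha - 1)), with sum(p) = 1."""
    if alpha <= 1:
        raise ValueError("alpha must be > 1 (alpha -> 1 recovers softmax)")
    d = len(z)
    s = [(alpha - 1) * zi for zi in z]
    exponent = 1.0 / (alpha - 1)

    def probs(tau):
        return [max(si - tau, 0.0) ** exponent for si in s]

    # At lo the largest entry alone contributes 1, so sum(probs(lo)) >= 1;
    # at hi every entry is at most 1/d, so sum(probs(hi)) <= 1.
    lo = max(s) - 1.0
    hi = max(s) - (1.0 / d) ** (alpha - 1)
    for _ in range(n_iter):
        mid = (lo + hi) / 2
        if sum(probs(mid)) >= 1.0:
            lo = mid
        else:
            hi = mid
    p = probs((lo + hi) / 2)
    total = sum(p)
    return [pi / total for pi in p]  # remove residual bisection error
```

For instance, `entmax_bisect([0.5, 0.2, -1.0], alpha=2.0)` returns approximately `[0.65, 0.35, 0.0]`, the sparsemax solution, while $\alpha = 1.5$ gives a distribution between softmax and sparsemax that can still contain exact zeros.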
Recent work in neural machine translation has demonstrated both the necessity and feasibility of using inter-sentential context -- context from sentences other than those currently being translated.
Visual attention mechanisms are a key component of neural network models for computer vision.
Neural networks and other machine learning models compute continuous representations, while humans communicate with discrete symbols.
Current sequence-to-sequence models are trained to minimize cross-entropy and use softmax to compute the locally normalized probabilities over target sequences.
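A short sketch of what "locally normalized" means here: each decoding step applies softmax to its own score vector, the sequence log-probability is the sum of the per-step log-probabilities, and cross-entropy training minimizes its negative. Plain Python lists stand in for a model's per-step logits; the function names are illustrative.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over one decoding step's vocabulary scores."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]


def sequence_log_prob(step_scores, target_ids):
    """log p(y | x) = sum_t log softmax(s_t)[y_t]; each step is normalized on its own."""
    return sum(log_softmax(s)[y] for s, y in zip(step_scores, target_ids))


def cross_entropy(step_scores, target_ids):
    """Negative log-likelihood of the target sequence (the usual training loss)."""
    return -sequence_log_prob(step_scores, target_ids)
```

Because softmax is strictly positive, every vocabulary item keeps nonzero probability at every step, which is the property sparse alternatives such as entmax relax.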
We present MLQE-PE, a new dataset for Machine Translation (MT) Quality Estimation (QE) and Automatic Post-Editing (APE).
Latent structure models are a powerful tool for modeling language data: they can mitigate the error propagation and annotation bottleneck in pipeline systems, while simultaneously uncovering linguistic insights about the data.
In this paper, we propose a new training strategy which replaces these estimators by an exact yet efficient marginalization.
Exponential families are widely used in machine learning; they include many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation).
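As a quick numerical check of the categorical case: the gradient of the log-partition function $A(\theta) = \log \sum_i \exp(\theta_i)$ equals the softmax, i.e., the expected sufficient statistics (mean parameters). The sketch below verifies this with central finite differences; it is illustrative only.

```python
import math


def log_partition(theta):
    """A(theta) = log sum_i exp(theta_i), computed stably."""
    m = max(theta)
    return m + math.log(sum(math.exp(t - m) for t in theta))


def softmax(theta):
    """Mean parameters of the categorical family: exp(theta_i - A(theta))."""
    a = log_partition(theta)
    return [math.exp(t - a) for t in theta]


def numerical_grad(f, theta, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at theta."""
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad
```

For `theta = [1.0, 0.0, -2.0]`, `numerical_grad(log_partition, theta)` and `softmax(theta)` agree to about six decimal places.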
Structured prediction requires manipulating a large number of combinatorial structures, e.g., dependency trees or alignments, either as latent or output variables.
Findings of the quantitative and qualitative analysis of our approach include that heads in different layers learn different sparsity preferences and tend to be more diverse in their attention distributions than softmax Transformers.
The combination of machines and humans for translation is effective, with many studies showing productivity gains when humans post-edit machine-translated output instead of translating from scratch.
We present the contribution of the Unbabel team to the WMT 2019 Shared Task on Quality Estimation.
Named entity recognition (NER) and entity linking (EL) are two fundamentally related tasks, since performing EL first requires detecting the entity mentions.
In the Transformer model, unlike in an RNN, generating a new word attends to the full sentence generated so far rather than only to the last word, so applying the scheduled sampling technique is not straightforward.
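One way to adapt scheduled sampling to the Transformer is a two-pass scheme: a first decoder pass with gold inputs produces predictions, and a second pass receives a mix of gold and predicted tokens. The sketch below shows only the mixing step, assuming token-aligned gold and first-pass predictions and a right shift to form decoder inputs. The inverse-sigmoid schedule comes from the original (RNN) scheduled sampling work and is an assumption here, as are all names.

```python
import math
import random


def inverse_sigmoid_decay(step, k=1000.0):
    """Probability of feeding the gold token at a given training step:
    k / (k + exp(step / k)); starts near 1 and decays toward 0."""
    return k / (k + math.exp(step / k))


def second_pass_inputs(gold, first_pass_pred, sample_prob, rng, bos="<s>"):
    """Mix gold tokens with first-pass predictions, then shift right to form
    decoder inputs. gold[t] and first_pass_pred[t] refer to the same target position."""
    mixed = [pred if rng.random() < sample_prob else g
             for g, pred in zip(gold, first_pass_pred)]
    # Position t of the second pass sees the (gold or predicted) token t - 1.
    return [bos] + mixed[:-1]
```

A training loop would set `sample_prob = 1 - inverse_sigmoid_decay(step)`, so early training is almost pure teacher forcing and later training exposes the decoder to its own predictions.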
Automatic post-editing (APE) seeks to automatically refine the output of a black-box machine translation (MT) system through human post-edits.
Analogously to dual-encoder architectures, we develop a BERT-based encoder-decoder (BED) model in which a single pretrained BERT encoder receives both the source string (src) and the machine translation string (tgt).
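A minimal sketch of how a single encoder can receive both strings, assuming the standard BERT sentence-pair layout (`[CLS] src [SEP] tgt [SEP]` with segment ids 0 and 1). The abstract does not spell out the exact input format, and whitespace tokens stand in for WordPiece here; `pack_pair` is a hypothetical helper.

```python
def pack_pair(src_tokens, tgt_tokens, cls="[CLS]", sep="[SEP]"):
    """Pack source and MT output into one BERT-style sequence.
    Segment id 0 covers [CLS] + src + [SEP]; segment id 1 covers tgt + [SEP]."""
    tokens = [cls] + src_tokens + [sep] + tgt_tokens + [sep]
    segment_ids = [0] * (len(src_tokens) + 2) + [1] * (len(tgt_tokens) + 1)
    return tokens, segment_ids
```

Packing both strings into one sequence lets self-attention relate every source token to every MT token directly, instead of encoding them separately.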
We present a new neural model for text summarization that first extracts sentences from a document and then compresses them.
Despite the progress made in sentence-level NMT, current systems still fall short of producing fluent, high-quality translations of full documents.
Over the past decades, numerous loss functions have been proposed for a variety of supervised learning tasks, including regression, classification, ranking, and more generally structured prediction.
Deep NLP models benefit from underlying structures in the data, such as parse trees, which are typically extracted using off-the-shelf parsers.
In this work, we propose the task of translating Bilingual Multi-Speaker Conversations, and explore neural architectures which exploit both source and target-side conversation histories for this task.
This paper studies Fenchel-Young losses, a generic way to construct convex loss functions from a regularization function.
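For intuition, the construction is $L_\Omega(\theta; y) = \Omega^*(\theta) + \Omega(y) - \langle \theta, y \rangle$, which is non-negative by the Fenchel-Young inequality and zero exactly when $y = \nabla \Omega^*(\theta)$. The sketch below instantiates it with $\Omega$ = Shannon negentropy on the simplex, whose conjugate is log-sum-exp; for a one-hot $y$ the loss reduces to the familiar cross-entropy (logistic) loss. Other choices of $\Omega$ yield other losses.

```python
import math


def shannon_negentropy(p):
    """Omega(p) = sum_i p_i log p_i on the simplex (with 0 log 0 = 0)."""
    return sum(pi * math.log(pi) for pi in p if pi > 0)


def logsumexp(theta):
    """Omega*(theta): conjugate of Shannon negentropy restricted to the simplex."""
    m = max(theta)
    return m + math.log(sum(math.exp(t - m) for t in theta))


def fenchel_young_loss(theta, y):
    """L_Omega(theta; y) = Omega*(theta) + Omega(y) - <theta, y>, always >= 0."""
    return logsumexp(theta) + shannon_negentropy(y) - sum(t * yi for t, yi in zip(theta, y))
```

With `theta = [2.0, 0.5, -1.0]` and `y = [1.0, 0.0, 0.0]` the loss equals `logsumexp(theta) - 2.0`, the negative log-softmax probability of the first class, and it vanishes when `y` is the softmax of `theta`.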
Marcin Junczys-Dowmunt, Roman Grundkiewicz, Tomasz Dwojak, Hieu Hoang, Kenneth Heafield, Tom Neckermann, Frank Seide, Ulrich Germann, Alham Fikri Aji, Nikolay Bogoychev, André F. T. Martins, Alexandra Birch
We present Marian, an efficient and self-contained Neural Machine Translation framework with an integrated automatic differentiation engine based on dynamic computation graphs.
We propose sparsemax, a new activation function similar to the traditional softmax, but able to output sparse probabilities.
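A minimal pure-Python sketch of sparsemax as the Euclidean projection of a score vector onto the probability simplex, using a sort-and-threshold computation: find the largest $k$ with $1 + k z_{(k)} > \sum_{j \le k} z_{(j)}$ over the sorted scores, set $\tau = (\sum_{j \le k} z_{(j)} - 1)/k$, and clip $z_i - \tau$ at zero. Unlike softmax, the result can contain exact zeros.

```python
def sparsemax(z):
    """Euclidean projection of z onto the probability simplex (sort-based)."""
    z_sorted = sorted(z, reverse=True)
    cumsum = 0.0
    k_z, cumsum_k = 0, 0.0
    for k, zk in enumerate(z_sorted, start=1):
        cumsum += zk
        # The support condition holds for a prefix of the sorted scores;
        # remember the largest k that satisfies it (k = 1 always does).
        if 1 + k * zk > cumsum:
            k_z, cumsum_k = k, cumsum
    tau = (cumsum_k - 1) / k_z
    return [max(zi - tau, 0.0) for zi in z]
```

For example, `sparsemax([3.0, 0.0])` returns `[1.0, 0.0]` and `sparsemax([1.0, 1.0])` returns `[0.5, 0.5]`, whereas softmax would assign nonzero probability to every entry.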