Unlike previous years' editions, this year we collected our own human ratings through expert-based human evaluation using Multidimensional Quality Metrics (MQM).
In this paper, we present the joint contribution of Unbabel and IST to the WMT 2021 Metrics Shared Task.
We present MT-Telescope, a visualization platform designed to facilitate comparative analysis of the output quality of two Machine Translation (MT) systems.
Overall, our systems achieve strong results for all language pairs on previous test sets and in many cases set a new state-of-the-art.
We present COMET, a neural framework for training multilingual machine translation evaluation models that achieve new state-of-the-art levels of correlation with human judgements.
Linear models, which support efficient learning and inference, are the workhorses of statistical machine translation; however, linear decision rules are less attractive from a modeling perspective.