This shared task will examine automatic evaluation metrics for machine translation. The goals of the shared metrics task are:

  • To achieve the strongest correlation with human judgement of translation quality;
  • To illustrate the suitability of an automatic evaluation metric as a surrogate for human evaluation;
  • To address problems associated with comparison with a single reference translation;
  • To move automatic evaluation beyond system-level ranking to finer-grained sentence-level ranking.

All datasets for this task are available here.

