TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Machine Translation	WMT 2017 English-Latvian	Transformer trained on highly filtered data	BLEU	22.89	# 1
Machine Translation	WMT 2017 Latvian-English	Transformer trained on highly filtered data	BLEU	24.37	# 1
Machine Translation	WMT 2018 English-Finnish	Transformer trained on highly filtered data	BLEU	17.40	# 1
Machine Translation	WMT 2018 Finnish-English	Transformer trained on highly filtered data	BLEU	24.00	# 2

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/impact-of-corpora-quality-on-neural-machine/machine-translation-on-wmt-2017-english)](https://paperswithcode.com/sota/machine-translation-on-wmt-2017-english?p=impact-of-corpora-quality-on-neural-machine)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/impact-of-corpora-quality-on-neural-machine/machine-translation-on-wmt-2017-latvian)](https://paperswithcode.com/sota/machine-translation-on-wmt-2017-latvian?p=impact-of-corpora-quality-on-neural-machine)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/impact-of-corpora-quality-on-neural-machine/machine-translation-on-wmt-2018-english-1)](https://paperswithcode.com/sota/machine-translation-on-wmt-2018-english-1?p=impact-of-corpora-quality-on-neural-machine)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/impact-of-corpora-quality-on-neural-machine/machine-translation-on-wmt-2018-finnish)](https://paperswithcode.com/sota/machine-translation-on-wmt-2018-finnish?p=impact-of-corpora-quality-on-neural-machine)`

Impact of Corpora Quality on Neural Machine Translation

19 Oct 2018 · Matīss Rikters

Large parallel corpora that are automatically obtained from the web, documents or elsewhere often contain many corrupted parts that are bound to degrade the quality of the systems and models that learn from them. This paper describes frequent problems found in such data and how they affect neural machine translation systems, as well as how to identify and deal with them. The solutions are summarised in a set of scripts that remove problematic sentences from input corpora.
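As a rough illustration of what removing problematic sentences from a parallel corpus can look like, here is a minimal Python sketch. It is not taken from the paper's scripts: the function names `filter_parallel` and `alpha_share`, the specific heuristics (empty lines, identical source and target, exact duplicates, token-length ratio, share of alphabetic characters) and the thresholds `max_ratio` and `min_alpha` are all assumptions chosen for illustration.

```python
from typing import Iterable, Iterator, Tuple


def alpha_share(text: str) -> float:
    """Fraction of non-whitespace characters that are alphabetic."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c.isalpha() for c in chars) / len(chars)


def filter_parallel(
    pairs: Iterable[Tuple[str, str]],
    max_ratio: float = 3.0,   # hypothetical threshold on token-length ratio
    min_alpha: float = 0.5,   # hypothetical threshold on alphabetic share
) -> Iterator[Tuple[str, str]]:
    """Yield only the sentence pairs that pass all illustrative checks."""
    seen = set()
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        # Drop pairs where either side is empty.
        if not src or not tgt:
            continue
        # Drop pairs where the "translation" is a copy of the source.
        if src == tgt:
            continue
        # Drop exact duplicate pairs.
        if (src, tgt) in seen:
            continue
        # Drop pairs whose token counts differ too much.
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue
        # Drop pairs that are mostly digits or punctuation.
        if alpha_share(src) < min_alpha or alpha_share(tgt) < min_alpha:
            continue
        seen.add((src, tgt))
        yield src, tgt
```

The function takes any iterable of `(source, target)` string pairs, for example two aligned files read line by line and combined with `zip`, and yields only the pairs that survive. The thresholds are placeholders and would need tuning per language pair.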

Code

M4t1ss/parallel-corpora-tools official

Tasks

Machine Translation

Translation

Datasets

WMT 2018

Results from the Paper

Ranked #1 on Machine Translation on WMT 2018 English-Finnish

