An exploratory approach to the Parallel Corpus Filtering shared task WMT20

WMT (EMNLP) 2020  ·  Ankur Kejriwal, Philipp Koehn ·

In this document we describe our submission to the parallel corpus filtering task using multilingual word embedding, language models and an ensemble of pre and post filtering rules. We use the norms of embedding and the perplexities of language models along with pre/post filtering rules to complement the LASER baseline scores and in the end get an improvement on the dev set in both language pairs.

PDF Abstract
No code implementations yet. Submit your code now

Tasks


Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here