Search Results for author: Chi-kiu Lo

Found 27 papers, 0 papers with code

Improving Parallel Data Identification using Iteratively Refined Sentence Alignments and Bilingual Mappings of Pre-trained Language Models

no code implementations WMT (EMNLP) 2020 Chi-kiu Lo, Eric Joanis

The National Research Council of Canada’s team submissions to the parallel corpus filtering task at the Fifth Conference on Machine Translation are based on two key components: (1) iteratively refined statistical sentence alignments for extracting sentence pairs from document pairs and (2) a crosslingual semantic textual similarity metric based on a pretrained multilingual language model, XLM-RoBERTa, with bilingual mappings learnt from a minimal amount of clean parallel data for scoring the parallelism of the extracted sentence pairs.

Language Modelling Machine Translation +5

The Nunavut Hansard Inuktitut–English Parallel Corpus 3.0 with Preliminary Machine Translation Results

no code implementations LREC 2020 Eric Joanis, Rebecca Knowles, Roland Kuhn, Samuel Larkin, Patrick Littell, Chi-kiu Lo, Darlene Stewart, Jeffrey Micher

This paper describes a newly released sentence-aligned Inuktitut–English corpus based on the proceedings of the Legislative Assembly of Nunavut, covering sessions from April 1999 to June 2017.

Machine Translation NMT +2

Fully Unsupervised Crosslingual Semantic Textual Similarity Metric Based on BERT for Identifying Parallel Data

no code implementations CONLL 2019 Chi-kiu Lo, Michel Simard

With the advent of massively multilingual context representation models such as BERT, which are trained on the concatenation of non-parallel data from each language, we show that the deadlock around parallel resources can be broken.

Machine Translation Natural Language Understanding +3

YiSi - a Unified Semantic MT Quality Evaluation and Estimation Metric for Languages with Different Levels of Available Resources

no code implementations WS 2019 Chi-kiu Lo

We present YiSi, a unified automatic semantic machine translation quality evaluation and estimation metric for languages with different levels of available resources.

Machine Translation Semantic Similarity +2

Multi-Source Transformer for Kazakh-Russian-English Neural Machine Translation

no code implementations WS 2019 Patrick Littell, Chi-kiu Lo, Samuel Larkin, Darlene Stewart

We describe the neural machine translation (NMT) system developed at the National Research Council of Canada (NRC) for the Kazakh-English news translation task of the Fourth Conference on Machine Translation (WMT19).

Machine Translation NMT +2

NRC Parallel Corpus Filtering System for WMT 2019

no code implementations WS 2019 Gabriel Bernier-Colborne, Chi-kiu Lo

We describe the National Research Council Canada team's submissions to the parallel corpus filtering task at the Fourth Conference on Machine Translation.

Machine Translation Translation

Measuring sentence parallelism using Mahalanobis distances: The NRC unsupervised submissions to the WMT18 Parallel Corpus Filtering shared task

no code implementations WS 2018 Patrick Littell, Samuel Larkin, Darlene Stewart, Michel Simard, Cyril Goutte, Chi-kiu Lo

The WMT18 shared task on parallel corpus filtering (Koehn et al., 2018b) challenged teams to score sentence pairs from a large high-recall, low-precision web-scraped parallel corpus (Koehn et al., 2018a).

Anomaly Detection Machine Translation +1
