no code implementations • MTSummit 2021 • Samuel Larkin, Michel Simard, Rebecca Knowles
We revisit the topic of translation direction in the data used for training neural machine translation systems, focusing on a real-world scenario with known translation direction and imbalances in translation direction: the Canadian Hansard.
no code implementations • AMTA 2022 • Shivendra Bhardwaj, David Alfonso-Hermelo, Philippe Langlais, Gabriel Bernier-Colborne, Cyril Goutte, Michel Simard
While recent studies have been dedicated to cleaning very noisy parallel corpora to improve machine translation training, in this work we focus on filtering a large and mostly clean translation memory.
no code implementations • 26 Dec 2022 • Diego Maupomé, Fanny Rancourt, Thomas Soulas, Alexandre Lachance, Marie-Jean Meurs, Desislava Aleksandrova, Olivier Brochu Dufour, Igor Pontes, Rémi Cardon, Michel Simard, Sowmya Vajjala
This report summarizes the work carried out by the authors during the Twelfth Montreal Industrial Problem Solving Workshop, held at Université de Montréal in August 2022.
no code implementations • COLING 2020 • Shivendra Bhardwaj, David Alfonso-Hermelo, Philippe Langlais, Gabriel Bernier-Colborne, Cyril Goutte, Michel Simard
Deep neural models have tremendously improved machine translation.
no code implementations • CoNLL 2019 • Chi-kiu Lo, Michel Simard
With the advent of massively multilingual context representation models such as BERT, which are trained on the concatenation of non-parallel data from each language, we show that the deadlock around parallel resources can be broken.
no code implementations • WS 2018 • Chi-kiu Lo, Michel Simard, Darlene Stewart, Samuel Larkin, Cyril Goutte, Patrick Littell
We present our semantic textual similarity approach to filtering a noisy web-crawled parallel corpus using YiSi, a novel semantic machine translation evaluation metric.
no code implementations • WS 2018 • Patrick Littell, Samuel Larkin, Darlene Stewart, Michel Simard, Cyril Goutte, Chi-kiu Lo
The WMT18 shared task on parallel corpus filtering (Koehn et al., 2018b) challenged teams to score sentence pairs from a large high-recall, low-precision web-scraped parallel corpus (Koehn et al., 2018a).