no code implementations • EMNLP 2020 • Yvette Graham, Barry Haddow, Philipp Koehn
In addition, we provide a re-evaluation of a past machine translation evaluation that claimed human parity for MT.
no code implementations • IWSLT 2016 • Maria Nădejde, Alexandra Birch, Philipp Koehn
String-to-tree MT systems translate verbs without lexical or syntactic context on the source side and with limited target-side context.
no code implementations • MTSummit 2021 • Gaurav Kumar, Philipp Koehn, Sanjeev Khudanpur
Low-resource Multilingual Neural Machine Translation (MNMT) is typically tasked with improving the translation performance on one or more language pairs with the aid of high-resource language pairs.
no code implementations • WMT (EMNLP) 2020 • Felicia Koerner, Philipp Koehn
This paper describes our submission to the WMT20 Parallel Corpus Filtering and Alignment for Low-Resource Conditions Shared Task.
no code implementations • WMT (EMNLP) 2020 • Ankur Kejriwal, Philipp Koehn
In this document we describe our submission to the parallel corpus filtering task using multilingual word embedding, language models and an ensemble of pre and post filtering rules.
no code implementations • WMT (EMNLP) 2020 • Philipp Koehn, Vishrav Chaudhary, Ahmed El-Kishky, Naman Goyal, Peng-Jen Chen, Francisco Guzmán
Following the two preceding WMT Shared Tasks on Parallel Corpus Filtering (Koehn et al., 2018, 2019), we again posed the challenge of assigning sentence-level quality scores for very noisy corpora of sentence pairs crawled from the web, with the goal of sub-selecting the highest-quality data to be used to train machine translation systems.
1 code implementation • MTSummit 2021 • Kelly Marchisio, Philipp Koehn, Conghao Xiong
Aimed at generating a seed lexicon for use in downstream natural language tasks, unsupervised methods for bilingual lexicon induction have received much attention in the academic literature recently.
no code implementations • WMT (EMNLP) 2020 • Lucia Specia, Zhenhao Li, Juan Pino, Vishrav Chaudhary, Francisco Guzmán, Graham Neubig, Nadir Durrani, Yonatan Belinkov, Philipp Koehn, Hassan Sajjad, Paul Michel, Xian Li
We report the findings of the second edition of the shared task on improving robustness in Machine Translation (MT).
1 code implementation • WMT (EMNLP) 2021 • Chau Tran, Shruti Bhosale, James Cross, Philipp Koehn, Sergey Edunov, Angela Fan
We describe Facebook’s multilingual model submission to the WMT2021 shared task on news translation.
no code implementations • AMTA 2016 • Marina Sanchez-Torron, Philipp Koehn
We assessed how different machine translation (MT) systems affect the post-editing (PE) process and product of professional English–Spanish translators.
no code implementations • WMT (EMNLP) 2021 • Md Mahfuz ibn Alam, Ivana Kvapilíková, Antonios Anastasopoulos, Laurent Besacier, Georgiana Dinu, Marcello Federico, Matthias Gallé, Kweonwoo Jung, Philipp Koehn, Vassilina Nikoulina
Language domains that require very careful use of terminology are abundant and reflect a significant part of the translation industry.
no code implementations • AMTA 2016 • Rebecca Knowles, Philipp Koehn
We present an interactive translation prediction method based on neural machine translation.
no code implementations • AMTA 2022 • Kelly Marchisio, Conghao Xiong, Philipp Koehn
A popular natural language processing task decades ago, word alignment has been dominated until recently by GIZA++, a statistical method based on the 30-year-old IBM models.
no code implementations • WMT (EMNLP) 2021 • Farhad Akhbardeh, Arkady Arkhangorodsky, Magdalena Biesialska, Ondřej Bojar, Rajen Chatterjee, Vishrav Chaudhary, Marta R. Costa-Jussa, Cristina España-Bonet, Angela Fan, Christian Federmann, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Leonie Harter, Kenneth Heafield, Christopher Homan, Matthias Huck, Kwabena Amponsah-Kaakyire, Jungo Kasai, Daniel Khashabi, Kevin Knight, Tom Kocmi, Philipp Koehn, Nicholas Lourie, Christof Monz, Makoto Morishita, Masaaki Nagata, Ajay Nagesh, Toshiaki Nakazawa, Matteo Negri, Santanu Pal, Allahsera Auguste Tapo, Marco Turchi, Valentin Vydrin, Marcos Zampieri
This paper presents the results of the news translation task, the multilingual low-resource translation task for Indo-European languages, the triangular translation task, and the automatic post-editing task organised as part of the Conference on Machine Translation (WMT) 2021. In the news task, participants were asked to build machine translation systems for any of 10 language pairs, to be evaluated on test sets consisting mainly of news stories.
no code implementations • 6 Nov 2023 • Longyue Wang, Zhaopeng Tu, Yan Gu, Siyou Liu, Dian Yu, Qingsong Ma, Chenyang Lyu, Liting Zhou, Chao-Hong Liu, Yufeng Ma, WeiYu Chen, Yvette Graham, Bonnie Webber, Philipp Koehn, Andy Way, Yulin Yuan, Shuming Shi
To foster progress in this domain, we hold a new shared task at WMT 2023, the first edition of the Discourse-Level Literary Translation.
no code implementations • 4 Nov 2023 • Weiting Tan, Haoran Xu, Lingfeng Shen, Shuyue Stella Li, Kenton Murray, Philipp Koehn, Benjamin Van Durme, Yunmo Chen
Large language models trained primarily in a monolingual setting have demonstrated their ability to generalize to machine translation using zero- and few-shot examples with in-context learning.
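To make the in-context learning setup concrete, here is a minimal sketch (not the paper's code) of how a few-shot translation prompt is typically assembled for such a model; the language names and demonstration pairs are purely illustrative.

```python
def build_fewshot_prompt(demos, source_sentence, src="German", tgt="English"):
    """Assemble a few-shot in-context translation prompt: demonstration
    (source, target) pairs followed by the test source sentence."""
    blocks = [f"{src}: {s}\n{tgt}: {t}" for s, t in demos]
    blocks.append(f"{src}: {source_sentence}\n{tgt}:")
    return "\n\n".join(blocks)

demos = [("Guten Morgen.", "Good morning."),
         ("Wie heißt du?", "What is your name?")]
prompt = build_fewshot_prompt(demos, "Das Wetter ist heute schön.")
# feed `prompt` to a large language model and read the translation off the continuation
```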
no code implementations • 2 Oct 2023 • Tianjian Li, Haoran Xu, Philipp Koehn, Daniel Khashabi, Kenton Murray
Text generation models are notoriously vulnerable to errors in the training data.
1 code implementation • 23 May 2023 • Haoran Xu, Weiting Tan, Shuyue Stella Li, Yunmo Chen, Benjamin Van Durme, Philipp Koehn, Kenton Murray
Incorporating language-specific (LS) modules is a proven method to boost performance in multilingual machine translation.
no code implementations • 23 May 2023 • Elizabeth Salesky, Neha Verma, Philipp Koehn, Matt Post
We introduce and demonstrate how to effectively train multilingual machine translation models with pixel representations.
no code implementations • 25 Oct 2022 • Kelly Marchisio, Ali Saad-Eldin, Kevin Duh, Carey Priebe, Philipp Koehn
Bilingual lexicons form a critical component of various natural language processing applications, including unsupervised and semi-supervised machine translation and cross-lingual information retrieval.
1 code implementation • 11 Oct 2022 • Kelly Marchisio, Neha Verma, Kevin Duh, Philipp Koehn
The ability to extract high-quality translation dictionaries from monolingual word embedding spaces depends critically on the geometric similarity of the spaces -- their degree of "isomorphism."
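As a concrete illustration of such an isomorphism measure, the sketch below computes a simplified eigenvector similarity in the spirit of Søgaard et al. (2018): build a nearest-neighbor similarity graph from each embedding space and compare the Laplacian spectra. This is one proxy from the literature, not necessarily the exact metric used in the paper.

```python
import numpy as np

def eigenvector_similarity(emb1, emb2, k=10):
    """Compare two embedding spaces via the Laplacian spectra of their
    k-nearest-neighbor similarity graphs (lower = more isomorphic).
    Simplified variant of the metric of Sogaard et al. (2018)."""
    def laplacian_spectrum(emb):
        X = emb / np.linalg.norm(emb, axis=1, keepdims=True)
        sim = X @ X.T                         # cosine similarities
        np.fill_diagonal(sim, 0.0)
        kth = np.sort(sim, axis=1)[:, -k][:, None]
        adj = np.where(sim >= kth, sim, 0.0)  # keep the k strongest edges per node
        adj = np.maximum(adj, adj.T)          # symmetrize
        lap = np.diag(adj.sum(axis=1)) - adj  # graph Laplacian
        return np.sort(np.linalg.eigvalsh(lap))
    e1, e2 = laplacian_spectrum(emb1), laplacian_spectrum(emb2)
    m = min(len(e1), len(e2))
    return float(np.sum((e1[:m] - e2[:m]) ** 2))
```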
no code implementations • 10 Oct 2022 • Weiting Tan, Kevin Heffernan, Holger Schwenk, Philipp Koehn
Multilingual sentence representations from large models encode semantic information from two or more languages and can be used for different cross-lingual information retrieval and matching tasks.
1 code implementation • 23 Aug 2022 • Weiting Tan, Philipp Koehn
Mining high-quality bitexts for low-resource languages is challenging.
5 code implementations • Meta AI 2022 • NLLB team, Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, Pierre Andrews, Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Jeff Wang
Driven by the goal of eradicating language barriers on a global scale, machine translation has solidified itself as a key focus of artificial intelligence research today.
Ranked #1 on Machine Translation on IWSLT2015 English-Vietnamese (SacreBLEU metric)
1 code implementation • 23 May 2022 • Haoran Xu, Philipp Koehn, Kenton Murray
We first highlight the large sensitivity (contribution) gap between high-sensitivity and low-sensitivity parameters and show that model generalization performance can be significantly improved after balancing the contribution of all parameters.
no code implementations • AMTA 2022 • Daniel Licht, Cynthia Gao, Janice Lam, Francisco Guzmán, Mona Diab, Philipp Koehn
Obtaining meaningful quality scores for machine translation systems through human evaluation remains a challenge given the high variability between human evaluators, partly due to subjective expectations for translation quality for different language pairs.
no code implementations • Findings (NAACL) 2022 • Yukun Feng, Feng Li, Ziang Song, Boyuan Zheng, Philipp Koehn
We conduct experiments on three popular datasets for document-level machine translation and our model has an average improvement of 0.91 s-BLEU over the sentence-level baseline.
no code implementations • 25 Mar 2022 • Tasnim Mohiuddin, Philipp Koehn, Vishrav Chaudhary, James Cross, Shruti Bhosale, Shafiq Joty
In this work, we introduce a two-stage curriculum training framework for NMT where we fine-tune a base NMT model on subsets of data, selected by both deterministic scoring using pre-trained methods and online scoring that considers prediction scores of the emerging NMT model.
no code implementations • ACL 2022 • Simeng Sun, Angela Fan, James Cross, Vishrav Chaudhary, Chau Tran, Philipp Koehn, Francisco Guzmán
Further, we find that incorporating alternative inputs via self-ensemble can be particularly effective when the training set is small, leading to +5 BLEU when only 5% of the total training data is accessible.
1 code implementation • AMTA 2022 • Weiting Tan, Shuoyang Ding, Huda Khayrallah, Philipp Koehn
Neural Machine Translation (NMT) models are known to suffer from noisy inputs.
no code implementations • ICLR 2022 • Xuan-Phi Nguyen, Hongyu Gong, Yun Tang, Changhan Wang, Philipp Koehn, Shafiq Joty
Modern unsupervised machine translation systems mostly train their models by generating synthetic parallel training data from large unlabeled monolingual corpora of different languages through various means, such as iterative back-translation.
1 code implementation • Findings (EMNLP) 2021 • Kelly Marchisio, Youngser Park, Ali Saad-Eldin, Anton Alyakin, Kevin Duh, Carey Priebe, Philipp Koehn
Alternatively, word embeddings may be understood as nodes in a weighted graph.
no code implementations • WMT (EMNLP) 2021 • Shuoyang Ding, Marcin Junczys-Dowmunt, Matt Post, Christian Federmann, Philipp Koehn
This paper presents the JHU-Microsoft joint submission for WMT 2021 quality estimation shared task.
1 code implementation • EMNLP 2021 • Shuoyang Ding, Marcin Junczys-Dowmunt, Matt Post, Philipp Koehn
We propose a novel scheme to use the Levenshtein Transformer to perform the task of word-level quality estimation.
no code implementations • 6 Aug 2021 • Chau Tran, Shruti Bhosale, James Cross, Philipp Koehn, Sergey Edunov, Angela Fan
We describe Facebook's multilingual model submission to the WMT2021 shared task on news translation.
1 code implementation • 19 Jul 2021 • Haoran Xu, Philipp Koehn
Typically, a linearly orthogonal transformation mapping is learned by aligning static type-level embeddings to build a shared semantic space.
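The standard closed-form recipe for fitting such an orthogonal mapping is the Procrustes solution via SVD; the sketch below shows it under the assumption of a seed dictionary whose translation pairs occupy corresponding rows of X and Y (illustrative, not the paper's exact code).

```python
import numpy as np

def procrustes_map(X, Y):
    """Closed-form orthogonal map W minimizing ||XW - Y||_F (Schonemann, 1966),
    the usual way a linear orthogonal transformation between two static
    embedding spaces is fit from a seed dictionary."""
    u, _, vt = np.linalg.svd(X.T @ Y)
    return u @ vt

# rows of X (source) and Y (target) are embeddings of seed translation pairs
X, Y = np.random.randn(500, 300), np.random.randn(500, 300)
W = procrustes_map(X, Y)
shared = X @ W  # source vectors mapped into the shared (target) space
```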
1 code implementation • 22 Jun 2021 • Md Mahfuz ibn Alam, Antonios Anastasopoulos, Laurent Besacier, James Cross, Matthias Gallé, Philipp Koehn, Vassilina Nikoulina
As neural machine translation (NMT) systems become an important part of professional translator pipelines, a growing body of work focuses on combining NMT with terminologies.
1 code implementation • ACL 2021 • Wei-Jen Ko, Ahmed El-Kishky, Adithya Renduchintala, Vishrav Chaudhary, Naman Goyal, Francisco Guzmán, Pascale Fung, Philipp Koehn, Mona Diab
The scarcity of parallel data is a major obstacle for training high-quality machine translation systems for low-resource languages.
1 code implementation • 18 Apr 2021 • Kelly Marchisio, Conghao Xiong, Philipp Koehn
In the lowest-resource setting, we outperform GIZA++ by 8.5, 10.9, and 12 AER for Ro-En, De-En, and En-Fr, respectively.
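For reference, AER (alignment error rate) is the standard word-alignment metric here; a minimal implementation of the textbook definition (Och and Ney, 2003) looks like this, with toy links rather than the paper's data.

```python
def alignment_error_rate(A, S, P):
    """Standard AER (Och & Ney, 2003): A = predicted alignment links,
    S = sure gold links, P = possible gold links (S is a subset of P).
    Lower is better; the paper's gains are AER reductions."""
    A, S, P = set(A), set(S), set(P)
    return 1.0 - (len(A & S) + len(A & P)) / (len(A) + len(S))

# links are (source_index, target_index) pairs
print(alignment_error_rate({(0, 0), (1, 2)}, {(0, 0)}, {(0, 0), (1, 2)}))  # 0.0
```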
no code implementations • EMNLP 2021 • Ahmed El-Kishky, Adithya Renduchintala, James Cross, Francisco Guzmán, Philipp Koehn
Cross-lingual named-entity lexica are an important resource for multilingual NLP tasks such as machine translation and cross-lingual wikification.
1 code implementation • NAACL 2021 • Shuoyang Ding, Philipp Koehn
Saliency methods are widely used to interpret neural network predictions, but different variants of saliency methods often disagree even on the interpretations of the same prediction made by the same model.
no code implementations • WMT (EMNLP) 2021 • Gaurav Kumar, Philipp Koehn, Sanjeev Khudanpur
These feature weights, which are optimized directly for the task of improving translation performance, are used to score and filter sentences in the noisy corpora more effectively.
no code implementations • 11 Mar 2021 • Gaurav Kumar, Philipp Koehn, Sanjeev Khudanpur
Low-resource Multilingual Neural Machine Translation (MNMT) is typically tasked with improving the translation performance on one or more language pairs with the aid of high-resource language pairs.
1 code implementation • EACL (AdaptNLP) 2021 • Haoran Xu, Philipp Koehn
Linear embedding transformation has been shown to be effective for zero-shot cross-lingual transfer tasks and achieve surprisingly promising results.
no code implementations • EMNLP 2020 • Loïc Barrault, Magdalena Biesialska, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Matthias Huck, Eric Joanis, Tom Kocmi, Philipp Koehn, Chi-kiu Lo, Nikola Ljubešić, Christof Monz, Makoto Morishita, Masaaki Nagata, Toshiaki Nakazawa, Santanu Pal, Matt Post, Marcos Zampieri
In the news task, participants were asked to build machine translation systems for any of 11 language pairs, to be evaluated on test sets consisting mainly of news stories.
1 code implementation • Asian Chapter of the Association for Computational Linguistics 2020 • Xutai Ma, Juan Pino, Philipp Koehn
Simultaneous text translation and end-to-end speech translation have recently made great progress, but little work has combined the two tasks.
no code implementations • 30 Oct 2020 • Xutai Ma, Yongqiang Wang, Mohammad Javad Dousti, Philipp Koehn, Juan Pino
Transformer-based models have achieved state-of-the-art performance on speech translation tasks.
no code implementations • EMNLP (NLP-COVID19) 2020 • Antonios Anastasopoulos, Alessandro Cattelan, Zi-Yi Dou, Marcello Federico, Christian Federmann, Dmitriy Genzel, Francisco Guzmán, Junjie Hu, Macduff Hughes, Philipp Koehn, Rosie Lazar, Will Lewis, Graham Neubig, Mengmeng Niu, Alp Öktem, Eric Paquin, Grace Tang, Sylwia Tur
Further, the team is converting the test and development data into translation memories (TMXs) that can be used by localizers from and to any of the languages.
2 code implementations • ACL 2020 • Marta Bañón, Pin-zhen Chen, Barry Haddow, Kenneth Heafield, Hieu Hoang, Miquel Esplà-Gomis, Mikel L. Forcada, Amir Kamran, Faheem Kirefu, Philipp Koehn, Sergio Ortiz Rojas, Leopoldo Pla Sempere, Gema Ramírez-Sánchez, Elsa Sarrías, Marek Strelec, Brian Thompson, William Waites, Dion Wiggins, Jaume Zaragoza
We report on methods to create the largest publicly available parallel corpora by crawling the web, using open source software.
1 code implementation • EMNLP 2020 • Huda Khayrallah, Brian Thompson, Matt Post, Philipp Koehn
Many valid translations exist for a given sentence, yet machine translation (MT) is trained with a single reference translation, exacerbating data sparsity in low-resource settings.
1 code implementation • EMNLP 2020 • Brian Thompson, Philipp Koehn
We present a simple document alignment method that incorporates sentence order information in both candidate generation and candidate re-scoring.
no code implementations • WMT (EMNLP) 2020 • Kelly Marchisio, Kevin Duh, Philipp Koehn
We additionally find that unsupervised MT performance declines when source and target languages use different scripts, and observe very poor performance on authentic low-resource language pairs.
no code implementations • EMNLP 2020 • Ahmed El-Kishky, Vishrav Chaudhary, Francisco Guzmán, Philipp Koehn
We mine sixty-eight snapshots of the Common Crawl corpus and identify web document pairs that are translations of each other.
no code implementations • IJCNLP 2019 • Adithya Renduchintala, Philipp Koehn, Jason Eisner
We present a machine foreign-language teacher that modifies text in a student's native language (L1) by replacing some word tokens with glosses in a foreign language (L2), in such a way that the student can acquire L2 vocabulary simply by reading the resulting macaronic text.
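A toy sketch of the core idea, with a hypothetical gloss dictionary and random replacement standing in for the paper's learned model of student comprehension:

```python
import random

def macaronic(text, gloss_dict, replace_prob=0.3, seed=0):
    """Toy macaronic text generation: replace some native-language (L1)
    tokens with foreign-language (L2) glosses so a reader picks up L2
    vocabulary in context. The real system chooses replacements with a
    learned model of what the student can guess, not at random."""
    rng = random.Random(seed)
    out = []
    for tok in text.split():
        key = tok.lower().strip(".,!?")
        if key in gloss_dict and rng.random() < replace_prob:
            out.append(gloss_dict[key])
        else:
            out.append(tok)
    return " ".join(out)

glosses = {"dog": "Hund", "house": "Haus", "small": "klein"}  # hypothetical L1->L2 glosses
print(macaronic("The small dog sleeps in the house .", glosses, replace_prob=0.9))
```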
no code implementations • IJCNLP 2019 • Brian Thompson, Rebecca Knowles, Xuan Zhang, Huda Khayrallah, Kevin Duh, Philipp Koehn
Bilingual lexicons are valuable resources used by professional human translators.
no code implementations • IJCNLP 2019 • Brian Thompson, Philipp Koehn
It substantially outperforms the popular Hunalign toolkit at recovering Bible verse alignments in medium- to low-resource language pairs, and it improves downstream MT quality by 1.7 and 1.6 BLEU in Sinhala-English and Nepali-English, respectively, compared to the Hunalign-based Paracrawl pipeline.
1 code implementation • IJCNLP 2019 • Francisco Guzmán, Peng-Jen Chen, Myle Ott, Juan Pino, Guillaume Lample, Philipp Koehn, Vishrav Chaudhary, Marc'Aurelio Ranzato
For machine translation, a vast majority of language pairs in the world are considered low-resource because they have little parallel data available.
no code implementations • WS 2019 • Loïc Barrault, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Philipp Koehn, Shervin Malmasi, Christof Monz, Mathias Müller, Santanu Pal, Matt Post, Marcos Zampieri
This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019.
no code implementations • WS 2019 • Philipp Koehn, Francisco Guzmán, Vishrav Chaudhary, Juan Pino
Following the WMT 2018 Shared Task on Parallel Corpus Filtering, we posed the challenge of assigning sentence-level quality scores for very noisy corpora of sentence pairs crawled from the web, with the goal of sub-selecting 2% and 10% of the highest-quality data to be used to train machine translation systems.
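The task interface reduces to scoring and thresholding; a toy sketch follows (the scores and corpus are hypothetical, and the official task budgets the selection by word count rather than by pair count):

```python
def subselect(scored_pairs, fraction):
    """Keep the top-scoring fraction of a noisy corpus for MT training.
    scored_pairs: (score, src, tgt) triples; the quality scores are what
    participants' filtering systems supply. Simplified: the shared task
    measures the selection budget in words, not sentence pairs."""
    ranked = sorted(scored_pairs, key=lambda t: t[0], reverse=True)
    return ranked[: max(1, int(len(ranked) * fraction))]

corpus = [(0.91, "Guten Tag .", "Good day ."),
          (0.07, "xjq 123 !!!", "Buy now !!!"),
          (0.78, "Vielen Dank .", "Thank you very much .")]
top_10 = subselect(corpus, 0.10)  # the 10% condition of the shared task
```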
no code implementations • WS 2019 • Adithya Renduchintala, Philipp Koehn, Jason Eisner
We accomplish this by modifying a cloze language model to incrementally learn new vocabulary items, and use this language model as a proxy for the word guessing and learning ability of real students.
no code implementations • WS 2019 • Kelly Marchisio, Yash Kumar Lal, Philipp Koehn
We describe the work of Johns Hopkins University for the shared task of news translation organized by the Fourth Conference on Machine Translation (2019).
no code implementations • ACL 2019 • Yash Kumar Lal, Vaibhav Kumar, Mrinal Dhar, Manish Shrivastava, Philipp Koehn
The Collective Encoder captures the overall sentiment of the sentence, while the Specific Encoder utilizes an attention mechanism in order to focus on individual sentiment-bearing sub-words.
1 code implementation • WS 2019 • Xian Li, Paul Michel, Antonios Anastasopoulos, Yonatan Belinkov, Nadir Durrani, Orhan Firat, Philipp Koehn, Graham Neubig, Juan Pino, Hassan Sajjad
We share the findings of the first shared task on improving robustness of Machine Translation (MT).
1 code implementation • WS 2019 • Shuoyang Ding, Hainan Xu, Philipp Koehn
Despite their original goal to jointly learn to align and translate, Neural Machine Translation (NMT) models, especially Transformer, are often perceived as not learning interpretable word alignments.
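The naive reading this line of work starts from extracts alignments by taking the argmax of each target token's attention over the source; a minimal sketch is below (the paper goes beyond this, deriving alignments from saliency rather than raw attention).

```python
import numpy as np

def attention_argmax_alignment(attn):
    """Naive baseline: given a (target x source) attention matrix from an
    NMT model, link each target token to its most-attended source token.
    The paper argues raw attention can be an unreliable alignment signal
    and instead uses saliency (prediction-difference) scores."""
    return [(int(np.argmax(row)), t) for t, row in enumerate(attn)]

attn = np.array([[0.7, 0.2, 0.1],   # toy attention for 2 target tokens
                 [0.1, 0.1, 0.8]])  # over 3 source tokens
print(attention_argmax_alignment(attn))  # [(0, 0), (2, 1)] as (src, tgt) links
```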
no code implementations • 24 Jun 2019 • Yvette Graham, Barry Haddow, Philipp Koehn
Finally, we provide a comprehensive check-list for future machine translation evaluation.
no code implementations • WS 2019 • Vishrav Chaudhary, Yuqing Tang, Francisco Guzmán, Holger Schwenk, Philipp Koehn
In this paper, we describe our submission to the WMT19 low-resource parallel corpus filtering shared task.
no code implementations • NAACL 2019 • Brian Thompson, Jeremy Gwinnup, Huda Khayrallah, Kevin Duh, Philipp Koehn
Continued training is an effective method for domain adaptation in neural machine translation.
1 code implementation • WS 2019 • Shuoyang Ding, Philipp Koehn
Stack Long Short-Term Memory (StackLSTM) is useful for various applications such as parsing and string-to-tree neural machine translation, but it is also known to be notoriously difficult to parallelize for GPU training because its computations depend on discrete operations.
2 code implementations • 4 Feb 2019 • Francisco Guzmán, Peng-Jen Chen, Myle Ott, Juan Pino, Guillaume Lample, Philipp Koehn, Vishrav Chaudhary, Marc'Aurelio Ranzato
For machine translation, a vast majority of language pairs in the world are considered low-resource because they have little parallel data available.
no code implementations • EMNLP 2018 • Rebecca Knowles, Philipp Koehn
In this work, we show that they learn to copy words based on both the context in which the words appear as well as features of the words themselves.
no code implementations • WS 2018 • Ondřej Bojar, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Philipp Koehn, Christof Monz
This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2018.
no code implementations • WS 2018 • Huda Khayrallah, Hainan Xu, Philipp Koehn
This work describes our submission to the WMT18 Parallel Corpus Filtering shared task.
no code implementations • WS 2018 • Philipp Koehn, Kevin Duh, Brian Thompson
We report on the efforts of the Johns Hopkins University to develop neural machine translation systems for the news translation shared task organized as part of the Conference on Machine Translation (WMT) 2018.
no code implementations • WS 2018 • Philipp Koehn, Huda Khayrallah, Kenneth Heafield, Mikel L. Forcada
We posed the shared task of assigning sentence-level quality scores for a very noisy corpus of sentence pairs crawled from the web, with the goal of sub-selecting 1% and 10% of high-quality data to be used to train machine translation systems.
no code implementations • EMNLP 2018 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Lucia Specia, Marco Turchi, Karin Verspoor
1 code implementation • WS 2018 • Brian Thompson, Huda Khayrallah, Antonios Anastasopoulos, Arya D. McCarthy, Kevin Duh, Rebecca Marvin, Paul McNamee, Jeremy Gwinnup, Tim Anderson, Philipp Koehn
To better understand the effectiveness of continued training, we analyze the major components of a neural machine translation system (the encoder, decoder, and each embedding space) and consider each component's contribution to, and capacity for, domain adaptation.
no code implementations • WS 2019 • Adithya Renduchintala, Pamela Shapiro, Kevin Duh, Philipp Koehn
Neural machine translation (NMT) systems operate primarily on words (or sub-words), ignoring lower-level patterns of morphology.
no code implementations • WS 2018 • Sachith Sri Ram Kothur, Rebecca Knowles, Philipp Koehn
It is common practice to adapt machine translation systems to novel domains, but even a well-adapted system may be able to perform better on a particular document if it were to learn from a translator's corrections within the document itself.
no code implementations • WS 2018 • Vu Cong Duy Hoang, Philipp Koehn, Gholamreza Haffari, Trevor Cohn
We present iterative back-translation, a method for generating increasingly better synthetic parallel data from monolingual data to train neural machine translation systems.
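Schematically, the loop looks like the sketch below; `train` and `translate` are stand-ins for a full NMT toolkit, not real APIs, so this is illustrative only.

```python
def iterative_back_translation(parallel, mono_src, mono_tgt,
                               train, translate, rounds=3):
    """Schematic iterative back-translation. `parallel` holds (src, tgt)
    pairs; each round regenerates synthetic parallel data with the latest
    models in both directions and retrains on real + synthetic data."""
    flipped = [(t, s) for s, t in parallel]
    fwd = train(parallel)   # src -> tgt model
    bwd = train(flipped)    # tgt -> src model
    for _ in range(rounds):
        # back-translate monolingual target text to get synthetic sources
        synth_fwd = [(translate(bwd, t), t) for t in mono_tgt]
        # back-translate monolingual source text to get synthetic targets
        synth_bwd = [(translate(fwd, s), s) for s in mono_src]
        fwd = train(parallel + synth_fwd)
        bwd = train(flipped + synth_bwd)
    return fwd, bwd
```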
1 code implementation • WS 2018 • Huda Khayrallah, Brian Thompson, Kevin Duh, Philipp Koehn
Supervised domain adaptation, where a large generic corpus and a smaller in-domain corpus are both available for training, is a challenge for neural machine translation (NMT).
1 code implementation • WS 2018 • Huda Khayrallah, Philipp Koehn
We examine how various types of noise in the parallel training data impact the quality of neural machine translation systems.
no code implementations • IJCNLP 2017 • Huda Khayrallah, Gaurav Kumar, Kevin Duh, Matt Post, Philipp Koehn
Domain adaptation is a major challenge for neural machine translation (NMT).
no code implementations • IJCNLP 2017 • Benjamin Van Durme, Tom Lippincott, Kevin Duh, Deana Burchfield, Adam Poliak, Cash Costello, Tim Finin, Scott Miller, James Mayfield, Philipp Koehn, Craig Harman, Dawn Lawrie, Chandler May, Max Thomas, Annabelle Carrell, Julianne Chaloux, Tongfei Chen, Alex Comerford, Mark Dredze, Benjamin Glass, Shudong Hao, Patrick Martin, Pushpendre Rastogi, Rashmi Sankepally, Travis Wolfe, Ying-Ying Tran, Ted Zhang
It combines a multitude of analytics together with a flexible environment for customizing the workflow for different users.
5 code implementations • 22 Sep 2017 • Philipp Koehn
Draft of textbook chapter on neural machine translation.
no code implementations • WS 2017 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Shu-Jian Huang, Matthias Huck, Philipp Koehn, Qun Liu, Varvara Logacheva, Christof Monz, Matteo Negri, Matt Post, Raphael Rubino, Lucia Specia, Marco Turchi
no code implementations • EMNLP 2017 • Hainan Xu, Philipp Koehn
We introduce Zipporah, a fast and scalable data cleaning system.
no code implementations • CONLL 2017 • Adithya Renduchintala, Philipp Koehn, Jason Eisner
We present a feature-rich knowledge tracing method that captures a student's acquisition and retention of knowledge during a foreign language phrase learning task.
no code implementations • WS 2017 • Philipp Koehn, Rebecca Knowles
We explore six challenges for neural machine translation: domain mismatch, amount of training data, rare words, long sentences, word alignment, and beam search.
no code implementations • WS 2017 • Maria Nădejde, Siva Reddy, Rico Sennrich, Tomasz Dwojak, Marcin Junczys-Dowmunt, Philipp Koehn, Alexandra Birch
Our results on WMT data show that explicitly modeling target-syntax improves machine translation quality for German->English, a high-resource pair, and for Romanian->English, a low-resource pair and also several syntactic phenomena including prepositional phrase attachment.
no code implementations • WS 2016 • Ondřej Bojar, Christian Buck, Rajen Chatterjee, Christian Federmann, Liane Guillou, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Aurélie Névéol, Mariana Neves, Pavel Pecina, Martin Popel, Philipp Koehn, Christof Monz, Matteo Negri, Matt Post, Lucia Specia, Karin Verspoor, Jörg Tiedemann, Marco Turchi
no code implementations • WS 2016 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Martin Popel, Matt Post, Raphael Rubino, Carolina Scarton, Lucia Specia, Marco Turchi, Karin Verspoor, Marcos Zampieri
no code implementations • WS 2015 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Barry Haddow, Matthias Huck, Chris Hokamp, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Matt Post, Carolina Scarton, Lucia Specia, Marco Turchi
no code implementations • COLING 2014 • Marcello Federico, Nicola Bertoldi, Mauro Cettolo, Matteo Negri, Marco Turchi, Marco Trombetti, Alessandro Cattelan, Antonio Farina, Domenico Lupinetti, Andrea Martines, Alberto Massidda, Holger Schwenk, Loïc Barrault, Frederic Blain, Philipp Koehn, Christian Buck, Ulrich Germann
no code implementations • WS 2014 • Ondřej Bojar, Christian Buck, Christian Federmann, Barry Haddow, Philipp Koehn, Johannes Leveling, Christof Monz, Pavel Pecina, Matt Post, Herve Saint-Amand, Radu Soricut, Lucia Specia, Aleš Tamchyna
no code implementations • EACL 2014 • Vicent Alabau, Christian Buck, Michael Carl, Francisco Casacuberta, Mercedes García-Martínez, Ulrich Germann, Jesús González-Rubio, Robin Hill, Philipp Koehn, Luis Leiva, Bartolomé Mesa-Lao, Daniel Ortiz-Martínez, Herve Saint-Amand, Germán Sanchis Trilles, Chara Tsoukala
no code implementations • MTSummit 2005 • Philipp Koehn
We collected a corpus of parallel text in 11 languages from the proceedings of the European Parliament, which are published on the web.
no code implementations • HLT-NAACL 2003 • Philipp Koehn, Franz J. Och, Daniel Marcu
We propose a new phrase-based translation model and decoding algorithm that enables us to evaluate and compare several previously proposed phrase-based translation models.