1 code implementation • MTSummit 2021 • Kelly Marchisio, Philipp Koehn, Conghao Xiong
Aimed at generating a seed lexicon for use in downstream natural language tasks, unsupervised methods for bilingual lexicon induction have received much attention in the academic literature recently.
no code implementations • AMTA 2022 • Kelly Marchisio, Conghao Xiong, Philipp Koehn
A popular natural language processing task decades ago, word alignment has been dominated until recently by GIZA++, a statistical method based on the 30-year-old IBM models.
no code implementations • NeurIPS 2023 • Yihong Chen, Kelly Marchisio, Roberta Raileanu, David Ifeoluwa Adelani, Pontus Stenetorp, Sebastian Riedel, Mikel Artetxe
Pretrained language models (PLMs) are today the primary model for natural language processing.
no code implementations • 17 Jan 2023 • Henry Li Xinyuan, Ray Lee, Jerry Chen, Kelly Marchisio
On the other hand, downstream tasks such as translation would benefit from working with a sentence representation that preserves formality in addition to semantics, so as to generate sentences with the appropriate level of social formality -- the difference between speaking to a friend versus speaking with a supervisor.
no code implementations • 20 Dec 2022 • Kelly Marchisio, Patrick Lewis, Yihong Chen, Mikel Artetxe
Prior work shows that it is possible to expand pretrained Masked Language Models (MLMs) to new languages by learning a new set of embeddings, while keeping the transformer body frozen.
no code implementations • 25 Oct 2022 • Kelly Marchisio, Ali Saad-Eldin, Kevin Duh, Carey Priebe, Philipp Koehn
Bilingual lexicons form a critical component of various natural language processing applications, including unsupervised and semisupervised machine translation and crosslingual information retrieval.
1 code implementation • 11 Oct 2022 • Kelly Marchisio, Neha Verma, Kevin Duh, Philipp Koehn
The ability to extract high-quality translation dictionaries from monolingual word embedding spaces depends critically on the geometric similarity of the spaces -- their degree of "isomorphism."
1 code implementation • Findings (EMNLP) 2021 • Kelly Marchisio, Youngser Park, Ali Saad-Eldin, Anton Alyakin, Kevin Duh, Carey Priebe, Philipp Koehn
Alternatively, word embeddings may be understood as nodes in a weighted graph.
no code implementations • NAACL 2022 • Kelly Marchisio, Markus Freitag, David Grangier
Modern unsupervised machine translation (MT) systems reach reasonable translation quality under clean and controlled data conditions.
1 code implementation • 18 Apr 2021 • Kelly Marchisio, Conghao Xiong, Philipp Koehn
In the lowest-resource setting, we outperform GIZA++ by 8.5, 10.9, and 12 AER for Ro-En, De-En, and En-Fr, respectively.
no code implementations • WMT (EMNLP) 2020 • Kelly Marchisio, Kevin Duh, Philipp Koehn
We additionally find that unsupervised MT performance declines when source and target languages use different scripts, and observe very poor performance on authentic low-resource language pairs.
no code implementations • WS 2019 • Kelly Marchisio, Yash Kumar Lal, Philipp Koehn
We describe the work of Johns Hopkins University for the shared task of news translation organized by the Fourth Conference on Machine Translation (2019).