Search Results for author: C. M. Downey

Found 5 papers, 5 papers with code

Embedding structure matters: Comparing methods to adapt multilingual vocabularies to new languages

1 code implementation 9 Sep 2023 C. M. Downey, Terra Blevins, Nora Goldfine, Shane Steinert-Threlkeld

Pre-trained multilingual language models underpin a large portion of modern NLP tools outside of English.

Learning to translate by learning to communicate

1 code implementation 14 Jul 2022 C. M. Downey, Xuhui Zhou, Leo Z. Liu, Shane Steinert-Threlkeld

We formulate and test a technique to use Emergent Communication (EC) with a pre-trained multilingual model to improve on modern Unsupervised NMT systems, especially for low-resource languages.

Natural Language Understanding, NMT

Multilingual unsupervised sequence segmentation transfers to extremely low-resource languages

1 code implementation ACL 2022 C. M. Downey, Shannon Drizin, Levon Haroutunian, Shivin Thukral

Further, we show that this transfer can be achieved by training over a collection of low-resource languages that are typologically similar (but phylogenetically unrelated) to the target language.

Language Modelling, Segmentation

A Masked Segmental Language Model for Unsupervised Natural Language Segmentation

1 code implementation NAACL (SIGMORPHON) 2022 C. M. Downey, Fei Xia, Gina-Anne Levow, Shane Steinert-Threlkeld

Segmentation remains an important preprocessing step both in languages where "words" or other important syntactic/semantic units (like morphemes) are not clearly delineated by white space, as well as when dealing with continuous speech data, where there is often no meaningful pause between words.

Language Modelling, Segmentation
