1 code implementation • 9 Sep 2023 • C. M. Downey, Terra Blevins, Nora Goldfine, Shane Steinert-Threlkeld
Pre-trained multilingual language models underpin a large portion of modern NLP tools outside of English.
1 code implementation • 16 Dec 2022 • C. M. Downey, Wei Dai, Huseyin A. Inan, Kim Laine, Saurabh Naik, Tomasz Religa
Language models are widely deployed to provide automatic text completion services in user products.
1 code implementation • 14 Jul 2022 • C. M. Downey, Xuhui Zhou, Leo Z. Liu, Shane Steinert-Threlkeld
We formulate and test a technique to use Emergent Communication (EC) with a pre-trained multilingual model to improve on modern unsupervised NMT systems, especially for low-resource languages.
1 code implementation • ACL 2022 • C. M. Downey, Shannon Drizin, Levon Haroutunian, Shivin Thukral
Further, we show that this transfer can be achieved by training on a collection of low-resource languages that are typologically similar (but phylogenetically unrelated) to the target language.
1 code implementation • NAACL (SIGMORPHON) 2022 • C. M. Downey, Fei Xia, Gina-Anne Levow, Shane Steinert-Threlkeld
Segmentation remains an important preprocessing step, both in languages where "words" or other important syntactic/semantic units (like morphemes) are not clearly delineated by whitespace, and in continuous speech data, where there is often no meaningful pause between words.
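To make the preprocessing problem concrete: when units are not delineated by whitespace, even recovering word boundaries requires a model or heuristic. The sketch below is a minimal, hypothetical illustration using greedy longest-match over a toy lexicon; it is not the learned, unsupervised approach studied in the paper, and the function and lexicon names are invented for the example.

```python
# Hypothetical illustration only: greedy longest-match segmentation over a
# toy lexicon, for text with no whitespace between "words". Real systems
# (including the unsupervised models in the work above) learn boundaries
# from data rather than relying on a fixed dictionary.
def segment(text, lexicon, max_len=4):
    """Greedily match the longest known unit at each position;
    fall back to a single character when nothing matches."""
    out, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                out.append(piece)
                i += size
                break
    return out

print(segment("thecatsat", {"the", "cat", "sat", "he", "at"}))
# → ['the', 'cat', 'sat']
```

Greedy matching fails on ambiguous strings (a longer match can "steal" characters a later word needs), which is one reason statistical and neural segmenters are preferred in practice.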