1 code implementation • Findings (ACL) 2022 • Xinjian Li, Florian Metze, David Mortensen, Shinji Watanabe, Alan Black
Grapheme-to-Phoneme (G2P) has many applications in NLP and speech fields.
1 code implementation • 10 Feb 2025 • Kwanghee Choi, Eunjung Yeo, Kalvin Chang, Shinji Watanabe, David Mortensen
Allophony refers to the variation in the phonetic realization of a phoneme based on its phonetic environment.
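For a concrete feel of the phenomenon, here is a minimal Python sketch using textbook-style facts about American English /t/ (the rules are illustrative, not taken from the paper): a phoneme's surface realization is looked up by its phonetic environment.

```python
# Toy illustration of allophony: one phoneme, several context-dependent
# phonetic realizations (allophones). Simplified American English /t/ facts,
# not the paper's model.
ALLOPHONE_RULES = {
    ("t", "word_initial"): "tʰ",  # aspirated, as in "top"
    ("t", "intervocalic"): "ɾ",   # flapped, as in "water"
    ("t", "word_final"):   "t̚",   # often unreleased, as in "cat"
}

def realize(phoneme: str, environment: str) -> str:
    """Return the allophone of `phoneme` in `environment` (default: unchanged)."""
    return ALLOPHONE_RULES.get((phoneme, environment), phoneme)

if __name__ == "__main__":
    for env in ("word_initial", "intervocalic", "word_final"):
        print(f"/t/ in {env} -> [{realize('t', env)}]")
```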
no code implementations • 27 Jan 2025 • Eunjung Yeo, Julie Liss, Visar Berisha, David Mortensen
Purpose: This commentary introduces how artificial intelligence (AI) can be leveraged to advance cross-language intelligibility assessment of dysarthric speech.
no code implementations • 12 Nov 2024 • Valentin Hofmann, Leonie Weissweiler, David Mortensen, Hinrich Schütze, Janet Pierrehumbert
As expected, rule-based and analogical models explain the predictions of GPT-J equally well for adjectives with regular nominalization patterns.
no code implementations • 4 Jul 2024 • Brendon Boldt, David Mortensen
The availability of a substantial collection of well-documented emergent language corpora will therefore enable research that analyzes a wider variety of emergent languages, more effectively uncovering general principles of emergent communication rather than artifacts of particular environments.
no code implementations • 3 Jul 2024 • Brendon Boldt, David Mortensen
In this paper, we introduce a benchmark for evaluating the overall quality of emergent languages using data-driven methods.
no code implementations • 3 Jul 2024 • Brendon Boldt, David Mortensen
Emergent communication, or emergent language, is the field of research which studies how human language-like communication systems emerge de novo in deep multi-agent reinforcement learning environments.
no code implementations • 18 Jun 2024 • Atharva Naik, Kexun Zhang, Nathaniel Robinson, Aravind Mysore, Clayton Marr, Hong Sng, Rebecca Byrnes, Anna Cai, Kalvin Chang, David Mortensen
Historical linguists have long written a kind of incompletely formalized "program", consisting of a series of ordered string rewrite functions (called sound laws), that converts reconstructed words in an ancestor language into words in one of its attested descendants.
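As a rough sketch of that idea (the sound laws below are invented for illustration, and expressing them as regex rewrites is an assumption, not the paper's formalism), each law can be treated as a string rewrite applied in order to a reconstructed form:

```python
import re

# Hypothetical, invented sound laws expressed as ordered regex rewrites.
# Each rule applies to the output of the previous one, so order matters.
SOUND_LAWS = [
    (r"p", "f"),           # *p > f everywhere (toy lenition)
    (r"k(?=[ei])", "tʃ"),  # *k > tʃ before front vowels (palatalization)
    (r"a$", "o"),          # word-final *a > o
]

def apply_sound_laws(proto_form: str) -> str:
    """Derive a descendant form by applying each rewrite rule in order."""
    form = proto_form
    for pattern, replacement in SOUND_LAWS:
        form = re.sub(pattern, replacement, form)
    return form

if __name__ == "__main__":
    for proto in ["pata", "keli", "kapo"]:
        print(f"*{proto} > {apply_sound_laws(proto)}")
```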
1 code implementation • 4 Jul 2023 • Young Min Kim, Kalvin Chang, Chenxuan Cui, David Mortensen
We update their model with a state-of-the-art sequence-to-sequence architecture: the Transformer.
1 code implementation • 5 Apr 2023 • Vilém Zouhar, Kalvin Chang, Chenxuan Cui, Nathaniel Carlson, Nathaniel Robinson, Mrinmaya Sachan, David Mortensen
Mapping words into a fixed-dimensional vector space is the backbone of modern NLP.
1 code implementation • 28 Nov 2022 • Brendon Boldt, David Mortensen
We formulate a stochastic process, FiLex, as a mathematical model of lexicon entropy in deep learning-based emergent language systems.
no code implementations • 22 Jun 2022 • Brendon Boldt, David Mortensen
Emergent language is unique among fields within machine learning for its open-endedness: it does not obviously present well-defined problems to be solved.
1 code implementation • 22 Jun 2022 • Brendon Boldt, David Mortensen
We introduce FiLex, a self-reinforcing stochastic process which models finite lexicons in emergent language experiments.
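The paper defines FiLex precisely; purely as intuition for what "self-reinforcing" means here, the sketch below uses a generic Pólya-urn-style process (a stand-in with made-up parameters, not FiLex itself), in which every use of a word makes that word more likely to be reused, concentrating probability mass on a few words and lowering lexicon entropy.

```python
import random
from collections import Counter

def self_reinforcing_lexicon(vocab_size: int, n_tokens: int, alpha: float, seed: int = 0):
    """Polya-urn-style sketch: each word starts with pseudo-count `alpha`,
    and every use of a word increases its own probability of being reused."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_tokens):
        weights = [alpha + counts[w] for w in range(vocab_size)]
        word = rng.choices(range(vocab_size), weights=weights, k=1)[0]
        counts[word] += 1
    return counts

if __name__ == "__main__":
    # Smaller alpha means stronger reinforcement: usage concentrates on fewer
    # words, which lowers the entropy of the resulting lexicon.
    counts = self_reinforcing_lexicon(vocab_size=10, n_tokens=1000, alpha=0.5)
    print(counts.most_common())
```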
1 code implementation • 12 Oct 2021 • David Francis, Ella Rabinovich, Farhan Samir, David Mortensen, Suzanne Stevenson
Specifically, we propose a variety of psycholinguistic factors -- semantic, distributional, and phonological -- that we hypothesize are predictive of lexical decline, in which words greatly decrease in frequency over time.
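A hedged sketch of the general setup (the feature names, data, and scikit-learn classifier below are illustrative assumptions, not the paper's actual features or model): treat decline as a binary label and fit a classifier on per-word psycholinguistic features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical sketch: predict whether a word declines in frequency from a few
# hand-picked psycholinguistic features. Feature names and values are invented.
# Columns: [semantic_density, contextual_diversity, phonological_neighborhood]
X = np.array([
    [0.2, 0.9, 12.0],
    [0.8, 0.3,  3.0],
    [0.3, 0.7, 10.0],
    [0.9, 0.2,  2.0],
])
y = np.array([0, 1, 0, 1])  # 1 = word declined in frequency over time

model = LogisticRegression().fit(X, y)
print(model.predict_proba([[0.5, 0.5, 6.0]]))  # [P(stable), P(declined)]
```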
no code implementations • 4 Apr 2021 • Kathleen Siminyu, Xinjian Li, Antonios Anastasopoulos, David Mortensen, Michael R. Marlo, Graham Neubig
Models pre-trained on multiple languages have shown significant promise for improving speech recognition, particularly for low-resource languages.
no code implementations • NAACL 2016 • Yulia Tsvetkov, Sunayana Sitaram, Manaal Faruqui, Guillaume Lample, Patrick Littell, David Mortensen, Alan W. Black, Lori Levin, Chris Dyer
We introduce polyglot language models, recurrent neural network models trained to predict symbol sequences in many different languages using shared representations of symbols and conditioning on typological information about the language to be predicted.
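A minimal sketch of that general recipe (the use of an LSTM, the layer sizes, and the tensor shapes here are assumptions for illustration, not the paper's exact architecture): embed symbols in a space shared across languages and concatenate a per-language typology vector to every timestep before the recurrent layer.

```python
import torch
import torch.nn as nn

class PolyglotLM(nn.Module):
    """Sketch of a polyglot language model: an LSTM over symbol sequences
    shared across languages, conditioned on a typology feature vector."""

    def __init__(self, vocab_size: int, typo_dim: int, emb_dim: int = 64, hid_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim + typo_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, symbols: torch.Tensor, typology: torch.Tensor) -> torch.Tensor:
        # symbols: (batch, seq_len) symbol ids; typology: (batch, typo_dim)
        emb = self.embed(symbols)
        typo = typology.unsqueeze(1).expand(-1, symbols.size(1), -1)
        hidden, _ = self.rnn(torch.cat([emb, typo], dim=-1))
        return self.out(hidden)  # (batch, seq_len, vocab_size) next-symbol logits

if __name__ == "__main__":
    model = PolyglotLM(vocab_size=100, typo_dim=8)
    logits = model(torch.randint(0, 100, (2, 5)), torch.randn(2, 8))
    print(logits.shape)  # torch.Size([2, 5, 100])
```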