Investigating variation in written forms of Nahuatl using character-based language models

NAACL (AmericasNLP) 2021 · Robert Pugh, Francis Tyers

We describe experiments with character-based language modeling for written variants of Nahuatl. Using a standard LSTM model and publicly available Bible translations, we explore how character language models can be applied to the tasks of estimating mutual intelligibility, identifying genetic similarity, and distinguishing written variants. We demonstrate that these simple language models are able to capture similarities and differences that have been described in the linguistic literature.
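As a rough illustration of how a character language model might be used to compare written variants, the sketch below trains a model on one variant and measures how well it predicts text from another. It is not the authors' setup: it swaps the LSTM for a much simpler add-one-smoothed character bigram model, and it assumes that per-character cross-entropy is a reasonable proxy for orthographic distance. The function names, the "^" start marker, and the two toy strings (which only mimic a c/qu versus k spelling convention) are hypothetical and are not drawn from the Bible translations used in the paper.

```python
import math
from collections import Counter


def train_char_bigram(text):
    """Count character bigrams and left-context characters in `text`."""
    padded = "^" + text  # "^" is a hypothetical start-of-text marker
    bigrams = Counter(zip(padded, padded[1:]))
    contexts = Counter(padded[:-1])
    vocab = set(padded)
    return bigrams, contexts, vocab


def cross_entropy(model, text):
    """Average bits per character of `text` under an add-one-smoothed model."""
    bigrams, contexts, vocab = model
    # Smooth over the union of training and test characters so that
    # characters unseen in training still receive non-zero probability.
    vocab_size = len(vocab | set(text))
    padded = "^" + text
    total_bits = 0.0
    for prev, cur in zip(padded, padded[1:]):
        prob = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        total_bits -= math.log2(prob)
    return total_bits / len(text)


# Toy strings (illustrative only, not corpus data): the same phrase
# written with two different spelling conventions.
variant_a = "in tlacatl quitta in calli " * 20
variant_b = "in tlakatl kitta in kalli " * 20

model_a = train_char_bigram(variant_a)
same_variant = cross_entropy(model_a, variant_a)
other_variant = cross_entropy(model_a, variant_b)
print(f"A on A: {same_variant:.3f} bits/char")
print(f"A on B: {other_variant:.3f} bits/char")
```

Under these assumptions, a model trained on one variant should assign higher cross-entropy to text in a different orthography than to text in its own, and computing this score for every pair of variants would give a simple distance matrix to compare against groupings described in the linguistic literature.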
