Learning Morphosyntactic Analyzers from the Bible via Iterative Annotation Projection across 26 Languages

ACL 2019 · Garrett Nicolai, David Yarowsky

A large percentage of computational tools are concentrated in a very small subset of the planet's languages. Compounding the issue, many languages lack the high-quality linguistic annotation necessary for the construction of such tools with current machine learning methods. In this paper, we address both issues simultaneously: leveraging the high accuracy of English taggers and parsers, we project morphological information onto translations of the Bible in 26 varied test languages. Using an iterative discovery, constraint, and training process, we build inflectional lexica in the target languages. Through a combination of iteration, ensembling, and reranking, we see double-digit relative error reductions in lemmatization and morphological analysis over a strong initial system.
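
To make the idea of annotation projection concrete, here is a minimal sketch of the generic technique, not the authors' system. It assumes word alignments between English and target-language verses are already available, and it simply copies each English token's morphological tag to the aligned target word, then keeps the majority tag per target word form. The function name `project_annotations`, the tag strings, and the toy verses are all hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def project_annotations(verses):
    """Project source-side tags onto target word forms by majority vote.

    Each verse is a tuple (source_tags, target_tokens, alignment), where
    alignment is a list of (source_index, target_index) pairs.
    Hypothetical helper for illustration only.
    """
    votes = defaultdict(Counter)
    for source_tags, target_tokens, alignment in verses:
        for src_idx, tgt_idx in alignment:
            # The aligned target word receives one vote for the source tag.
            votes[target_tokens[tgt_idx]][source_tags[src_idx]] += 1
    # Keep the most frequent projected tag for each target word form.
    return {word: counts.most_common(1)[0][0] for word, counts in votes.items()}


# Toy data (assumed): English tags projected onto Spanish tokens.
verses = [
    (["PRON;3;SG", "V;PRS;3;SG"], ["ella", "habla"], [(0, 0), (1, 1)]),
    (["PRON;3;SG", "V;PRS;3;SG"], ["ella", "habla"], [(0, 0), (1, 1)]),
    # A noisy alignment: the determiner is wrongly linked to the verb.
    (["DET", "N;SG"], ["habla"], [(0, 0)]),
]

lexicon = project_annotations(verses)
print(lexicon)
```

Voting across many verses is one simple way to dampen alignment noise: the single mistaken `DET` projection above is outvoted by the two correct verb projections. The iterative discovery, constraint, and training process described in the abstract goes beyond this single pass.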
