Speeding up corpus development for linguistic research: language documentation and acquisition in Romansh Tuatschin

WS 2017 · Géraldine Walther, Benoît Sagot

In this paper, we present ongoing work on developing language resources and basic NLP tools for an undocumented variety of Romansh, in the context of a language documentation and language acquisition project. Our tools are meant to improve the speed and reliability of corpus annotation for noisy data involving large amounts of code-switching, child speech, and orthographic noise. Increasing the efficiency of language resource development for language documentation and acquisition research is also a step towards solving the data sparsity issues with which researchers have been struggling.
