Morphological Disambiguation of South Sámi with FSTs and Neural Networks

29 Apr 2020 · Mika Hämäläinen, Linda Wiechetek

We present a method for morphological disambiguation of South Sámi, an endangered language. Our method uses an FST-based morphological analyzer to produce an ambiguous set of morphological readings for each word in a sentence. These readings are disambiguated with a Bi-RNN model trained on the related North Sámi UD Treebank and some synthetically generated South Sámi data. The disambiguation is done at the level of morphological tags, ignoring word forms and lemmas; this makes it possible to use North Sámi training data for South Sámi without a bilingual dictionary or aligned word embeddings. Our approach requires only minimal resources for South Sámi, which makes it applicable to any other endangered language as well.
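
The tag-level idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `lemma+Tag+Tag` reading format, the placeholder lemmas `w1` and `w2`, the function names, and the hand-written `toy_score` rule (which stands in for the trained Bi-RNN) are all assumptions made for illustration. The point it shows is that the scorer only ever sees morphological tags, so nothing language-specific such as a lemma or word form reaches it.

```python
from typing import Callable, List

# Assumed reading format: "lemma+Tag1+Tag2+...", one string per FST reading.
Scorer = Callable[[List[List[str]], int, str], float]


def delexicalize(reading: str) -> str:
    """Drop the lemma and keep only the morphological tags."""
    _lemma, _sep, tags = reading.partition("+")
    return tags


def disambiguate(sentence: List[List[str]], score: Scorer) -> List[str]:
    """Pick one reading per word using tag-only context.

    `sentence` holds the (non-empty) list of candidate readings for each word.
    `score` is any function rating a candidate tag string at a position;
    in the paper this role is played by a Bi-RNN, here it is a placeholder.
    """
    # The context given to the scorer contains tags only, never lemmas.
    context = [sorted({delexicalize(r) for r in word}) for word in sentence]
    chosen = []
    for position, readings in enumerate(sentence):
        best = max(readings,
                   key=lambda r: score(context, position, delexicalize(r)))
        chosen.append(best)
    return chosen


def toy_score(context: List[List[str]], position: int, tags: str) -> float:
    """Hypothetical rule: prefer a verb reading right after an unambiguous noun."""
    if position > 0 and tags.startswith("V"):
        prev_is_noun = all(t.startswith("N") for t in context[position - 1])
        return 1.0 if prev_is_noun else 0.0
    return 0.5


# Two words with placeholder lemmas; the second is ambiguous between N and V.
example = [
    ["w1+N+Sg+Nom"],
    ["w2+N+Sg+Nom", "w2+V+Ind+Prs+Sg3"],
]
result = disambiguate(example, toy_score)
print(result)
```

Because the scorer's inputs are delexicalized, a model trained on tag sequences from one language can in principle be applied to readings from a related language, provided both analyzers use a compatible tag set.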
