Morphological Analysis Corpus Construction of Uyghur

“Morphological analysis is a fundamental task in natural language processing and results can beapplied to different downstream tasks such as named entity recognition syntactic analysis andmachine translation. However there are many problems in morphological analysis such as lowaccuracy caused by a lack of resources. In this paper to alleviate the lack of resources in Uyghurmorphological analysis research we construct a Uyghur morphological analysis corpus based onthe analysis of grammatical features and the format of the general morphological analysis corpus.We define morphological tags from 14 dimensions and 53 features manually annotate and correctthe dataset. Finally the corpus provided some informations such as word lemma part of speech morphological analysis tags morphological segmentation and lemmatization. Also this paperanalyzes some basic features of the corpus and we use the models and datasets provided bySIGMORPHON Shared Task organizers to design comparative experiments to verify the corpus’savailability. Results of the experiment are 85.56% 88.29% respectively. The corpus provides areference value for morphological analysis and promotes the research of Uyghur natural language processing.”

PDF Abstract

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here