UniDic for Early Middle Japanese: a Dictionary for Morphological Analysis of Classical Japanese

In order to construct an annotated diachronic corpus of Japanese, we propose to create a new dictionary for morphological analysis of Early Middle Japanese (Classical Japanese) based on UniDic, a dictionary for Contemporary Japanese. Differences between the Early Middle Japanese and Contemporary Japanese, which prevent a na{\"\i}ve adaptation of UniDic to Early Middle Japanese, are found at the levels of lexicon, morphology, grammar, orthography and pronunciation. In order to overcome these problems, we extended dictionary entries and created a training corpus of Early Middle Japanese to adapt UniDic for Contemporary Japanese to Early Middle Japanese. Experimental results show that the proposed UniDic-EMJ, a new dictionary for Early Middle Japanese, achieves as high accuracy (97{\%}) as needed for the linguistic research on lexicon and grammar in Japanese classical text analysis.

PDF Abstract
No code implementations yet. Submit your code now

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here