HMMs for Unsupervised Vietnamese Word Segmentation
Word segmentation is an important problem in natural language processing. Most previous work on Vietnamese word segmentation relies on supervised learning. In this paper, we propose an unsupervised method for Vietnamese word segmentation based on Hidden Markov Models. We naturally encode prior linguistic knowledge into model learning. For decoding, we propose an enhancement of the Viterbi decoding algorithm that uses external token-ordering statistics from Pointwise Mutual Information. Evaluation on benchmark datasets shows that the proposed method works reasonably well. Source code is available at https://github.com/longbb/wordrecognition
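As an illustration of the token-ordering statistic mentioned above, the following is a minimal sketch of Pointwise Mutual Information over adjacent syllables, PMI(a, b) = log2(P(a, b) / (P(a) P(b))), estimated from raw counts. The function name `pmi_table`, the toy corpus, and the use of simple relative-frequency estimates are assumptions made for this example; the abstract does not specify how the statistics are estimated or how they enter the Viterbi decoder, and this sketch does not attempt to reproduce that step.

```python
import math
from collections import Counter


def pmi_table(sentences):
    """Return PMI (base 2) for every adjacent syllable pair in the corpus.

    Hypothetical helper: probabilities are plain relative frequencies,
    which is an assumption of this sketch, not the paper's stated method.
    """
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        syllables = sentence.split()
        unigrams.update(syllables)
        # Ordered pairs of neighbouring syllables within one sentence.
        bigrams.update(zip(syllables, syllables[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    table = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        table[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return table


# Toy corpus of whitespace-separated syllables (illustrative only).
corpus = ["học sinh đi học", "sinh viên đi học", "học sinh học bài"]
pmi = pmi_table(corpus)
```

Because the pairs are ordered, `pmi[("học", "sinh")]` and `pmi[("sinh", "học")]` are distinct entries, which is what lets such a table carry token-ordering information rather than plain co-occurrence.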