Multifaceted Assessments of Traditional Chinese Word Segmentation Tool on Large Corpora

ROCLING 2022 · Wen-Chao Yeh, Yu-Lun Hsieh, Yung-Chun Chang, Wen-Lian Hsu

This study evaluates the three most popular word segmentation tools on a large Traditional Chinese corpus in terms of efficiency, resource consumption, and cost. Specifically, we compare the performance of Jieba, CKIP, and MONPA on word segmentation, part-of-speech tagging, and named entity recognition through extensive experiments. Experimental results show that MONPA, using a GPU for batch segmentation, can greatly reduce the processing time of massive datasets. In addition, its word segmentation, part-of-speech tagging, and named entity recognition features are beneficial to downstream applications.
