We propose ThaiLMCut, a semi-supervised approach for Thai word segmentation which utilizes a bi-directional character language model (LM) as a way to leverage useful linguistic knowledge from unlabeled data. After the language model is trained on substantial unlabeled corpora, the weights of its embedding and recurrent layers are transferred to a supervised word segmentation model which continues fine-tuning them on a word segmentation task...
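
The weight-transfer step described in the abstract can be illustrated with a minimal PyTorch sketch. Everything in it is an assumption made for illustration and is not taken from the ThaiLMCut code: the class names `CharLM` and `Segmenter`, the helper `transfer_weights`, the single-layer LSTM, the layer sizes, and the two-class boundary head are all hypothetical. It only shows the general pattern of copying embedding and recurrent weights from a pretrained character LM into a segmentation model that then keeps training them.

```python
import torch
import torch.nn as nn


class CharLM(nn.Module):
    """Hypothetical bi-directional character LM (illustrative layout only)."""

    def __init__(self, vocab_size, emb_dim=32, hidden=64):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        # One output head per direction: next-character and previous-character.
        self.fwd_head = nn.Linear(hidden, vocab_size)
        self.bwd_head = nn.Linear(hidden, vocab_size)

    def forward(self, chars):
        out, _ = self.rnn(self.embedding(chars))
        hidden = out.size(-1) // 2
        # First half of the features is the forward direction, second half the backward one.
        return self.fwd_head(out[..., :hidden]), self.bwd_head(out[..., hidden:])


class Segmenter(nn.Module):
    """Hypothetical segmenter: same encoder shapes, plus a per-character boundary head."""

    def __init__(self, vocab_size, emb_dim=32, hidden=64):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.boundary_head = nn.Linear(2 * hidden, 2)  # boundary / no boundary

    def forward(self, chars):
        out, _ = self.rnn(self.embedding(chars))
        return self.boundary_head(out)


def transfer_weights(lm, seg):
    """Copy embedding and recurrent weights from the LM into the segmenter."""
    seg.embedding.load_state_dict(lm.embedding.state_dict())
    seg.rnn.load_state_dict(lm.rnn.state_dict())


vocab_size = 100  # assumed character vocabulary size
lm = CharLM(vocab_size)
# (Language-model training on unlabeled text would happen here.)
seg = Segmenter(vocab_size)
transfer_weights(lm, seg)

# The transferred layers stay trainable, so supervised training fine-tunes them.
logits = seg(torch.randint(0, vocab_size, (4, 20)))
```

Because the copied parameters keep `requires_grad=True`, an optimizer built over `seg.parameters()` updates them together with the new boundary head.
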
TASK | DATASET | MODEL | METRIC NAME | METRIC VALUE | GLOBAL RANK |
---|---|---|---|---|---|
Thai Word Segmentation | BEST-2010 | ThaiLMCut | F1-Score | 0.9878 | #2 |
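
For readers unfamiliar with the metric, the sketch below shows one common way to compute a word-level F1 score for segmentation. It assumes a word counts as correct only when both of its character boundaries match the gold segmentation; the page does not state how the BEST-2010 score was computed, so this is an illustrative definition and not the benchmark's official scorer. The function names `word_spans` and `segmentation_f1` and the toy strings are hypothetical.

```python
def word_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def segmentation_f1(gold_words, pred_words):
    """Word-level F1: a predicted word is correct if its span matches a gold span."""
    if "".join(gold_words) != "".join(pred_words):
        raise ValueError("gold and predicted segmentations must cover the same text")
    gold, pred = word_spans(gold_words), word_spans(pred_words)
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy example with placeholder strings: only the first word is segmented correctly.
gold = ["ab", "cd", "e"]
pred = ["ab", "c", "de"]
score = segmentation_f1(gold, pred)
```

In this toy case one of three predicted words matches one of three gold words, so precision, recall, and F1 all equal 1/3.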