Domain Adaptation of Thai Word Segmentation Models using Stacked Ensemble

Like many Natural Language Processing tasks, Thai word segmentation is domain-dependent. Researchers have relied on transfer learning to adapt an existing model to a new domain. However, this approach is inapplicable to models with which we can interact only through their input and output layers, also known as "black boxes". We propose a filter-and-refine solution based on the stacked-ensemble learning paradigm to address this black-box limitation. We conducted extensive experiments comparing our method against state-of-the-art models and transfer learning. Experimental results show that our proposed solution is an effective domain adaptation method and performs similarly to the transfer learning method.
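The stacking idea can be illustrated with a minimal sketch. It assumes a hypothetical interface in which the black-box segmenter is any function from raw text to a list of words, and it uses only what such a model exposes: its input and its output. A second-level learner is trained on in-domain gold segmentations, using the black box's per-character boundary labels plus the raw character as features. Two simplifications are assumptions, not the paper's method: a majority-vote lookup table stands in for the CRF named in the results table, and no separate filter step is modelled. All names (StackedRefiner, words_to_labels, labels_to_words, features) are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: a black-box segmenter maps raw text to a list of words.
Segmenter = Callable[[str], List[str]]


def words_to_labels(words: Sequence[str]) -> List[int]:
    """Per-character boundary labels: 1 if a word starts at this character, else 0."""
    labels: List[int] = []
    for w in words:
        labels.extend([1] + [0] * (len(w) - 1))
    return labels


def labels_to_words(text: str, labels: Sequence[int]) -> List[str]:
    """Inverse of words_to_labels: start a new word at every position labelled 1."""
    words: List[str] = []
    for ch, lab in zip(text, labels):
        if lab == 1 or not words:
            words.append(ch)
        else:
            words[-1] += ch
    return words


def features(text: str, base: Sequence[int], i: int) -> Tuple:
    """Features for position i: the character and the black box's labels in a +/-1 window."""
    left = base[i - 1] if i > 0 else -1
    right = base[i + 1] if i + 1 < len(base) else -1
    return (text[i], left, base[i], right)


class StackedRefiner:
    """Second-level learner trained on the black box's outputs for in-domain text.

    A majority-vote lookup table stands in for a CRF (an assumption of this sketch).
    """

    def __init__(self, base: Segmenter):
        self.base = base
        self.table: Dict[Tuple, Counter] = defaultdict(Counter)

    def _base_labels(self, text: str) -> List[int]:
        # Only the black box's input and output are used; its internals are never touched.
        words = self.base(text)
        if "".join(words) != text:
            raise ValueError("black-box output must concatenate back to the input text")
        return words_to_labels(words)

    def fit(self, gold_corpus: Sequence[List[str]]) -> "StackedRefiner":
        for gold_words in gold_corpus:
            text = "".join(gold_words)
            base_labels = self._base_labels(text)
            gold_labels = words_to_labels(gold_words)
            for i in range(len(text)):
                self.table[features(text, base_labels, i)][gold_labels[i]] += 1
        return self

    def segment(self, text: str) -> List[str]:
        base_labels = self._base_labels(text)
        refined: List[int] = []
        for i in range(len(text)):
            counts = self.table.get(features(text, base_labels, i))
            # Unseen context: keep the black box's label; otherwise take the majority gold label.
            refined.append(counts.most_common(1)[0][0] if counts else base_labels[i])
        return labels_to_words(text, refined)
```

For example, with a toy black box that splits every character (`StackedRefiner(list)`) and the gold corpus `[["ab", "c"], ["ab"]]`, the refiner learns to merge "a" and "b" and returns `["ab", "c"]` for "abc". Falling back to the black box's own label in unseen contexts means the refiner only overrides the black box where in-domain evidence exists.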

Results from the Paper


 Ranked #1 on Thai Word Segmentation on WS160 (using extra training data)

Task | Dataset | Model | Metric Name | Metric Value | Global Rank | Uses Extra Training Data | Benchmark
Thai Word Segmentation | BEST-2010 | Stacked Ensemble (CRF) | F1-score | 0.9812 | #5 | |
Thai Word Segmentation | WS160 | Stacked Ensemble (CRF) | F1-score | 0.952 | #1 | Yes |
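The table reports F1-scores without stating how they are computed. A common convention for word segmentation, assumed here rather than taken from the paper, is word-level F1: each word is identified by its character span, a predicted word is correct only if both of its boundaries match a gold word, and F1 is the harmonic mean of precision (correct / predicted words) and recall (correct / gold words). The function names below are hypothetical.

```python
from typing import Sequence, Set, Tuple


def word_spans(words: Sequence[str]) -> Set[Tuple[int, int]]:
    """Character offsets (start, end) of each word in the concatenated text."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def word_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Word-level F1: a predicted word is correct only if its span matches a gold word."""
    if "".join(gold) != "".join(pred):
        raise ValueError("gold and predicted segmentations must cover the same text")
    g, p = word_spans(gold), word_spans(pred)
    correct = len(g & p)
    if correct == 0:
        return 0.0
    precision = correct / len(p)
    recall = correct / len(g)
    return 2 * precision * recall / (precision + recall)
```

For gold `["ab", "c"]` and prediction `["a", "b", "c"]`, one of three predicted words is correct, so precision is 1/3, recall is 1/2 and F1 is 0.4. To score a whole test set, the correct, predicted and gold counts would be summed across sentences before computing precision and recall; this sketch scores a single sentence.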
