Automatically Tailoring Unsupervised Morphological Segmentation to the Language
Morphological segmentation is beneficial for several natural language processing tasks dealing with large vocabularies. Unsupervised methods for morphological segmentation are essential for handling a diverse set of languages, including low-resource languages. Eskander et al. (2016) introduced a Language Independent Morphological Segmenter (LIMS) using Adaptor Grammars (AG) based on the best-on-average performing AG configuration. However, while LIMS worked best on average and outperforms other state-of-the-art unsupervised morphological segmentation approaches, it did not provide the optimal AG configuration for five out of the six languages. We propose two language-independent classifiers that enable the selection of the optimal or nearly-optimal configuration for the morphological segmentation of unseen languages.
PDF Abstract