TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Thai Word Segmentation	BEST-2010	Multiple Attentions (char-word-cc)	F1-Score	0.9899	#2

Character-based Thai Word Segmentation with Multiple Attentions

RANLP 2021 · Thodsaporn Chay-intr, Hidetaka Kamigaito, Manabu Okumura

Character-based word-segmentation models have been extensively applied to agglutinative languages, including Thai, due to their high performance. These models estimate word boundaries from a character sequence. However, a character unit in sequences has no essential meaning, compared with word, subword, and character cluster units. We propose a Thai word-segmentation model that uses various types of information, including words, subwords, and character clusters, from a character sequence. Our model applies multiple attentions to refine segmentation inferences by estimating the significant relationships among characters and various unit types. The experimental results indicate that our model can outperform other state-of-the-art Thai word-segmentation models.
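
To make the character-based framing concrete, the following is a minimal sketch of how per-character boundary labels can be decoded into words and scored with a word-level F1. It is not the paper's model or its evaluation script: the label scheme (1 marks a character that starts a word), the function names `labels_to_words`, `word_spans`, and `word_f1`, and the span-matching definition of F1 are all assumptions chosen for illustration, and the predicted labels are hand-written rather than produced by any model.

```python
def labels_to_words(text, labels):
    """Decode boundary labels into words.

    Assumed scheme: labels[i] == 1 means character i starts a new word,
    0 means it continues the current word.
    """
    if len(text) != len(labels):
        raise ValueError("text and labels must have the same length")
    words = []
    for ch, label in zip(text, labels):
        if label == 1 or not words:
            words.append(ch)      # open a new word
        else:
            words[-1] += ch       # extend the current word
    return words


def word_spans(words):
    """Return the set of (start, end) character offsets of each word."""
    spans = set()
    start = 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def word_f1(gold_words, pred_words):
    """Word-level F1: a predicted word counts only if both boundaries match."""
    gold = word_spans(gold_words)
    pred = word_spans(pred_words)
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Illustrative input: "ฉันกินข้าว" segmented as ฉัน | กิน | ข้าว (10 code points).
text = "ฉันกินข้าว"
gold_labels = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
pred_labels = [1, 0, 0, 1, 0, 0, 1, 0, 1, 0]  # hypothetical output with one spurious boundary

gold_words = labels_to_words(text, gold_labels)
pred_words = labels_to_words(text, pred_labels)
score = word_f1(gold_words, pred_words)
```

Here the spurious boundary splits the last word in two, so two of four predicted words and two of three gold words match. The sketch covers only the input/output framing; the word, subword, and character-cluster attentions described in the abstract are not modeled.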