Boundary-based MWE segmentation with text partitioning

WS 2017  ·  Jake Williams ·

This submission describes the development of a fine-grained, text-chunking algorithm for the task of comprehensive MWE segmentation. This task notably focuses on the identification of colloquial and idiomatic language. The submission also includes a thorough model evaluation in the context of two recent shared tasks, spanning 19 different languages and many text domains, including noisy, user-generated text. Evaluations exhibit the presented model as the best overall for purposes of MWE segmentation, and open-source software is released with the submission (although links are withheld for purposes of anonymity). Additionally, the authors acknowledge the existence of a pre-print document on arxiv.org, which should be avoided to maintain anonymity in review.

PDF Abstract

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here