MoNoise: A Multi-lingual and Easy-to-use Lexical Normalization Tool

ACL 2019 · Rob van der Goot

In this paper, we introduce and demonstrate the online demo and the command-line interface of MoNoise, a lexical normalization system for a variety of languages. We further improve this model by using features from the original word for every normalization candidate. For comparison with future work, we propose bundling seven datasets in six languages into a new benchmark, together with a novel evaluation metric that is particularly suitable for cross-dataset comparisons. MoNoise reaches new state-of-the-art performance on six of these seven datasets. Furthermore, we allow the user to tune the 'aggressiveness' of the normalization, and show how the model can be made more efficient with only a small loss in performance. The online demo is available at http://www.robvandergoot.com/monoise and the corresponding code at https://bitbucket.org/robvanderg/monoise/
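The abstract does not spell out the proposed metric. Assuming it is an error reduction rate (ERR), that is, word-level accuracy rescaled against the baseline that leaves every word unchanged, the following is a minimal Python sketch of how such a score can be computed. The function name `error_reduction_rate` is hypothetical and not part of MoNoise; the sketch also assumes a 1:1 token alignment and case-sensitive comparison.

```python
from typing import Sequence


def error_reduction_rate(raw: Sequence[str], gold: Sequence[str], pred: Sequence[str]) -> float:
    """Hypothetical helper: error reduction of `pred` over the leave-as-is baseline.

    Assumes raw, gold and predicted tokens are aligned one-to-one and
    compared case-sensitively.
    """
    if not (len(raw) == len(gold) == len(pred)):
        raise ValueError("raw, gold and pred must be aligned token by token")
    # Baseline: copy every raw word; it is correct wherever no normalization is needed.
    baseline_correct = sum(r == g for r, g in zip(raw, gold))
    # System: count predicted words that match the gold normalization.
    system_correct = sum(p == g for p, g in zip(pred, gold))
    # Number of words the annotators actually normalized.
    to_normalize = len(gold) - baseline_correct
    if to_normalize == 0:
        raise ValueError("ERR is undefined when no word needs normalization")
    # Equals (TP - FP) / (TP + FN) when counted over words needing normalization.
    return (system_correct - baseline_correct) / to_normalize


raw = ["u", "r", "gr8", "lol"]
gold = ["you", "are", "great", "lol"]
pred = ["you", "r", "great", "laugh"]
print(error_reduction_rate(raw, gold, pred))  # 0.3333333333333333
```

Because the score is relative to the leave-as-is baseline, it is 0.0 for a system that changes nothing, 1.0 for a perfect system, and negative for a system that does more harm than good, regardless of how many words a given dataset needs normalized. That property is what would make such a metric comparable across datasets.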
