About

Lexical normalization is the task of translating/transforming a non standard text to a standard register.

Example:

new pix comming tomoroe
new pictures coming tomorrow

Datasets usually consists of tweets, since these naturally contain a fair amount of these phenomena.

For lexical normalization, only replacements on the word-level are annotated. Some corpora include annotation for 1-N and N-1 replacements. However, word insertion/deletion and reordering is not part of the task.

Benchmarks

TREND DATASET BEST METHOD PAPER TITLE PAPER CODE COMPARE

Datasets

Greatest papers with code

Adapting Sequence to Sequence models for Text Normalization in Social Media

12 Apr 2019Isminoula/TextNormSeq2Seq

Social media offer an abundant source of valuable raw data, however informal writing can quickly become a bottleneck for many natural language processing (NLP) tasks.

LEXICAL NORMALIZATION

Adapting Deep Learning for Sentiment Classification of Code-Switched Informal Short Text

4 Jan 2020haroonshakeel/multisenti

Such informal and code-switched content are under-resourced in terms of labeled datasets and language models even for popular tasks like sentiment classification.

LEXICAL NORMALIZATION SENTIMENT ANALYSIS

A Clustering Framework for Lexical Normalization of Roman Urdu

31 Mar 2020abdulrafae/normalization

Roman Urdu is an informal form of the Urdu language written in Roman script, which is widely used in South Asia for online textual content.

LEXICAL NORMALIZATION

A Multi-cascaded Deep Model for Bilingual SMS Classification

29 Nov 2019haroonshakeel/bilingual_sms_classification

Our model achieves high accuracy for classification on this dataset and outperforms the previous model for multilingual text classification, highlighting language independence of McM.

LEXICAL NORMALIZATION MULTILINGUAL TEXT CLASSIFICATION TEXT CLASSIFICATION TRANSLITERATION

MoNoise: A Multi-lingual and Easy-to-use Lexical Normalization Tool

ACL 2019 robvanderg/cacheembeds

In this paper, we introduce and demonstrate the online demo as well as the command line interface of a lexical normalization system (MoNoise) for a variety of languages.

LEXICAL NORMALIZATION

Modeling Input Uncertainty in Neural Network Dependency Parsing

EMNLP 2018 robvanderg/normpar

Recently introduced neural network parsers allow for new approaches to circumvent data sparsity issues by modeling character level information and by exploiting raw data in a semi-supervised setting.

DEPENDENCY PARSING LEXICAL NORMALIZATION WORD EMBEDDINGS

MoNoise: Modeling Noise Using a Modular Normalization System

10 Oct 2017wesselreijngoud/masterthesis2019

We show that MoNoise beats the state-of-the-art on different normalization benchmarks for English and Dutch, which all define the task of normalization slightly different.

LEXICAL NORMALIZATION SPELLING CORRECTION WORD EMBEDDINGS