Lexical Normalization

15 papers with code • 1 benchmark • 1 dataset

Lexical normalization is the task of transforming non-standard text into a standard register.

Example:

new pix comming tomoroe
new pictures coming tomorrow

Datasets usually consist of tweets, since these naturally contain a fair amount of non-standard language.

For lexical normalization, only word-level replacements are annotated. Some corpora include annotations for 1-N and N-1 replacements (one word mapped to several, or several words mapped to one). However, word insertion, deletion, and reordering are not part of the task.
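The word-level framing can be illustrated with a minimal Python sketch. It assumes a tiny hand-written lookup table (`NORMALIZATION_LEXICON`, hypothetical and built only for this illustration) and is not how any of the systems listed on this page work. The point is only that the output keeps one entry per input token, so a 1-N replacement is stored as a multi-word string on a single token and nothing is inserted, deleted, or reordered. N-1 replacements are not handled here.

```python
from typing import Dict, List

# Hypothetical lookup table, written only for this illustration.
# Keys are non-standard tokens; values are their standard replacements.
# A value containing a space is a 1-N replacement (one token becomes several).
NORMALIZATION_LEXICON: Dict[str, str] = {
    "pix": "pictures",
    "comming": "coming",
    "tomoroe": "tomorrow",
    "gonna": "going to",  # 1-N replacement
}


def normalize(tokens: List[str]) -> List[str]:
    """Return one normalized string per input token.

    The output has the same length as the input, so every replacement stays
    aligned with its source token. Tokens missing from the lexicon are
    returned unchanged.
    """
    return [NORMALIZATION_LEXICON.get(token.lower(), token) for token in tokens]


source = "new pix comming tomoroe".split()
print(" ".join(normalize(source)))  # prints "new pictures coming tomorrow"
```

A real system would replace the fixed lexicon with learned candidate generation and ranking; the one-output-per-token alignment is the part that reflects the task definition above.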

Benchmarks

These leaderboards are used to track progress in Lexical Normalization.

Dataset	Best Model
LexNorm	MoNoise

Datasets

MultiSenti

Subtasks

Pronunciation Dictionary Creation

Latest papers

ViLexNorm: A Lexical Normalization Corpus for Vietnamese Social Media Text

ngxtnhi/vilexnorm • 29 Jan 2024

In this work, we introduce Vietnamese Lexical Normalization (ViLexNorm), the first-ever corpus developed for the Vietnamese lexical normalization task.

Automatic Textual Normalization for Hate Speech Detection

anhhoang0529/small-lexnormvihsd • 12 Nov 2023

Our dataset is accessible for research purposes.

ÚFAL at MultiLexNorm 2021: Improving Multilingual Lexical Normalization by Fine-tuning ByT5

ufal/multilexnorm2021 • WNUT (ACL) 2021

We present the winning entry to the Multilingual Lexical Normalization (MultiLexNorm) shared task at W-NUT 2021 (van der Goot et al., 2021a), which evaluates lexical-normalization systems on 12 social media datasets in 11 languages.

28 Oct 2021

DaN+: Danish Nested Named Entities and Lexical Normalization

bplank/DaNplus • COLING 2020

We examine language-specific versus multilingual BERT, and study the effect of lexical normalization on NER.

24 May 2021

User-Generated Text Corpus for Evaluating Japanese Morphological Analysis and Lexical Normalization

shigashiyama/jlexnorm • NAACL 2021

Morphological analysis (MA) and lexical normalization (LN) are both important tasks for Japanese user-generated text (UGT).

08 Apr 2021

Lexical Normalization for Code-switched Data and its Effect on POS Tagging

ozlemcek/TrDeNormData • EACL 2021

Lexical normalization, the translation of non-canonical data to standard language, has been shown to improve the performance of many natural language processing tasks on social media.

01 Apr 2021

A Clustering Framework for Lexical Normalization of Roman Urdu

abdulrafae/normalization • 31 Mar 2020

Roman Urdu is an informal form of the Urdu language written in Roman script, which is widely used in South Asia for online textual content.

Adapting Deep Learning for Sentiment Classification of Code-Switched Informal Short Text

haroonshakeel/multisenti • 4 Jan 2020

Such informal and code-switched content is under-resourced in terms of labeled datasets and language models, even for popular tasks like sentiment classification.

A Multi-cascaded Deep Model for Bilingual SMS Classification

haroonshakeel/bilingual_sms_classification • 29 Nov 2019

Our model achieves high accuracy for classification on this dataset and outperforms the previous model for multilingual text classification, highlighting language independence of McM.

MoNoise: A Multi-lingual and Easy-to-use Lexical Normalization Tool

robvanderg/cacheembeds • ACL 2019

In this paper, we introduce and demonstrate the online demo as well as the command line interface of a lexical normalization system (MoNoise) for a variety of languages.

01 Jul 2019

