Lexical Normalization

15 papers with code • 1 benchmark • 1 dataset

Lexical normalization is the task of transforming non-standard text into a standard register.

Example:

new pix comming tomoroe
new pictures coming tomorrow
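As a minimal sketch, the example above can be reproduced with a word-by-word dictionary lookup. The lexicon `NORM_LEXICON` and the function `normalize` are hypothetical. This is a simple lookup baseline for illustration, not the method of any system listed on this page.

```python
from typing import Dict

# Hypothetical lookup table mapping non-standard words to standard forms;
# a real system would learn or curate a much larger resource.
NORM_LEXICON: Dict[str, str] = {
    "pix": "pictures",
    "comming": "coming",
    "tomoroe": "tomorrow",
}


def normalize(text: str, lexicon: Dict[str, str] = NORM_LEXICON) -> str:
    """Replace each whitespace-separated word by its standard form, if known.

    Words missing from the lexicon are copied through unchanged, and the
    word order is never altered (only word-level replacements happen).
    """
    return " ".join(lexicon.get(word, word) for word in text.split())


print(normalize("new pix comming tomoroe"))
# new pictures coming tomorrow
```

A pure lookup cannot resolve ambiguous forms or unseen misspellings, which is why systems on the leaderboard rely on learned models instead.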

Datasets usually consist of tweets, since these naturally contain a fair amount of non-standard spellings, abbreviations, and slang.

For lexical normalization, only replacements at the word level are annotated. Some corpora also annotate 1-N replacements (one word expanded into several, e.g. "idk" → "I don't know") and N-1 replacements (several words merged into one). However, word insertion, deletion, and reordering are not part of the task.
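The word-level annotation scheme described above can be illustrated with a small sketch. The pair-per-token format, the convention that an empty string marks a token absorbed by an N-1 merge, and the `apply_alignment` helper are all assumptions made for illustration, not the file format of any particular corpus.

```python
from typing import List, Tuple

# Hypothetical annotation format, assumed for illustration only: one
# (raw_token, normalized) pair per source token, kept in source order.
#   1-1: ("wat", "what")
#   1-N: ("idk", "i don't know")       normalized form contains spaces
#   N-1: ("to", "today"), ("day", "")  the first token carries the merged
#                                      form; the following ones are empty
Alignment = List[Tuple[str, str]]


def apply_alignment(pairs: Alignment) -> List[str]:
    """Build the normalized word sequence from word-level annotations.

    Output words are emitted strictly in source order and each one is
    anchored to a source token, so reordering cannot be expressed.
    An empty entry is read as part of an N-1 merge with the preceding
    token; an empty entry with no preceding token is rejected.
    """
    output: List[str] = []
    for i, (raw, normalized) in enumerate(pairs):
        if normalized == "":
            if i == 0:
                raise ValueError(f"token {raw!r} has no preceding token to merge into")
            continue  # absorbed by the preceding N-1 merge
        output.extend(normalized.split())
    return output


pairs: Alignment = [
    ("idk", "i don't know"),  # 1-N
    ("wat", "what"),          # 1-1
    ("to", "today"),          # N-1, first token
    ("day", ""),              # N-1, absorbed token
    ("is", "is"),             # unchanged
]
print(apply_alignment(pairs))
# ['i', "don't", 'know', 'what', 'today', 'is']
```

Because every output word hangs off a source token in order, this format can express 1-1, 1-N, and N-1 replacements but not reordering. Stand-alone deletions are only partially excluded: an empty entry after another token is always read as a merge.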

Benchmarks

These leaderboards are used to track progress in Lexical Normalization.

Dataset	Best Model
LexNorm	MoNoise

Datasets

MultiSenti

Subtasks

Pronunciation Dictionary Creation

Most implemented papers

ÚFAL at MultiLexNorm 2021: Improving Multilingual Lexical Normalization by Fine-tuning ByT5

ufal/multilexnorm2021 • WNUT (ACL) 2021

We present the winning entry to the Multilingual Lexical Normalization (MultiLexNorm) shared task at W-NUT 2021 (van der Goot et al., 2021a), which evaluates lexical-normalization systems on 12 social media datasets in 11 languages.

Automatic Textual Normalization for Hate Speech Detection

anhhoang0529/small-lexnormvihsd • 12 Nov 2023

Our dataset is accessible for research purposes.

ViLexNorm: A Lexical Normalization Corpus for Vietnamese Social Media Text

ngxtnhi/vilexnorm • 29 Jan 2024

In this work, we introduce Vietnamese Lexical Normalization (ViLexNorm), the first-ever corpus developed for the Vietnamese lexical normalization task.
