Lexical Normalization
15 papers with code • 1 benchmark • 1 dataset
Lexical normalization is the task of transforming non-standard text into a standard register.
Example:
new pix comming tomoroe
new pictures coming tomorrow
Datasets usually consist of tweets, since these naturally contain a fair amount of non-standard language.
For lexical normalization, only word-level replacements are annotated. Some corpora also annotate 1-N replacements (one word split into several) and N-1 replacements (several words merged into one). However, word insertion, deletion, and reordering are not part of the task.
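The word-level alignment described above can be illustrated with a minimal sketch. It is not a real normalization system: the `LEXICON` table, the `normalize` and `detokenize` helpers, and the convention of marking an N-1 merge with an empty string on the absorbed token are all assumptions made for illustration. Individual corpora may encode splits and merges differently.

```python
from typing import Dict, List

# Hypothetical lookup table, invented for this example only.
LEXICON: Dict[str, str] = {
    "pix": "pictures",
    "comming": "coming",
    "tomoroe": "tomorrow",
    "gonna": "going to",  # 1-N: one source word maps to two words
}


def normalize(tokens: List[str], lexicon: Dict[str, str]) -> List[str]:
    """Return exactly one normalized string per input token.

    Unknown tokens are copied unchanged, so the output stays aligned
    with the input word by word (no insertion, deletion, or reordering).
    """
    return [lexicon.get(tok.lower(), tok) for tok in tokens]


def detokenize(aligned: List[str]) -> str:
    """Join aligned output, skipping empty slots left by N-1 merges."""
    return " ".join(word for word in aligned if word)


tokens = "new pix comming tomoroe".split()
aligned = normalize(tokens, LEXICON)
print(list(zip(tokens, aligned)))
print(detokenize(aligned))

# N-1 merge, annotated by hand here (assumed convention): the merged
# word sits on the first token and the absorbed token maps to "".
source = ["c", "u", "to", "morrow"]
gold = ["see", "you", "tomorrow", ""]
print(detokenize(gold))
```

Keeping one output slot per input token is what makes the task word-level: a 1-N replacement puts several words in a single slot, and an N-1 replacement leaves a slot empty, so the source and target can always be compared position by position.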
Most implemented papers
ÚFAL at MultiLexNorm 2021: Improving Multilingual Lexical Normalization by Fine-tuning ByT5
We present the winning entry to the Multilingual Lexical Normalization (MultiLexNorm) shared task at W-NUT 2021 (van der Goot et al., 2021a), which evaluates lexical-normalization systems on 12 social media datasets in 11 languages.
Automatic Textual Normalization for Hate Speech Detection
Our dataset is accessible for research purposes.
ViLexNorm: A Lexical Normalization Corpus for Vietnamese Social Media Text
In this work, we introduce Vietnamese Lexical Normalization (ViLexNorm), the first-ever corpus developed for the Vietnamese lexical normalization task.